slackapi / bolt-python

A framework to build Slack apps using Python
https://tools.slack.dev/bolt-python/
MIT License
1.07k stars 247 forks source link

Receiving Invalid request signature detected #1158

Closed manish001in closed 1 month ago

manish001in commented 2 months ago

Hey team, So, I am receiving the Invalid request signature detected. In last few days, it has become very common! I have made a change to my system where the event that I get is responded with ack() and then I use FastAPIs background tasks to run the business logic and processing for the message event.

Weirdly this does not happen for each event, only few events! In the background tasks, I use the event_context to call apps.event.authorizations.list api endpoint.

I am not sure why this issue is emerging. Does the event_context also expire?

Reproducible in:

The slack_bolt version

slack_bolt==1.20.1 slack_sdk==3.32.0

Python runtime version

Python 3.11.6

OS info

Amazon Linux release 2023.5.20240903 (Amazon Linux)

1 SMP Tue May 21 16:54:42 UTC 2024

(Paste the output of sw_vers && uname -v on macOS/Linux or ver on Windows OS)

Steps to reproduce:

Expected result:

I should not get intermittent issues with Invalid Request signature.

Actual result:

This is one log:

slack_bolt.RequestVerification INFO Invalid request signature detected (signature: v0=7701e8a900b2793475c41e3c65804eaa72c3d0dfbd1445335339adaac9c2e9ba, timestamp: 1725954559, body: {"token":"6J0cNanPs4p2KjLVavCwXYBk","team_id":"<>","enterprise_id":"<>","context_team_id":"<>","context_enterprise_id":"<>","api_app_id":"<>","event":{"subtype":"bot_message","text":"Alert: Content is Passed for whatsapp request, external","type":"message","ts":"1725954555.694759","bot_id":"<>","blocks":[{"type":"rich_text","block_id":"N0ar","elements":[{"type":"rich_text_section","elements":[{"type":"text","text":"Alert: Content is Passed for whatsapp request, external "}]}]}],"channel":"<>","event_ts":"1725954555.694759","channel_type":"channel"},"type":"event_callback","event_id":"Ev07L9HBMQ7R","event_time":1725954555,"authorizations":[{"enterprise_id":"<>","team_id":"<>","user_id":"<>","is_bot":false,"is_enterprise_install":false}],"is_ext_shared_channel":false,"event_context":"4-eyJldCI6Im1lc3NhZ2UiLCJ0aWQiOiJUMTkzVFNaRkUiLCJhaWQiOiJBMDU1ME01N0NTWCIsImNpZCI6IkMwN0MwTkVSVzZSIn0"})

Requirements

Please read the Contributing guidelines and Code of Conduct before creating this issue or pull request. By submitting, you are agreeing to those rules.

seratch commented 2 months ago

Hi @manish001in, thanks for writing in. One possible cause might be that your web app modifies the request body before passing it to bolt-python's request verifier. The signature verification requires raw request body data. If the body data is parsed beforehand and then converted to a string again, the string data is no longer the same. This is a common pitfall with Spring Boot (Java) and your situation sounds similar. If your FastAPI web app has filters or middleware that consume the request body data before passing it to bolt’s adapter, you need to avoid using the FastAPI extension along with bolt-python to ensure compatibility with the verification requirements.

seratch commented 2 months ago

In the background tasks, I use the event_context to call apps.event.authorizations.list api endpoint. I am not sure why this issue is emerging. Does the event_context also expire?

No, it doesn't. Also, this API call is unrelated to the signature verification failure.

manish001in commented 2 months ago

Hey @seratch , To my knowledge I dont think there is any filter or middleware making any changes to the raw body. I believe that to be true because most events that I receive do work fine and the system is able to handle it with the same logic and code.

I have been facing some issues regarding too much CPU usage in my server which I believe could be because of the extra load of requests I may be receiving which starts causing this issue along with receiving some internal_error responses from Slack. Hence, my confusion.

seratch commented 2 months ago

The request verification logic rejects too old x-slack-request-timestamp (older than 5 minutes ago). Does timestamp: 1725954559 (2024-09-10T07:49:19 UTC) in your log align with the timestamp of loggging? This is the only other cause of the rejection.

With the information given so far, I am unable to guess anything further. Whenever you find anything related to the bolt-python side uncertain, please feel free to ask them.

manish001in commented 2 months ago

Yes, that seems likely to be the issue. The log on my server is 2024-09-10 07:55:05 while timestamp: 1725954559 is (2024-09-10T07:49:19 UTC). I believe that may be the issue.

To set some context, my background task collects some information from the body and creates a webclient and then calls apps_event_authorizations_list. I want to understand a bit better why the request signature verification happens here.

This is the code for passing the message to background tasks:

@app.event("message")
def handle_message(ack, body, say, logger, context):
    # background_tasks = BackgroundTasks()
    background_tasks = context["background_tasks"]
    background_tasks.add_task(background_handle_msg, body, logger)
    ack()
    return

and this is the code for handling the message:

def background_handle_msg(body, logger):
    try:
        if "subtype" in body["event"] and body["event"]["subtype"] not in [
            "message_changed",
            "message_deleted",
            "message_replied",
            "thread_broadcast",
        ]:
            return

        event_context = body["event_context"]
        team_id = body["team_id"]
        ts = body["event"]["ts"]
        channel = body["event"]["channel"]
        channel_type = body["event"]["channel_type"]
        channel_type = channel_types[channel_type]

        current_users = get_current_slack_users(team_id)

        start_ts, end_ts = get_ts_values(ts)
        auth_list = []
        auth_client = create_auth_client()
        resp = auth_client.apps_event_authorizations_list(
            event_context=event_context,
        )

        if resp.get("ok"):
            auth_list = resp.get("authorizations", [])
            while resp.get("response_metadata", {}).get("next_cursor", ""):
                next_cursor = resp.get("response_metadata", {}).get("next_cursor", "")
                resp = auth_client.apps_event_authorizations_list(
                    event_context=event_context,
                    cursor=next_cursor,
                )

                auth_list = auth_list + resp.get("authorizations", [])
        else:
            logger.error(f"Error in App: Slack Authorization Error-{team_id}")
            logger.error(f"{team_id}, {channel}, {channel_type}")
            return

        if auth_list:
            for auth in auth_list:
                if auth["user_id"] in current_users:
                    user_channels = redis_connection.Set(
                        "{}_{}_{}-{}".format(
                            auth["user_id"], channel_type, int(start_ts), int(end_ts)
                        )
                    )

                    user_channels.add(channel)
                    user_channels.expire(2592000)

Would really appreciate if you can take a quick look to find any obvious mistakes. Otherwise, I believe it could be because of the load on the server and I will try to tackle that.

Thank you @seratch

seratch commented 2 months ago

I don't think the request verification is done when "background_handle_msg" is called. The logic runs right after receiving a request. There may be two possible causes:

  1. Your server's system clock is not correct (I don't think this can easily happen for a server that is connected to the internet, but this is the most likely reason).
  2. Using FastAPI's BackgroundTasks this way (your approach is pretty unusual) might cause something wrong even though I don't have any evidence.

By the way, if you're running this app as a long-running process (= not Funciton-as-a-Service platform), you don't need to do such a tricky thing. For the Events API, ack() is immediately done under the hood, and your listener function does not need to worry about 3-second timeouts. Just moving everything from "background_handle_msg" to the main function should work. If you have to enable process_before_response=True for some reason, having async task execution is necessary tho.

manish001in commented 2 months ago

Yes, initially the code didn't use background tasks but the load on my server was increasing drastically causing events to be disabled. I wasn't sure if ack was working properly, so I moved it to background tasks. My current suspicion is that the server load is getting too high which is causing the service to respond late. It is a bit surprising since I haven't had any massive uptick in usage.

I will try out few things and report back if there is any other help I may need. Please keep the issue open for 2-3 days till then.

Thank you so much for the help @seratch ! I appreciate it.

github-actions[bot] commented 1 month ago

👋 It looks like this issue has been open for 30 days with no activity. We'll mark this as stale for now, and wait 10 days for an update or for further comment before closing this issue out. If you think this issue needs to be prioritized, please comment to get the thread going again! Maintainers also review issues marked as stale on a regular basis and comment or adjust status if the issue needs to be reprioritized.

github-actions[bot] commented 1 month ago

As this issue has been inactive for more than one month, we will be closing it. Thank you to all the participants! If you would like to raise a related issue, please create a new issue which includes your specific details and references this issue number.