Open mihirkothari25 opened 1 month ago
Hi @mihirkothari25! Thanks for raising this. A quick search internally reveals that this is not the first time that we've heard this issue from folks, but it's not immediately clear to me what the resolution is (if it even exists). I'll do a bit more digging and get back to you with more information (and hopefully a solution) next week.
I looked into this a bit and found an interesting difference in behaviour when using the example app @mihirkothari25 provided vs. a socket-mode powered app (but same handler code: post a message, sleep for 3+ seconds, call complete
). If you flip the log level to debug you may be able to also see this difference.
Here are logs for the socket mode powered app:
[DEBUG] socket-mode:SocketModeClient:0 Received a message on the WebSocket: {"envelope_id":"25484494-c354-43e9-ba12-444858037c09","payload":{"team_id":"T029V6468RL","api_app_id":"A07T2726RFW","event":{"type":"function_executed","function":{"id":"Fn07TNFP4RJ4","callback_id":"sample_step","title":"Sample step","description":"Runs sample step","type":"app","input_parameters":[{"type":"slack#\/types\/user_id","name":"user_id","description":"Message recipient","title":"User","is_required":true,"hint":"Select a user in the workspace"},{"type":"slack#\/types\/interactivity","name":"interactivity","title":"Interactivity","is_required":true}],"output_parameters":[{"type":"slack#\/types\/user_id","name":"user_id","description":"User that completed the step","title":"User","is_required":true}],"category_id":"A07T2726RFW","category_label":"Custom Function Modal Test","app_id":"A07T2726RFW","date_created":1729605175,"date_released":0,"date_updated":1729605175,"date_deleted":0,"form_enabled":false},"inputs":{"user_id":"U02AEHE4KG9","interactivity":{"interactor":{"id":"U02AEHE4KG9","secret":"Nzk0OTY1Njc5MjA4MS4yMzM1MjA4MjEyODcwLjE4YjlkNjIxMTU1M2NhODQ3ZjIwNGY0ZTRkMTk3ZWUx"},"interactivity_pointer":"7949656792081.2335208212870.18b9d6211553ca847f204f4e4d197ee1"},"interactivity.interactor":{"id":"U02AEHE4KG9","secret":"Nzk0OTY1Njc5MjA4MS4yMzM1MjA4MjEyODcwLjE4YjlkNjIxMTU1M2NhODQ3ZjIwNGY0ZTRkMTk3ZWUx"},"interactivity.interactor.id":"U02AEHE4KG9","interactivity.interactor.secret":"Nzk0OTY1Njc5MjA4MS4yMzM1MjA4MjEyODcwLjE4YjlkNjIxMTU1M2NhODQ3ZjIwNGY0ZTRkMTk3ZWUx","interactivity.interactivity_pointer":"7949656792081.2335208212870.18b9d6211553ca847f204f4e4d197ee1"},"function_execution_id":"Fx07TJUPC3T4","workflow_execution_id":"Wx07TG3F4Z61","event_ts":"1729871460.565880"},"type":"event_callback","event_id":"Ev07TJUPCJ58","event_time":1729871460},"type":"events_api","accepts_response_payload":false,"retry_attempt":0,"retry_reason":""}
[DEBUG] socket-mode:SocketModeClient:0 Calling ack() - type: events_api, envelope_id: 25484494-c354-43e9-ba12-444858037c09, data: undefined
[DEBUG] socket-mode:SocketModeClient:0 send() method was called (WebSocket state: OPEN)
[DEBUG] socket-mode:SlackWebSocket:1 isActive(): websocket ready state is OPEN
[DEBUG] socket-mode:SocketModeClient:0 Sending a WebSocket message: {"envelope_id":"25484494-c354-43e9-ba12-444858037c09","payload":{}}
[DEBUG] web-api:WebClient:2 initialized
[DEBUG] bolt-app No conversation ID for incoming event
[DEBUG] web-api:WebClient:3 initialized
[DEBUG] web-api:WebClient:3 apiCall('chat.postMessage') start
[DEBUG] web-api:WebClient:3 http request url: https://slack.com/api/chat.postMessage
[DEBUG] web-api:WebClient:3 http request body: {"channel":"U02AEHE4KG9","text":"custom function has started yo"}
[DEBUG] web-api:WebClient:3 http request headers: {"Accept":"application/json, text/plain, */*","User-Agent":"@slack:socket-mode/2.0.2 @slack:bolt/4.0.1 @slack:web-api/7.6.0 node/18.15.0 darwin/23.6.0","Authorization":"[[REDACTED]]"}
[DEBUG] web-api:WebClient:3 http response received
[DEBUG] web-api:WebClient:3 http request result: {"ok":true,"channel":"D07SWRZHNQM","ts":"1729871460.767479","message":{"user":"U07SZK61TFV","type":"message","ts":"1729871460.767479","bot_id":"B07TCCFMA6M","app_id":"A07T2726RFW","text":"custom function has started yo","team":"T029V6468RL","bot_profile":{"id":"B07TCCFMA6M","app_id":"A07T2726RFW","name":"Custom Function Modal Test","icons":{"image_36":"https://a.slack-edge.com/80588/img/plugins/app/bot_36.png","image_48":"https://a.slack-edge.com/80588/img/plugins/app/bot_48.png","image_72":"https://a.slack-edge.com/80588/img/plugins/app/service_72.png"},"deleted":false,"updated":1729605216,"team_id":"T029V6468RL"},"blocks":[{"type":"rich_text","block_id":"pUbD","elements":[{"type":"rich_text_section","elements":[{"type":"text","text":"custom function has started yo"}]}]}]},"response_metadata":{"scopes":["chat:write"],"acceptedScopes":["chat:write"]}}
[DEBUG] web-api:WebClient:3 apiCall('chat.postMessage') end
[DEBUG] web-api:WebClient:3 apiCall('functions.completeSuccess') start
[DEBUG] web-api:WebClient:3 http request url: https://slack.com/api/functions.completeSuccess
[DEBUG] web-api:WebClient:3 http request body: {"token":"[[REDACTED]]","outputs":"{\"user_id\":\"U02AEHE4KG9\"}","function_execution_id":"Fx07TJUPC3T4"}
[DEBUG] web-api:WebClient:3 http request headers: {"Accept":"application/json, text/plain, */*","User-Agent":"@slack:socket-mode/2.0.2 @slack:bolt/4.0.1 @slack:web-api/7.6.0 node/18.15.0 darwin/23.6.0","Authorization":"[[REDACTED]]"}
[DEBUG] web-api:WebClient:3 http response received
[DEBUG] web-api:WebClient:3 http request result: {"ok":true,"response_metadata":{"scopes":["chat:write"]}}
[DEBUG] web-api:WebClient:3 apiCall('functions.completeSuccess') end
And here are the logs for the AwsLambdaReceiver-powered sample that the OP provided:
POST /slack/events (λ: slack)
[DEBUG] web-api:WebClient:1 initialized
[DEBUG] bolt-app No conversation ID for incoming event
[DEBUG] web-api:WebClient:2 initialized
[DEBUG] web-api:WebClient:2 apiCall('chat.postMessage') start
[DEBUG] web-api:WebClient:2 http request url: https://slack.com/api/chat.postMessage
[DEBUG] web-api:WebClient:2 http request body: {"channel":"U02AEHE4KG9","text":"custom function has started yo"}
[DEBUG] web-api:WebClient:2 http request headers: {}
[DEBUG] web-api:WebClient:2 http response received
[DEBUG] web-api:WebClient:2 http request result: {"ok":true,"channel":"D07SWRZHNQM","ts":"1729872308.257559","message":{"user":"U07SZK61TFV","type":"message","ts":"1729872308.257559","bot_id":"B07TCCFMA6M","app_id":"A07T2726RFW","text":"custom function has started yo","team":"T029V6468RL","bot_profile":{"id":"B07TCCFMA6M","app_id":"A07T2726RFW","name":"Custom Function Modal Test","icons":{"image_36":"https://a.slack-edge.com/80588/img/plugins/app/bot_36.png","image_48":"https://a.slack-edge.com/80588/img/plugins/app/bot_48.png","image_72":"https://a.slack-edge.com/80588/img/plugins/app/service_72.png"},"deleted":false,"updated":1729605216,"team_id":"T029V6468RL"},"blocks":[{"type":"rich_text","block_id":"zm0e","elements":[{"type":"rich_text_section","elements":[{"type":"text","text":"custom function has started yo"}]}]}]},"response_metadata":{"scopes":["chat:write"],"acceptedScopes":["chat:write"]}}
[DEBUG] web-api:WebClient:2 apiCall('chat.postMessage') end
[DEBUG] web-api:WebClient:2 apiCall('functions.completeSuccess') start
[DEBUG] web-api:WebClient:2 http request url: https://slack.com/api/functions.completeSuccess
[DEBUG] web-api:WebClient:2 http request body: {"token":"[[REDACTED]]","outputs":"{\"user_id\":\"U02AEHE4KG9\"}","function_execution_id":"Fx07TMGPUZ5J"}
[DEBUG] web-api:WebClient:2 http request headers: {"Authorization":"[[REDACTED]]"}
[DEBUG] web-api:WebClient:2 http response received
Error: An API error occurred: token_revoked
at platformErrorFromResult (/Users/fmaj/src/bolt-js-custom-step-template/node_modules/@slack/web-api/dist/errors.js:62:33)
at WebClient.apiCall (/Users/fmaj/src/bolt-js-custom-step-template/node_modules/@slack/web-api/dist/WebClient.js:181:56)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async /Users/fmaj/src/bolt-js-custom-step-template/app.js:30:5
at async processFunctionMiddleware (/Users/fmaj/src/bolt-js-custom-step-template/node_modules/@slack/bolt/dist/CustomFunction.js:107:9)
at async Array.<anonymous> (/Users/fmaj/src/bolt-js-custom-step-template/node_modules/@slack/bolt/dist/conversation-store.js:67:9)
at async Array.<anonymous> (/Users/fmaj/src/bolt-js-custom-step-template/node_modules/@slack/bolt/dist/middleware/builtin.js:282:9)
at async App.processEvent (/Users/fmaj/src/bolt-js-custom-step-template/node_modules/@slack/bolt/dist/App.js:598:13)
at async /Users/fmaj/src/bolt-js-custom-step-template/node_modules/@slack/bolt/dist/receivers/AwsLambdaReceiver.js:122:17
at async MessagePort.<anonymous> (file:///Users/fmaj/src/bolt-js-custom-step-template/node_modules/serverless-offline/src/lambda/handler-runner/worker-thread-runner/workerThreadHelper.js:24:14) {
code: 'slack_webapi_platform_error',
data: { ok: false, error: 'token_revoked', response_metadata: {} }
}
Even if I remove the call complete
in the Lambda receiver case, I still get a message from Slackbot telling me the function did not execute. I think this may be a bug... I still need to investigate some more, but I think the main difference between the Lambda receiver code vs. the socket mode code is that the socket mode code immediately acknowledges the event (you can see this in the socket mode debug logs in the first few lines). The Lambda receiver code, however, does not. This may be a bug in the custom step + Lambda receiver case. Will continue looking into it and report back.
Actually now that I think of it, NOT acknowledging the HTTP-based event in the Lambda receiver is by design. This is because acknowledging the HTTP event means returning an HTTP response to Slack's HTTP POST to the app. Doing so in the context of a Lambda means that this terminates the Lambda execution - effectively the lifecycle of the Lambda function is complete as soon as we send the HTTP response.
This is not a problem with Bolt so much as a standard challenge when building applications in a serverless environment. Such an application needs to change how it is built and use a different mental model and architecture. There are two constraints your app needs to work with:
Given these two constraints, you need to separate your application logic in a way that is different from e.g. a single long-running process powering Bolt and able to handle multiple incoming HTTP requests in parallel. My suggestion would be split up your functionality into discrete, separate Lambdas, each with their own lifecycle and execution. The following architecture should work:
app.function
handler has a small scope and few responsibilities in order to be able to acknowledge the incoming HTTP request from Slack within 3 seconds. For example, perhaps all it does is extract the inputs from the function event payload and invokes a different Lambda in an asynchronous/decoupled way that has no constraints of execution time, passing the event data to the 'worker' Lambda. Having a small set of responsibilities like this would almost guarantee that your main function-handler Lambda always executes within 3 seconds (satisfying the Slack event HTTP dispatch response time constraint).functions.completeSuccess
HTTP API to complete the custom step and have the housing workflow move on to the next step (this is the API that the complete
handler within Bolt function
handlers call; see this Bolt code for how the complete
utility method is created).Hope this helps.
@filmaj Hey, thank you for responding. Based on my reading of your response, are these assumptions valid -
complete
is called in the lambda function, it has to be less than or equal to 3 seconds.app.function
callback function doesn't have an ack
passed to it, so even if the lambda responds with a post_message within 3 seconds, the complete has to occur within 3 seconds. Is that correct?Do you know how the handler lambda pass the slack client to the worker lambda? I am concerned about going down that route of breaking up the lambda into a handler lambda and a worker lambda, and still encountering the token_revoked
error because the same token would be passed from the handler lambda to the worker lambda.
The lambda function first invoked by Slack (via API Gateway) has to end its execution in 3 seconds.
This is correct, but it's important to understand why: because the HTTP request spawned by Slack is tied to the lambda invocation life. When your lambda finishes, you will be sending a response back to the Slack HTTP Request. Slack requires you to send your HTTP response within 3 seconds.
If the time from when the request started until the first
complete
is called in the lambda function, it has to be less than or equal to 3 seconds.
This is incorrect. complete
, which is the same as the functions.completeSuccess
HTTP API (complete
within Bolt is nothing more than a wrapper for the functions.completeSuccess
HTTP API), has no requirement on when it must be called. complete
/ functions.completeSuccess
can be called days, weeks, in theory months after the function executes. The use case we use internally as a thought experiment to help understand the workflow system is: consider a custom function that requires some kind of manager approval, that DMs your manager with approve/deny buttons in a message. Clicking either of the buttons completes the function. If the manager is off on PTO, it could be weeks before they click the button and complete the function.
The app.function callback function doesn't have an ack passed to it, so even if the lambda responds with a post_message within 3 seconds, the complete has to occur within 3 seconds. Is that correct?
Incorrect, due to the same difference I mentioned in the previous point. The event payload delivered from Slack needs to be responded to within 3 seconds. That response to the Slack HTTP request is different / a completely separate consideration from calling the functions.completeSuccess
API signaling that your custom function has completed and can now provide outputs to the next step in the housing workflow.
I created a sample custom function to demonstrate the issue we are seeing - https://github.com/mihirkothari25/bolt-js-getting-started-app. It's adapted from the sample starter app. I run ngrok and serverless-offline to run the app locally and point the event subscription URL in the slack app to the ngrok URL. I see that even after responding immediately from the function if the next interaction with Slack takes more than 3 seconds, the token_revoked error is thrown. If an API call takes longer than 3 seconds after the function has responded the first time, there will be a
token_revoked
error on the next complete or whatever slack function is being called.In the real world scenario -
/slack/events
endpoint configured in API Gateway.complete
, and it results in atoken_revoked
error.To mitigate this issue, we tried to immediately respond after the lambda function starts execution. As soon as the lambda function starts executing, it would post an ephemeral message back to the user however, when the AWS Lambda cold starts, that takes too long, and the same error is encountered. If the AWS Lambda is warm, the error is less frequent.
My sample app impersonates the API call by just doing a sleep timeout. It also impersonates the AWS infrastructure by using serverless-offline.
@slack/bolt
version3.22.0
Your
App
and Receiver ConfigurationLink to Application code - https://github.com/mihirkothari25/bolt-js-getting-started-app/blob/main/app.js
Using the
AwsLambdaReceiver
Node.js runtime version
v21.7.3
Steps to reproduce:
https://github.com/mihirkothari25/bolt-js-getting-started-app/tree/main - forked and adapted starter app. Run
npm install
to install dependencies.ngrok http 3000
npm run serverless
Expected result:
Slack should respond initially that it is executing the request and then respond with the result once it is done.
Actual result:
If the sleep is longer than 3 seconds, the function ends with the token_revoked error. This is despite responding immediately from the function and then sleeping for longer than 3 seconds.
Is the expectation that the function should keep pushing heartbeats to slack while it is finishing the execution of underlying API calls?
These are the errors -