Hey @dianexjeong 👋 Thanks for sharing this! I caught up on #2185 and believe these errors are caused by slow or failing server responses instead of app-related settings. It's super interesting that similar errors are found with both direct `axios` and `client` calls.

I do also find it interesting that this is showing most often with IPs associated with AWS Tokyo 🤔 We're poking into possible connectivity issues around this for both hooks.slack.com and regular API calls, but I don't have information to share right now.
Without a reliable way to reproduce this, debugging might be difficult, but I have a few questions and hopes:

- Is this app hosted on AWS Lambda? This https://github.com/slackapi/bolt-js/issues/1643#issuecomment-1302471011 might be useful in processing responses if so!
- Is it possible to test with shadowing `fetch` calls - meaning duplicating the `client.chat.postMessage` with a direct use of `fetch`? I'm hoping to find if these errors happen for both requests or if it's just `axios` specific. (A rough sketch of this follows below.)
- Does your app make additional HTTP requests? I'm wondering if this could be caused by general connection issues that are unrelated to the Slack API.

I understand how frustrating both slow requests and inconsistent errors are, but I'm also hoping we can find the cause, or at least a fix, for this! 🙏
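For illustration, a minimal sketch of the shadowing idea, assumed to run inside an async Bolt listener where `client` is available (the channel ID and text here are placeholders):

```js
// Hypothetical sketch: send the same message via the web-api client and via
// fetch, to see whether ETIMEDOUT affects both HTTP stacks or only axios.
async function shadowedPost(client) {
  const payload = { channel: 'C0123456789', text: 'shadow test' };

  // 1) The usual web-api (axios-based) call
  await client.chat.postMessage(payload);

  // 2) The same API call made directly with Node's built-in fetch (Node 18+)
  const res = await fetch('https://slack.com/api/chat.postMessage', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json; charset=utf-8',
      Authorization: `Bearer ${process.env.SLACK_BOT_TOKEN}`,
    },
    body: JSON.stringify(payload),
  });
  console.log(await res.json());
}
```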
I'm also wondering if updating Bolt to `^3.21.1` is possible? This latest version includes `axios@1.7.3`, which can use the `fetch` adapter as an option. Related: slackapi/node-slack-sdk#1525.
Edit: I misread the `axios` release notes! `fetch` is an option for the adapter and not the default! A test replacing this in `@slack/web-api@7.3.4` seems to work in an app but fails unit tests 🤔
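For anyone following along, opting into the adapter on a standalone axios instance looks roughly like this (a sketch; `@slack/web-api` does not expose this option directly):

```js
const axios = require('axios');

// axios >= 1.7 ships an optional fetch adapter, but the default adapter on
// Node is still the http adapter - it has to be opted into explicitly:
const instance = axios.create({ adapter: 'fetch' });

// api.test simply echoes its arguments back, so it works for a quick check.
instance
  .post('https://slack.com/api/api.test', { probe: 'fetch-adapter' })
  .then((response) => console.log(response.data));
```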
Hello @zimeg!
Thank you so much for the response! 😄 I updated Bolt to ^3.21.1. Could you please confirm if there are any additional steps required to trigger the use of the fetch adapter after this update, or is simply updating Bolt sufficient to make this change?
Also regarding the questions:
> Is this app hosted on AWS Lambda? This https://github.com/slackapi/bolt-js/issues/1643#issuecomment-1302471011 might be useful in processing responses if so!
No, I believe the app is not hosted on AWS Lambda.
> Is it possible to test with shadowing `fetch` calls - meaning duplicating the `client.chat.postMessage` with a direct use of `fetch`? I'm hoping to find if these errors happen for both requests or if it's just `axios` specific.
Sure, I'll test it out and bring back the results!
> Does your app make additional HTTP requests? I'm wondering if this could be caused by general connection issues that are unrelated to the Slack API.
I am calling the `usageCC` module mentioned in https://github.com/slackapi/bolt-js/issues/2185 - but it is called as the last step, after all the client calls are executed. Since the `postMessage` was hanging, the HTTP request in `usageCC` was never made, so I don't suspect that this additional HTTP request was causing the warning.
@dianexjeong For sure! Thanks for sharing more about how these requests are being processed 🙏
> Could you please confirm if there are any additional steps required to trigger the use of the fetch adapter after this update, or is simply updating Bolt sufficient to make this change?
I was a bit eager thinking this; the default didn't change, but `fetch` did become an option 😳 A few changes to `@slack/web-api` and Bolt can make this an experimental option to test, but I haven't yet looked into this. From this comment I'm not certain if `fetch` will make a difference though 🤔
We're searching for server details around timeout issues, but I'm also curious if your app is running behind a proxy or VPN? Sometimes this can upset requests or responses without being super clear 👀
No problem @zimeg! Please let me know if you need any additional information. I would be happy to provide as much as I can 😄
Yeah, I tried refactoring the HTTP POST request with `fetch` but nothing changed, and have rolled back to the original `axios` method.
> I'm also curious if your app is running behind a proxy or VPN?
Nope, we are not running behind a proxy or VPN. Requests are routed through AWS's Elastic Load Balancer (ELB) and then directed to an EC2 instance hosted in the Seoul region. The web server is Nginx, and the application is managed directly by the PM2 process manager.
@dianexjeong Some experimental changes are on the `web-api-6.x-experimental-axios-adapter` branch if you're wanting to try this - a few build instructions are in the commit! I'm uncertain how well `fetch` works in all cases, but in quick testing it seems fine.
~Somehow I was redirected to the actual chat.update page in browser with one attempt, but I was also running two connections to the app~ 🤔 ~Never seen this before!~ Edit: Realized I just clicked a link from the logs 😂
The additional details are super helpful and I can't think of immediate problems with the setup, but perhaps some logs show something strange in either inbound or outbound requests?
> The additional details are super helpful and I can't think of immediate problems with the setup, but perhaps some logs show something strange in either inbound or outbound requests?
@zimeg I'm not getting any logs other than the `[WARN]` warning. 😞
If there's a way I can log some information that could be helpful for breaking this problem down, please let me know!
```
2024-08-19 14:12:41: [WARN] bolt-app http request failed connect ETIMEDOUT 35.74.58.174:443
```
@zimeg A quick followup - got this again.
Is there a way I can get notified whenever these warnings occur? I guess they are not categorized as 'errors', and therefore they are not triggering the `errorCC` that I have created. (Right now, I am catching these errors whilst staring into the log display! 😅)
Also, is there a way I can add the full body of the error or the full warning message to the `WARN` log? I think it would be helpful if I could know which method was triggered and ran into a timeout, but I'm not seeing any useful information in the one-line log.
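One possible approach, sketched below: Bolt accepts a custom `logger` implementing the `Logger` interface from `@slack/logger`, so warn-level messages can be intercepted and forwarded to an alerting hook (the `notify` function here is hypothetical):

```js
const { App } = require('@slack/bolt');
const { ConsoleLogger } = require('@slack/logger');

const base = new ConsoleLogger();

// Hypothetical alert hook - replace with your own errorCC/notification code.
async function notify(message) {
  console.error(`ALERT: ${message}`);
}

const app = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET,
  logger: {
    debug: (...msgs) => base.debug(...msgs),
    info: (...msgs) => base.info(...msgs),
    warn: (...msgs) => {
      base.warn(...msgs);
      // Forward timeout warnings like "http request failed connect ETIMEDOUT ..."
      if (msgs.join(' ').includes('ETIMEDOUT')) notify(msgs.join(' '));
    },
    error: (...msgs) => base.error(...msgs),
    setLevel: (level) => base.setLevel(level),
    getLevel: () => base.getLevel(),
    setName: (name) => base.setName(name),
  },
});
```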
@zimeg New update!
I ran into this error with another app that I am running.
This time, it occurred during my Axios call of `chat.postMessage`.
I noticed that the `ETIMEDOUT` error is occurring in all sorts of requests that my apps are sending over to Slack's servers, including a) POSTs to webhooks, b) Axios calls of `chat.postMessage`, and c) calls of `client` methods within the app.
All of the error messages, regardless of the request type and request app, point to the following four IP addresses:
```
35.73.126.78:443
35.74.58.174:443
52.192.46.121:443
52.196.128.139:443
```
The above IPs belong to Slack, in the AWS Tokyo region - they match the result of an `nslookup` of https://hooks.slack.com.
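For reference, the same lookup can be scripted from Node with the built-in `dns` module (a sketch):

```js
const dns = require('node:dns/promises');

// Resolve hooks.slack.com and compare the addresses against the IPs
// appearing in the ETIMEDOUT warnings.
dns.resolve4('hooks.slack.com').then((addresses) => {
  console.log(addresses);
});
```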
I'm assuming that the `ETIMEDOUT` errors are originating from Slack's servers. Could you look into this?
Really appreciate your help!
FWIW - I've been getting the `ETIMEDOUT` errors when using axios against the Slack APIs lately - but only for about the past 3-4 weeks; prior to that, everything had been working fine. I see about 13 IPs being hit, with some overlap with those listed above.
Hello!
My Slack app running on AWS ECS (region: `ap-northeast-1`) has also started to encounter this error several times since the beginning of August 2024. 😢
It has been in operation for over a year, but this error had never occurred before.
Hi, checking in here. Are you all still experiencing issues @dianexjeong @bonespiked @tomohiroyoshida ? It seems to be regional, from what I can tell. Also worth double checking to see if any proxies are in play in your situations?
Hello @filmaj! Thanks for checking in.
Yes, the timeouts are still present and nope, we are not running behind any proxies.
@tomohiroyoshida mentioned their app runs in the AWS Tokyo region, and the IPs that @dianexjeong's app resolved the Slack API to also exist in the AWS Tokyo region.
@SEEWON @dianexjeong @NatsukiOka @bonespiked can you confirm which hosting provider / what region your app runs in? The reports here seem to point to low-level infrastructural issues in and around the AWS Tokyo region - want to confirm whether this is perhaps a regional problem, and knowing where your app is hosted can help confirm or disprove this theory. Thanks!
@SEEWON and I are also hosting our app in the AWS Tokyo region!
For ours, it wasn't a Slack issue - someone had dorked up all our routing tables, so 1/3 of requests were correct, 1/3 were being routed out of the EU, and 1/3 were dead. It was quite a tirade of profanity upon my discovery of this....
Sorry to hear about that @bonespiked! Good news that y'all figured it out though. It also partially confirms my hypothesis that this has to do with low-level networking issues - misconfigured routing tables at least make sense in terms of `ETIMEDOUT` and other related network errors being raised in this situation.
For the record, we have had other customer reports come in that are very similar to this, and all are in or close to the AWS Tokyo region. We continue to work with our infrastructure team to try to get more visibility into this issue. It may even be an AWS problem - hard to say at this point. Any additional information you can provide to us is helpful (@dianexjeong) such as:
It's a relief that it wasn't just my problem :)
To answer your questions,
Please let me know if you need any other information! I'd be glad to provide as much as I can.
👋 It looks like this issue has been open for 30 days with no activity. We'll mark this as stale for now, and wait 10 days for an update or for further comment before closing this issue out. If you think this issue needs to be prioritized, please comment to get the thread going again! Maintainers also review issues marked as stale on a regular basis and comment or adjust status if the issue needs to be reprioritized.
As this issue has been inactive for more than one month, we will be closing it. Thank you to all the participants! If you would like to raise a related issue, please create a new issue which includes your specific details and references this issue number.
- `@slack/bolt` version
- Your `App` and Receiver Configuration
- Node.js runtime version
Expected result:
I'm experiencing VERY slow speeds when using the overall features of my app. I would expect fast (if not immediate) responses from my features, such as a) installing the app, b) opening modals, c) submitting modals, d) sending messages... etc.
I have experienced very slow behavior of my app in one specific incident: a) I submitted a modal with select options and text inputs. b) The code that is triggered is as follows (see the sketch after this list):
c) I expect the above four (1~4) steps to occur immediately, if not within a second or two.
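The original code block isn't reproduced here; as a rough, hypothetical stand-in (names, IDs, and steps are illustrative, not the author's actual code), a handler of that shape might look like:

```js
// Hypothetical stand-in - `db`, the view ID, and the channel are placeholders.
app.view('task_submit', async ({ ack, view, client }) => {
  await ack();

  // 1) Insert the submitted values into the database (observed to complete immediately)
  await db.insert('submissions', view.state.values);

  // 2) Post a confirmation message - the step observed to hang for over a minute
  await client.chat.postMessage({ channel: 'C0123456789', text: 'Saved!' });

  // 3)-4) Later steps only run after postMessage resolves, so a hanging
  // request here delays everything downstream.
});
```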
Actual result:
In the above incident, I observed that the first step 1) of inserting into the DB was completed immediately after the modal submission, but the remaining steps 2)~4) were not happening. It took over a minute for the remaining steps to complete, and it seemed like the app was hanging at the second step, the `postMessage` method. After a minute, the remaining steps were executed almost simultaneously. The log showed a `[WARN] bolt-app http request failed connect ETIMEDOUT 52.192.46.121:443` warning, which seems to be related to the incident. Here is the full log: the timestamp on the left is the time when each line occurred, and I am in the Asia/Seoul time zone. There were no other error messages alongside the above log.
Our team has recently raised an issue regarding the axios `ETIMEDOUT` errors (#2185). This has partially been resolved by using the native `fetch` method instead of `axios`, as it seemed to be an error occurring from axios itself. However, the warning message that I saw seems to be related to the Bolt framework, not axios, since it doesn't have an `axios` tag in the warning.

As I commented in the issue, I am experiencing VERY slow functioning of my app from time to time, even for modal opens and app installations - and I am getting `expired_trigger_id` errors as well. Unfortunately, it doesn't seem to appear in a fixed pattern, and I am not able to reproduce the error (issue) on demand.

Is this also related to how Bolt is processing the events with axios? It would be a huge help if you could help us with this issue! This has been bugging our team for around a month already. :(
Thank you in advance!