integrations / slack

Bring your code to the conversations you care about with GitHub's integration for Slack
https://slack.github.com/
MIT License
3.09k stars 486 forks source link

Log whenever our response to Slack takes longer than 3s #618

Open wilhelmklopp opened 6 years ago

wilhelmklopp commented 6 years ago

We're getting a number of reports of cases where slackbot posts a "Darn - that didn't work. Feel free to give it another go." message, which indicates that we're not responding to Slack within the 3000ms timeout. This likely causes parts of the Unfurl flow to not be visible such as the "Enable for this channel" step.

We should log whenever this happens so that we can get an understanding of the impact

I have a feeling that this is due the Probot 1 req/dyno/second rate limiter, so we should look at either tweaking that or increasing the number of dynos we run on.

bkeepers commented 6 years ago

1 req/dyno/second rate limiter, so we should look at either tweaking that or increasing the number of dynos we run on.

Just to clarify, Probot's throttling is 1 req/sec/token/dyno for requests to GitHub.com. Increasing dynos won't likely have any effect on this since the throttling happens within a single request. So if that request is making >3 requests to GitHub.com, it will still take longer than 3 seconds.

What's the code path that is hitting this issue?

wilhelmklopp commented 6 years ago

@bkeepers ah that's good context! I didn't realise it was per request and per token 👌

Here is the code path (the deliverUnfurl function specifically): https://github.com/integrations/slack/blob/7a9640390badedaa0520b09bb1d55d297d67cbba/lib/unfurls/show-rich-preview.js#L51

I count 1 request to the GitHub API, and 2 requests to the Slack API before the res.send.

One thing we could do is parallelise the call to storedUnfurl.deliver() and the following line https://github.com/integrations/slack/blob/7a9640390badedaa0520b09bb1d55d297d67cbba/lib/unfurls/show-rich-preview.js#L55

wilhelmklopp commented 6 years ago

Another weird part of this: I've never hit these timeouts when running locally (where latency is much much worse) and not on staging either (if the dyno is not sleeping), so I wonder what's causing this in production specifically.

stale[bot] commented 6 years ago

Is this still relevant? If so, just comment with any updates and we'll leave it open. Otherwise, if there is no further activity, it will be closed.

wilhelmklopp commented 6 years ago

Yeah, I'm still hoping to get this fixed

stale[bot] commented 5 years ago

Is this still relevant? If so, just comment with any updates and we'll leave it open. Otherwise, if there is no further activity, it will be closed.