Closed milestruecar closed 1 year ago
while fiddling with this to try to find some short term solution, i set a higher websocket_ping time but noticed that even when we aren't getting rate-limited, socket.time_since_last_message
just keeps going up. sample output from local branch with some extra logging thrown in
[2023-06-26T16:05:30.778-07:00] INFO: POST https://slack.com/api/conversations.list
[2023-06-26T16:05:30.778-07:00] INFO: Accept: "application/json; charset=utf-8"
User-Agent: "Slack Ruby Client/2.1.1"
Authorization: [token]
Content-Type: "application/x-www-form-urlencoded"
[2023-06-26T16:05:30.864-07:00] INFO: @socket connected? true
[2023-06-26T16:05:30.864-07:00] INFO: time since last message: 749.5803269999997
[2023-06-26T16:05:30.864-07:00] INFO: websocket ping: 300
[2023-06-26T16:05:30.864-07:00] WARN: is offline
[2023-06-26T16:05:30.865-07:00] INFO: POST https://slack.com/api/rtm.connect
[2023-06-26T16:05:30.865-07:00] INFO: Accept: "application/json; charset=utf-8"
User-Agent: "Slack Ruby Client/2.1.1"
Authorization: [token]
[2023-06-26T16:05:31.117-07:00] INFO: Status 200
This is quite interesting, thank you for digging deep into this.
The RTM API tracks messages from slack, so socket.time_since_last_message
will just keep going up until you get something meaningful from Slack on the RTM connection. Is building the cache blocking those messages? I think that's the real bug and a proper fix would be to ensure that the local cache is built asynchronously and isn't blocking messages. As an experiment throw a very long sleep instead of this code that makes requests and see the same behavior?
I think https://github.com/slack-ruby/slack-ruby-client/issues/237 is related.
yeah same behavior with long sleep, cache build is blocking
opened a PR, seemed to work out well locally. unblocks the socket.time_since_last_message at least. Not sure if there are any potential side effects but seemed pretty straightforward.
@milestruecar Nice. Let's finish the work in the PR.
https://github.com/slack-ruby/slack-ruby-client/blob/bf5bb4be3e03e9bd74d22942a61d5611f5c1d997/lib/slack/real_time/stores/store.rb#L328-L378
on initial connect, while cycling through users and conversations in this
hello
method, we're getting rate-limited and receiving 429 responses from slack. it looks like there's retry logic somewhere that's working, but somewhere along the way, before the cache actually completes (but never in a consistent spot), the connection just kinda dies, followed by another rtm.connect and another attempt to build the cache, which leads to more rate limiting, then another disconnect/reconnect, and so on forever.after messing with the code locally, in this
keep_alive?
method https://github.com/slack-ruby/slack-ruby-client/blob/bf5bb4be3e03e9bd74d22942a61d5611f5c1d997/lib/slack/real_time/client.rb#L100-L116 while we're getting 429ed, the@socket.time_since_last_message
eventually grows beyond(websocket_ping * 2)
and knocks everything over. After adding some sleep calls to the cache build to try to mitigate the 429s and then disabling that keep_alive? entirely, the cache finished building and the app finally went into a steady state.i had some ideas for possible fixes (longer timeout threshold if the last response was a 429 which i'm not entirely sure how to interrogate inside that
keep_alive?
method, just disablingkeep_alive?
until after the cache finishes which seems... not ideal, or adding sleeps to the user and conversation loops inhello
method which would probably be silly for people with smaller slack footprints) but none of them seem great, and i'm not exactly a socket expert. so i wanted some guidance on what would make sense to y'all before i put together a PR