Getting lots of 429 rate limit errors on https://slack.com/api/rtm.start calls

LouisKottmann commented 4 years ago

Hello,

I have been using this project for a while, and switched a couple months ago to async-websocket as recommended (for better reliability than Celluloid, which we had issues with indeed).

Recently, I have been getting more and more of these in the logs while the bot is idle:

...
POST https://slack.com/api/rtm.start
Status 429
Status 429
Status 429
Status 429
Status 429
Status 429
Status 429
Status 429
Status 429
is offline
POST https://slack.com/api/rtm.start
Status 429
Status 429
is offline
POST https://slack.com/api/rtm.start
is offline
...

My Gemfile has:

source "https://rubygems.org"

gem 'slack-ruby-bot'
gem 'puma'
gem 'async-websocket', '~> 0.8.0'

My bot is defined like this:

module LouisBot
  class Bot < SlackRubyBot::Bot
    help do
      title 'title'
      desc 'desc'
    end
  end
end

And I start the bot like this:

require 'yaml'
require 'json'
require 'securerandom'
require 'open3'
require 'async'
require 'slack-ruby-bot'

LouisBot::Bot.run

What could cause the rate limiting errors? Am I the only one with this issue?

I use this code to create a client for background tasks:

client = SlackRubyBot::Client.new(token: ENV['SLACK_API_TOKEN'])
client.start_async

Maybe this is problematic?

dblock commented 4 years ago

What's the stack for that status? I would get a backtrace first for when this happens.

Compare with https://github.com/slack-ruby/slack-ruby-bot-server/blob/0b8bbbba45a2b07f35b223b4c18ec86888fc0f49/lib/slack-ruby-bot-server/service.rb, you might be accidentally looping very quickly when you get disconnected and need to reconnect, or something like that.

LouisKottmann commented 4 years ago

How do you suggest I get a backtrace?

I have fixed my code to call stop! after I'm done with a new client, maybe that will solve the issue. Maybe the clients never got GC'd and I'd increasingly had more and more clients connecting.

dblock commented 4 years ago

How do you suggest I get a backtrace?

Add a begin/rescue around start_async to begin with, but otherwise find where that message is emitted from and walk backwards from there to see what calls it, etc.

I have fixed my code to call stop! after I'm done with a new client, maybe that will solve the issue. Maybe the clients never got GC'd and I'd increasingly had more and more clients connecting.

That's possible.

LouisKottmann commented 4 years ago

I will do that if I see the error again, and close this ticket otherwise. Thank you for your time.

LouisKottmann commented 4 years ago

The error did not appear again, so it was me calling an extra:

client = SlackRubyBot::Client.new(token: ENV['SLACK_API_TOKEN'])
client.start_async

for background jobs, and letting it getting GC'd witout further action. This would make clients pile up and eventually getting rate limited when they all tried reconnecting.

While in fact I should have called this when I was done with temporary clients:

client.stop!

I now use an ensure block like so:

def self.with_client_and_data_or_new(client, data)
  if client.nil? || data.nil?
    begin
      client  = self.get_new_client
      channel = self.get_channel_by_name(client, 'any_chan_name')
      data    = self.wrap_in_data_object(channel: channel.id,
                                         user: MY_BOT_USERID)
      yield(client, data)
    ensure
      client.stop!
      client = nil
    end
  else
    yield(client, data)
  end
end

private
def self.get_channel_by_name(client, channel_name)
  client
    &.channels
    &.select { |k, chan| chan['name'] == channel_name }
    &.map { |id, info| info }
    &.first
end

def self.wrap_in_data_object(channel: nil, user: nil)
  Hashie::Mash.new({ channel: channel, user: user})
end

def self.get_new_client
  client = SlackRubyBot::Client.new(token: ENV['SLACK_API_TOKEN'])
  client.start_async

  10.times do
    break if client.started?
    sleep 1
  end

  sleep 1

  client
end

that I use like this:

with_client_and_data_or_new(client, data) do |c, d|
  # do something with c(lient) & d(ata)
end

Is that a good approach or did I miss something obvious?

dblock commented 4 years ago

Does your background job need an RTM client that's listening on Slack events? If you're trying to call some methods like listing channels, instantiate a Slack::Web::Client.new(token: ...) and call whatever you need. There will be nothing to shutdown.

LouisKottmann commented 4 years ago

Right, that looks like the obvious way to do it. Thank you.

slack-ruby / slack-ruby-bot

Getting lots of 429 rate limit errors on https://slack.com/api/rtm.start calls #242