slack-ruby / slack-ruby-bot

The easiest way to write a Slack bot in Ruby.
MIT License
1.12k stars 187 forks source link

Bot/Celluloid only connects 50% of the time #217

Open boncey opened 5 years ago

boncey commented 5 years ago

My bot doesn't seem to connect to Slack around 50% of the time when I start it up. I get these lines every time...

request: POST https://slack.com/api/rtm.start
response: Status 200

But I only get this...

Successfully connected team XXXXXXX (XXXXXXX) to https://XXXXXXX.slack.com.

around half the time.

Here's some logging output.

Success...

I, [2019-03-27T19:05:32.850641 #22933]  INFO -- request: POST https://slack.com/api/rtm.start
D, [2019-03-27T19:05:32.850858 #22933] DEBUG -- request: Accept: "application/json; charset=utf-8"
User-Agent: "Slack Ruby Client/0.14.1"
Content-Type: "application/x-www-form-urlencoded"
I, [2019-03-27T19:05:33.080719 #22933]  INFO -- response: Status 200
D, [2019-03-27T19:05:33.081056 #22933] DEBUG -- response: content-type: "application/json; charset=utf-8"
content-length: "7379"
connection: "close"
date: "Wed, 27 Mar 2019 19:05:32 GMT"
server: "Apache"
expires: "Mon, 26 Jul 1997 05:00:00 GMT"
x-accepted-oauth-scopes: "rtm:stream,client"
x-content-type-options: "nosniff"
cache-control: "private, no-cache, no-store, must-revalidate"
x-oauth-scopes: "identify,read,post,client,apps"
access-control-expose-headers: "x-slack-req-id"
x-xss-protection: "0"
x-slack-req-id: "a7e8bef3-ed62-49c1-8495-d0abf386d5fd"
vary: "Accept-Encoding"
pragma: "no-cache"
access-control-allow-headers: "slack-route, x-slack-version-ts"
strict-transport-security: "max-age=31536000; includeSubDomains; preload"
referrer-policy: "no-referrer"
access-control-allow-origin: "*"
x-via: "haproxy-www-3o80"
x-cache: "Miss from cloudfront"
via: "1.1 08efbf25f39cc444a2f013fdf0a017cc.cloudfront.net (CloudFront)"
x-amz-cf-id: "2Vqtz-Dv9k6iK4fEUfXHxf-JuANv_Tgc6fzNGU5NFGxY3NXwQK-6SA=="
D, [2019-03-27T19:05:33.131482 #22933] DEBUG -- Slack::RealTime::Concurrency::Celluloid::Socket#connect!: WebSocket::Driver::Client
D, [2019-03-27T19:05:33.408505 #22933] DEBUG -- Slack::RealTime::Concurrency::Celluloid::Socket#write: GET /websocket/dAs6IhfWzsZRBuzI1Ech53ZYD6h8ZnvBeked_2dlhftUFtgqWr4aJqIEp-QT-72cX_tpArB05_bnCrhac5dl4diAhQ_-1K1bkcqgyhDvZUI= HTTP/1.1
Host: cerberus-xxxx.lb.slack-msgs.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: DyEfhIrF5WC6KgyfLcWk4Q==
Sec-WebSocket-Version: 13

D, [2019-03-27T19:05:33.493841 #22933] DEBUG -- Slack::RealTime::Concurrency::Celluloid::Socket#handle_read: HTTP/1.1 101 Switching Protocols
upgrade: websocket
connection: upgrade
sec-websocket-accept: 3ICQB3ks/+XMUGkBdMNtzNzi6EI=
X-Via: haproxy-lb-mp-lbcert-iad-q9r9

D, [2019-03-27T19:05:33.500763 #22933] DEBUG -- SlackRubyBot::Client#run_loop: WebSocket::Driver::OpenEvent
D, [2019-03-27T19:05:33.501570 #22933] DEBUG -- Slack::RealTime::Concurrency::Celluloid::Socket#handle_read: {"type": "hello"}
D, [2019-03-27T19:05:33.503971 #22933] DEBUG -- SlackRubyBot::Client#run_loop: WebSocket::Driver::MessageEvent, {"type": "hello"}
D, [2019-03-27T19:05:33.504300 #22933] DEBUG -- SlackRubyBot::Client#dispatch: type=hello
I, [2019-03-27T19:05:33.504531 #22933]  INFO -- : Successfully connected team XXXXXXX (XXXXXXX) to https://XXXXXXX.slack.com.

Failure...

I, [2019-03-27T19:05:52.011383 #23127]  INFO -- request: POST https://slack.com/api/rtm.start
D, [2019-03-27T19:05:52.011610 #23127] DEBUG -- request: Accept: "application/json; charset=utf-8"
User-Agent: "Slack Ruby Client/0.14.1"
Content-Type: "application/x-www-form-urlencoded"
I, [2019-03-27T19:05:53.000160 #23127]  INFO -- response: Status 200
D, [2019-03-27T19:05:53.000538 #23127] DEBUG -- response: content-type: "application/json; charset=utf-8"
content-length: "7379"
connection: "close"
date: "Wed, 27 Mar 2019 19:05:52 GMT"
server: "Apache"
expires: "Mon, 26 Jul 1997 05:00:00 GMT"
x-accepted-oauth-scopes: "rtm:stream,client"
x-content-type-options: "nosniff"
cache-control: "private, no-cache, no-store, must-revalidate"
x-oauth-scopes: "identify,read,post,client,apps"
access-control-expose-headers: "x-slack-req-id"
x-xss-protection: "0"
x-slack-req-id: "d7ece95e-c146-4c1f-bba5-eced67b53cdb"
vary: "Accept-Encoding"
pragma: "no-cache"
access-control-allow-headers: "slack-route, x-slack-version-ts"
strict-transport-security: "max-age=31536000; includeSubDomains; preload"
referrer-policy: "no-referrer"
access-control-allow-origin: "*"
x-via: "haproxy-www-26um"
x-cache: "Miss from cloudfront"
via: "1.1 eaeb02896a2d60cfeb39bf4d1e8e5932.cloudfront.net (CloudFront)"
x-amz-cf-id: "LGE0n_COojQdUerY5W5tdFSPoLogxFI0j_2YKulGKPqj1dJa_TA4Tg=="
D, [2019-03-27T19:05:53.056372 #23127] DEBUG -- Slack::RealTime::Concurrency::Celluloid::Socket#connect!: WebSocket::Driver::Client

My config.ru start up code looks like this...

Thread.abort_on_exception = false
Thread.report_on_exception = false
SlackRubyBot::Client.logger.level = Logger::DEBUG

Thread.new do
  SlackXXXXXbot::Bot.run
rescue => e
  STDERR.puts "ERROR: #{e}"
  STDERR.puts e.backtrace
  raise e
end

run SlackXXXXXbot::Web

I'm not going via a proxy and as far as I can tell I have no network issues (the box lives in a data centre with excellent connectivity). I'm running the latest version.

My question is, is there a way to detect when this happens and try again? Also, is there something I can add to my app's healthcheck that tests my connectivity so I can try to reconnect?

dblock commented 5 years ago

This seems strange, and certainly not observed elsewhere.

Could you try switching to async-websocket from Celluloid? Would like to learn whether this is an asynchrony problem or a networking one.

boncey commented 5 years ago

I did that (I just changed the entry in the Gemfile) and it seems to connect every time now. I just did it 7 times in a row and it connected every time.

dblock commented 5 years ago

Ok, which means we have a real bug with Celluloid in this case. I'll leave this open, if you have time would love it if you could spend some more time debugging/adding log code/try/rescue, maybe we're not seeing an exception after the websocket connection is opened or something of that kind...

dblock commented 5 years ago

And if you are generous with your time, I would confirm whether this issue exists with EventMachine and with Celluloid on older versions of slack-ruby-client (especially in 0.13.x, 0.12.x).