Open tikh opened 8 years ago
This is consistent with the behavior observed with faye-websocket. We try to open a lot of connections, and the client can't process them fast enough, causing these close
events, a reconnect and finally a success. With many connections it happens a lot more often, and can throw a service with 300 teams in a loop. It's more frequent if you make the initial connection under EM.defer
, but that hits https://github.com/faye/websocket-driver-ruby/issues/38.
You definitely want to handle the close event and try to reconnect, like https://github.com/dblock/slack-ruby-bot-server/blob/8ba27e827cfcc75bf6d015e5117c12c72fa33096/lib/slack-ruby-bot-server/server.rb#L15.
I had switched to using celluloid-io for all of my projects, which doesn't do the rtm.start
connection in parallel making it a little slower to start (see #87), but works very consistently.
The start up issue was resolved after switching to celluloid-io
gem. However, I wanted to test a reconnection logic and found out that close
event never happens (at least in a few first minutes). I tried two things - turning off wi-fi on my laptop and deleting an integration in Slack. I expected to have close
event earlier as WebSockets ping interval is 30 seconds by default. Also, I tried to do active periodic pinging and got this exception:
[2016-06-06 12:03:13.276][PID=21374] ERROR Actor crashed!
Errno::ETIMEDOUT: Operation timed out
/Users/tikh/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/openssl/buffering.rb:178:in `sysread_nonblock'
/Users/tikh/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/openssl/buffering.rb:178:in `read_nonblock'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:42:in `block in sysread'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:390:in `synchronize'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:40:in `sysread'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:129:in `readpartial'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:42:in `read'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:36:in `block in run_loop'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:36:in `loop'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:36:in `run_loop'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/slack-ruby-client-0.7.2/lib/slack/real_time/concurrency/celluloid.rb:29:in `connect!'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:16:in `dispatch'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
/Users/tikh/.rvm/gems/ruby-2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
Hmmm... check out https://github.com/celluloid/celluloid-io/issues/176 and https://github.com/dblock/slack-ruby-client/issues/90, does this explain any of this?
Actually if Slack integration is deleted, 'close' event happens correctly. When I was trying to simulate connection problems by turning off Wi-Fi and had periodic pings, I got exception above. Probably, Errno::ETIMEDOUT
exceptions should be added to the rescue
block where you already handle EOFError
(https://github.com/dblock/slack-ruby-client/blob/master/lib/slack/real_time/concurrency/celluloid.rb#L38). Does it sound right? Or erroneous exceptions situations are supposed to be handle in different places?
Yes, that sounds just right! Maybe it should be a whole other range of exceptions, would appreciate some research too in other network conditions. Would love a PR!
I think another one is Errno:EPIPE
and Errno::ECONNRESET
:
E, [2016-06-07T19:05:15.066577 #154] ERROR -- : Actor crashed!
Errno::EPIPE: Broken pipe
/app/vendor/ruby-2.2.4/lib/ruby/2.2.0/openssl/buffering.rb:383:in `syswrite_nonblock'
/app/vendor/ruby-2.2.4/lib/ruby/2.2.0/openssl/buffering.rb:383:in `write_nonblock'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:62:in `block in syswrite'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:390:in `synchronize'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:59:in `syswrite'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:356:in `do_write'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-io-0.17.3/lib/celluloid/io/stream.rb:250:in `write'
/app/vendor/bundle/ruby/2.2.0/gems/slack-ruby-client-0.7.3/lib/slack/real_time/concurrency/celluloid.rb:56:in `write'
/app/vendor/bundle/ruby/2.2.0/gems/websocket-driver-0.6.3/lib/websocket/driver/hybi.rb:232:in `send_frame'
/app/vendor/bundle/ruby/2.2.0/gems/websocket-driver-0.6.3/lib/websocket/driver/hybi.rb:193:in `frame'
/app/vendor/bundle/ruby/2.2.0/gems/websocket-driver-0.6.3/lib/websocket/driver/hybi.rb:156:in `close'
/app/vendor/bundle/ruby/2.2.0/gems/slack-ruby-client-0.7.3/lib/slack/real_time/concurrency/celluloid.rb:45:in `close'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/call/sync.rb:16:in `dispatch'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
/app/vendor/bundle/ruby/2.2.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
For easy repro cases, https://gist.github.com/dblock/1250a5497ff3c7af81c05964c459ff86.
Closed the wrong issue, reopening ;)
I have an issue with starting up multiple bot clients after application start up. Sometimes, "CLOSE" event is being fired for many clients and regular reconnection logic doesn't help (while it works fine in event of temporary network disconnect). "CLOSE" events have "code" value 1006. It doesn't happen every time, and it have something to do with time spent in callbacks, I suppose.
A snippet used to reproduce the issue - https://gist.github.com/tikh/50d5cf34481330759a736aa351e3ee2e. Simply adding a "sleep" to "hello" callback makes the issue happen. But even without sleep this happens sometimes.
In the example,
faye-websocket
gem was used and 300 connections to different Slack teams.