jwilm / chatbot

Chatbot framework in Rust
http://blog.jwilm.io/chatbot/chatbot/
MIT License
92 stars 17 forks source link

Crash in dependency #17

Open jwilm opened 9 years ago

jwilm commented 9 years ago
Received[84]: {"type":"presence_change","user":"U03FE5CRG","presence":"away"}
Received[85]: {"type":"presence_change","user":"U03FE5CRG","presence":"active"}
Received[86]: {"type":"presence_change","user":"U03A1010S","presence":"away"}
Received[87]: {"type":"presence_change","user":"U044FK97B","presence":"away"}
Received[88]: {"type":"presence_change","user":"U02AXSV3L","presence":"away"}
thread 'Chatbot Slack Receiver' panicked at 'unexpected error ErrorSyscall', /Users/jwilm/.cargo/registry/src/github.com-1ecc6299db9ec823/openssl-0.6.2/src/ssl/mod.rs:849
chatbot shutting down
emk commented 8 years ago

Got a repro here:

thread 'Chatbot Slack Receiver' panicked at 'unexpected error ErrorSyscall with ret -1', /root/.cargo/registry/src/github.com-0a35038f75765ae4/openssl-0.6.7/src/ssl/mod.rs:983
error receiving outgoing messages: receiving on a closed channel

This makes my Slack chatbot turn into a "zombie" every couple of hours, where the process is still running but it no longer receives messages from Slack. If it died cleanly, I could have an external process restart it.

Will try to take a look later.

jwilm commented 8 years ago

@emk Thanks for sharing that you've ran into the same thing. This may be as simple as reconnecting in the slack library's on_close handler which is currently a noop for us.

emk commented 8 years ago

Thank you for the great library and the quick response!

I think there's actually two separate problems here:

  1. Something's going wrong in the Slack interface code, causing a panic.
  2. When the Slack thread panics, nobody notices, causing the chatbot to turn into a zombie.

Let me take a look at (1) first.

jwilm commented 8 years ago

@emk The panic was from the openssl library which is not directly included by the slack library. It's depended on by the websockets library which slack uses. The panic issue could be caused by any of those libraries.

There was a second issue which I alluded to but now realize isn't mentioned here. A slack bot will sometimes disconnect from the server without any mention of a panic.

Thanks for looking into this stuff! I oh-so-very-much appreciate any time you can donate :smile: :star2:. I'm running a test for the latter issue I mentioned to see if on_close does give us an opportunity to reconnect in that case.

emk commented 8 years ago

I noticed we're running against an old version of the Slack adapter, so it may be worth updating first:

--- a/Cargo.toml
+++ b/Cargo.toml
@@ -18,7 +18,7 @@ path = "src/main.rs"
 regex = "~0.1"
 hyper = "~0.5"
 rustc-serialize = "~0.3"
-slack = "~0.6"
+slack = "~0.11.0"
 getopts = "~0.2"
 irc = "~0.8"
 startuppong = "~0.1"

There's always a chance this has been fixed upstream. :-)

jwilm commented 8 years ago

That sounds like the right place to start. I'm heading out for a few hours, but I will check in on this later.

emk commented 8 years ago

I tried quickly upgrading to 0.11 but I got stuck with various internal Slack errors. It's probably just a dumb mistake.

I'm going to put (1) aside for the moment, though there's a decent chance you'll be able to solve it quickly. :-)

I'm going to stare at (2) for a bit and see if there's any way to keep track of chatbot's threads, and fail completely if something goes wrong on one of the worker threads. That way, at least chatbot won't get stuck in zombie state with one or more dead threads.

emk commented 8 years ago

OK, I have a fix for issue (2) in #24. I probably won't have time to track down issue (1) for a couple of days, but at least this will prevent chatbot from winding up in a "half-crashed" state.

Thank you very much for looking into these issues!

jwilm commented 8 years ago

Just to summarize the status here:

Per your comment on thread management: I'm not sure what sort of operations are allowed while panicking, but it may be possible to have our own thread guard that posts on a channel during drop. The main thread could be the receiver and take appropriate action on receipt of the panic message. Could that work?

emk commented 8 years ago

I spoke to @eddyb on #rust yesterday, and he thought it might be reasonable to send messages on a channel from a stack guard. So yes, I think it might be possible to explore that design.

jwilm commented 8 years ago

@emk looks like thread::spawn returns a thread::JoinHandle. Calling join on the handle returns a Result which is an Err if the thread had panicked. The stack guard would only need to notify that the thread ended if we decide to use that.

thread::JoinHandle docs