Open jwilm opened 9 years ago
Got a repro here:
thread 'Chatbot Slack Receiver' panicked at 'unexpected error ErrorSyscall with ret -1', /root/.cargo/registry/src/github.com-0a35038f75765ae4/openssl-0.6.7/src/ssl/mod.rs:983
error receiving outgoing messages: receiving on a closed channel
This makes my Slack chatbot turn into a "zombie" every couple of hours, where the process is still running but it no longer receives messages from Slack. If it died cleanly, I could have an external process restart it.
Will try to take a look later.
@emk Thanks for sharing that you've run into the same thing. This may be as simple as reconnecting in the slack library's on_close handler, which is currently a no-op for us.
Thank you for the great library and the quick response!
I think there are actually two separate problems here:
Let me take a look at (1) first.
@emk The panic was from the openssl library which is not directly included by the slack library. It's depended on by the websockets library which slack uses. The panic issue could be caused by any of those libraries.
There was a second issue which I alluded to but now realize isn't mentioned here. A slack bot will sometimes disconnect from the server without any mention of a panic.
Thanks for looking into this stuff! I oh-so-very-much appreciate any time you can donate :smile: :star2:. I'm running a test for the latter issue I mentioned to see if on_close does give us an opportunity to reconnect in that case.
I noticed we're running against an old version of the Slack adapter, so it may be worth updating first:
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -18,7 +18,7 @@ path = "src/main.rs"
regex = "~0.1"
hyper = "~0.5"
rustc-serialize = "~0.3"
-slack = "~0.6"
+slack = "~0.11.0"
getopts = "~0.2"
irc = "~0.8"
startuppong = "~0.1"
There's always a chance this has been fixed upstream. :-)
That sounds like the right place to start. I'm heading out for a few hours, but I will check in on this later.
I tried quickly upgrading to 0.11 but I got stuck with various internal Slack errors. It's probably just a dumb mistake.
I'm going to put (1) aside for the moment, though there's a decent chance you'll be able to solve it quickly. :-)
I'm going to stare at (2) for a bit and see if there's any way to keep track of chatbot's threads, and fail completely if something goes wrong on one of the worker threads. That way, at least chatbot won't get stuck in zombie state with one or more dead threads.
OK, I have a fix for issue (2) in #24. I probably won't have time to track down issue (1) for a couple of days, but at least this will prevent chatbot from winding up in a "half-crashed" state.
Thank you very much for looking into these issues!
Just to summarize the status here:
Per your comment on thread management: I'm not sure what sort of operations are allowed while panicking, but it may be possible to have our own thread guard that posts on a channel during drop. The main thread could be the receiver and take appropriate action on receipt of the panic message. Could that work?
I spoke to @eddyb on #rust yesterday, and he thought it might be reasonable to send messages on a channel from a stack guard. So yes, I think it might be possible to explore that design.
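For illustration, here is a minimal sketch of the guard idea using only the standard library. The names ExitReport, ExitGuard, and spawn_guarded are hypothetical and not part of chatbot or the slack crate. The sketch assumes it is enough for the main thread to exit the whole process as soon as any worker reports a panic.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Message a worker sends to the main thread when it ends (hypothetical type).
#[derive(Debug, PartialEq)]
struct ExitReport {
    name: &'static str,
    panicked: bool,
}

/// Hypothetical guard: lives on the worker's stack and reports when dropped.
struct ExitGuard {
    name: &'static str,
    tx: Sender<ExitReport>,
}

impl Drop for ExitGuard {
    fn drop(&mut self) {
        // `thread::panicking()` is true while this thread unwinds from a panic.
        // The send error is ignored: the receiver may already be gone.
        let _ = self.tx.send(ExitReport {
            name: self.name,
            panicked: thread::panicking(),
        });
    }
}

/// Spawn `work` on a new thread with a guard that reports its exit on `tx`.
fn spawn_guarded<F>(name: &'static str, tx: Sender<ExitReport>, work: F)
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(move || {
        let _guard = ExitGuard { name, tx };
        work();
    });
}

fn main() {
    let (tx, rx) = channel();
    spawn_guarded("slack-receiver", tx.clone(), || panic!("simulated failure"));
    drop(tx);

    // The main thread blocks until some worker reports that it has ended.
    if let Ok(report) = rx.recv() {
        eprintln!("worker {} ended (panicked: {})", report.name, report.panicked);
        if report.panicked {
            // Die cleanly so an external supervisor can restart the process.
            std::process::exit(1);
        }
    }
}
```

The guard also fires when a worker returns normally, so the receiver can tell a clean exit from a panic through the panicked flag.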
@emk looks like thread::spawn returns a thread::JoinHandle. Calling join on the handle returns a Result which is an Err if the thread had panicked. The stack guard would only need to notify that the thread ended if we decide to use that.
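A small sketch of the join behaviour, again using only the standard library. The helper finished_cleanly is a hypothetical name made up for this example, and exiting with status 1 is just one possible reaction to a dead worker.

```rust
use std::thread;

/// Hypothetical helper: runs `work` on a new thread and reports whether it
/// finished without panicking. `join` returns `Err` if the thread panicked.
fn finished_cleanly<F>(work: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(work).join().is_ok()
}

fn main() {
    let handle: thread::JoinHandle<()> = thread::spawn(|| {
        panic!("simulated failure in worker");
    });

    // `join` blocks until the worker ends, so this detects the panic
    // only once the thread is actually gone.
    match handle.join() {
        Ok(()) => println!("worker finished normally"),
        Err(_) => {
            eprintln!("worker panicked; exiting so a supervisor can restart us");
            std::process::exit(1);
        }
    }
}
```

Note that join blocks on one specific handle, so with several workers the main thread would still need something like the channel notification to learn which thread to join.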