Closed afshin closed 8 years ago
I do agree that panic shouldn't be used in a library, but in this rare case crashing fast helps to identify bugs. Having said that, a pull request is always welcome.
As for the bug itself, I can't seem to reproduce it. These are the scenarios I tested:
1) Two nodes join a group, then after a while one of them leaves the group.
2) Two nodes are in the cluster, but one of them leaves an arbitrary group.
Both scenarios worked fine for me. Maybe I'm missing something here. It would be very helpful if you could come up with a test or a snippet that reproduces the bug, if it can be reproduced at all.
Cheers
Hi! Thanks for the response. Here is a small sample program that crashes every time for me:
https://gist.github.com/afshin/0242be8726c37144407a713f8d941815
To test it, compile it and run two instances:
./gyre-leave -seconds 10
and
./gyre-leave
The first one will leave the group after 10 seconds (or whatever number you pick) and the second one will remain running unless something goes wrong. In my case, the second one always crashes with this error:
panic: [BBD6198AABE20BD2F3FFED3652D5DB7F] message status isn't equal to peer status, 2 != 3
goroutine 5 [running]:
panic(0x606000, 0xc8200b9810)
/usr/local/go/src/runtime/panic.go:464 +0x3e6
github.com/zeromq/gyre.(*node).recvFromPeer(0xc82009e000, 0x7f33330b9740, 0xc82004ce80)
/srv/go/src/github.com/zeromq/gyre/node.go:708 +0x9d2
github.com/zeromq/gyre.(*node).actor.func4(0x1, 0x0, 0x0)
/srv/go/src/github.com/zeromq/gyre/node.go:842 +0x214
github.com/pebbe/zmq4.(*Reactor).Run(0xc82004c440, 0x989680, 0x0, 0x0)
/srv/go/src/github.com/pebbe/zmq4/reactor.go:187 +0x870
github.com/zeromq/gyre.(*node).actor(0xc82009e000)
/srv/go/src/github.com/zeromq/gyre/node.go:847 +0x312
created by github.com/zeromq/gyre.newGyre
/srv/go/src/github.com/zeromq/gyre/gyre.go:94 +0x201
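For illustration only, here is a minimal sketch of how a status mismatch like the one in the trace above could be reported as an error value instead of a panic. The names StatusMismatchError and checkStatus are hypothetical and are not part of gyre; the real check in node.go may look quite different.

```go
package main

import "fmt"

// StatusMismatchError is a hypothetical error type (not part of gyre)
// describing a message whose status differs from the peer's status.
type StatusMismatchError struct {
	Peer       string
	MsgStatus  byte
	PeerStatus byte
}

func (e *StatusMismatchError) Error() string {
	return fmt.Sprintf("[%s] message status isn't equal to peer status, %d != %d",
		e.Peer, e.MsgStatus, e.PeerStatus)
}

// checkStatus is a hypothetical helper: it returns an error on mismatch
// so the caller can decide what to do, rather than crashing the process.
func checkStatus(peer string, msgStatus, peerStatus byte) error {
	if msgStatus != peerStatus {
		return &StatusMismatchError{Peer: peer, MsgStatus: msgStatus, PeerStatus: peerStatus}
	}
	return nil
}

func main() {
	// Values taken from the panic message above: 2 != 3.
	if err := checkStatus("BBD6198AABE20BD2F3FFED3652D5DB7F", 2, 3); err != nil {
		fmt.Println("dropping message:", err)
	}
}
```

With an error value, the receiving side can log the mismatch, drop the message, or surface it to the application, and the fail-fast behaviour remains available to callers who want it.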
Hey @afshin, thanks for the bug report and the test case. The issue has been fixed. Could you please double-check that the bug is gone and, if so, close the issue?
Thanks.
@armen thanks very much! I can confirm that this fixes the issue. Thank you for resolving it 👍
Please don't expose any panic statements even if they are used internally. They can be captured with recover inside the library, but they are difficult (in this case, I actually can't seem to recover at all) to capture in clients of the library.
I have two applications running on the same network. When one calls Leave(), it causes the other app to panic. This may be a bug, but even if it isn't, the panic cannot be recovered:
https://github.com/zeromq/gyre/blob/master/node.go#L702
https://github.com/zeromq/gyre/blob/master/node.go#L708
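Some background on why a client cannot recover here: in Go, recover only stops a panic when it is called from a deferred function in the same goroutine that panicked. A panic raised in a goroutine started by a library therefore cannot be caught by a defer in the client's own goroutine. The sketch below shows one way a library could recover inside its own goroutine and hand the failure back as an error. The name startActor is hypothetical and this is not how gyre is implemented.

```go
package main

import "fmt"

// startActor is a hypothetical stand-in for a library that runs its work
// in its own goroutine. A panic raised by work is recovered inside that
// goroutine and delivered to the caller as an error on the returned channel.
func startActor(work func()) <-chan error {
	errs := make(chan error, 1)
	go func() {
		// Deferred calls run last-in, first-out: the recover handler
		// runs first, then the channel is closed.
		defer close(errs)
		defer func() {
			if r := recover(); r != nil {
				errs <- fmt.Errorf("actor panicked: %v", r)
			}
		}()
		work()
	}()
	return errs
}

func main() {
	errs := startActor(func() {
		panic("message status isn't equal to peer status, 2 != 3")
	})
	// A nil value here would mean the actor finished without panicking.
	if err := <-errs; err != nil {
		fmt.Println("recovered:", err)
	}
}
```

The client only ever sees an error value, so it can choose to log it, retry, or shut down cleanly.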