Open lultimouomo opened 6 years ago
Please note that I get the same result if, instead of terminating the program with SIGTERM, I replace the shutdown hook with this code, which makes the main loop exit when I press Enter in the console:
new Thread(() -> {
    try {
        System.in.read();
        shutDown = true;
    } catch (Exception e) {
    }
}).start();
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    try {
        shutDownLatch.await();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}));
Also note that this doesn't seem to happen with the JNI bindings.
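For readers without the full program: the snippet above relies on a `shutDown` flag and a `shutDownLatch` that are declared elsewhere in the reporter's code. The following is only a minimal sketch of how such a flag and latch are typically coordinated, using the standard library alone. The class name `ShutdownSketch`, its method names, and the `Thread.sleep` placeholder standing in for the receive call are all hypothetical; the sketch also assumes the flag is `volatile`, which the original post does not show.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ShutdownSketch {
    // Assumption: volatile, so a write from another thread is visible to the loop.
    private volatile boolean shutDown = false;
    // Released once the main loop has exited and finished its cleanup.
    private final CountDownLatch shutDownLatch = new CountDownLatch(1);

    void mainLoop() {
        try {
            while (!shutDown) {
                try {
                    Thread.sleep(10); // placeholder for a receive call with a timeout
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        } finally {
            // Socket and context cleanup would go here, before releasing the latch.
            shutDownLatch.countDown();
        }
    }

    void requestShutdown() {
        shutDown = true;
    }

    // Bounded wait: returns false if cleanup did not finish within the timeout.
    boolean awaitCleanup(long timeoutMillis) {
        try {
            return shutDownLatch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Starts the loop on a worker thread, asks it to stop, and waits for cleanup.
    static boolean runAndStop(long timeoutMillis) {
        ShutdownSketch sketch = new ShutdownSketch();
        Thread worker = new Thread(sketch::mainLoop);
        worker.setDaemon(true);
        worker.start();
        sketch.requestShutdown();
        return sketch.awaitCleanup(timeoutMillis);
    }

    public static void main(String[] args) {
        System.out.println("clean shutdown: " + runAndStop(2000));
    }
}
```

Unlike the unbounded `shutDownLatch.await()` in the snippet above, the sketch waits with a timeout. That does not fix a hanging close, but it keeps a shutdown hook from blocking the JVM indefinitely while the cause is investigated.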
Hi, I tried your code and it works fine on my machine with latest snapshot.
Which version are you using? What is your OS, java version, ... ?
jeromq version 0.4.3, Debian sid, Linux 4.14.0, openjdk version 1.8.0_181 and version 10.0.1 2018-04-17. To reproduce this reliably I need to turn the WiFi off and back on 3-4 times (though sometimes a single off/on cycle is enough).
Also happens with openjdk version 10.0.2 2018-07-17
Thanks for the details!
I still cannot reproduce it even with 0.4.3 and openjdk version "1.8.0_162". Maybe the PUB side is needed for the bug to appear. I suspect that issue will be difficult to unit test...
ØMQ sockets are supposed to perform automagic reconnection, so disconnecting and reconnecting seems redundant, unless you need it for application purposes. The connection process is also asynchronous: for most transports and socket types the connection is not performed immediately but as needed by ØMQ; a successful call to connect(String) does not mean that the connection was or could actually be established.
Last point, the latest snapshot includes heartbeating within the socket itself.
The PUB side uses libzmq version 4.1.4.
I implemented heartbeating because in my trials I experienced a hung connection (not receiving messages and not reconnecting) when connecting SUB to PUB through a VPN: if the VPN drops, the TCP connection stays alive, and when the VPN comes up again the zmq socket does not resume receiving messages. As I understand it this is the nature of TCP, and some form of heartbeating is needed to fix the problem.
Is the heartbeating included in the latest snapshot compatible with libzmq? Do you have pointers to some documentation on how to enable it?
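An application-level heartbeat check of the kind described above can be sketched as a small watchdog that records when the last heartbeat arrived and reports when the allowed gap has been exceeded. This is an illustrative sketch only, not the reporter's code: the class `HeartbeatWatchdog`, its methods, and the injectable clock are hypothetical names, and the 2000 ms timeout simply mirrors the "once a second, with a 2 second margin" figure mentioned in this thread. The receive and reconnect calls are deliberately left out.

```java
import java.util.function.LongSupplier;

final class HeartbeatWatchdog {
    private final long timeoutMillis;
    private final LongSupplier clock; // returns the current time in milliseconds
    private long lastSeenMillis;

    HeartbeatWatchdog(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastSeenMillis = clock.getAsLong();
    }

    // Call whenever a heartbeat message is received.
    void onHeartbeat() {
        lastSeenMillis = clock.getAsLong();
    }

    // Call after reconnecting, so the new connection gets a full timeout window.
    void reset() {
        lastSeenMillis = clock.getAsLong();
    }

    // True once more than timeoutMillis has passed since the last heartbeat or reset.
    boolean isExpired() {
        return clock.getAsLong() - lastSeenMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        // Monotonic clock, so wall-clock adjustments do not trigger false timeouts.
        HeartbeatWatchdog watchdog =
                new HeartbeatWatchdog(2000, () -> System.nanoTime() / 1_000_000);
        watchdog.onHeartbeat();
        System.out.println("expired right after a heartbeat: " + watchdog.isExpired());
    }
}
```

In a receive loop, `onHeartbeat()` would be called for each heartbeat message and `isExpired()` checked after each receive timeout; when it returns true, the application would reconnect and call `reset()`. Passing the clock in as a parameter makes the timeout logic testable without sleeping.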
... I remember the heartbeats were implemented in libzmq 4.2.x. If you have no access to the code of the PUB side it might be a dead end.
Anyhow, you can find some documentation about it here: http://api.zeromq.org/4-2:zmq-setsockopt, look for ZMQ_HEARTBEAT_IVL, ZMQ_HEARTBEAT_TIMEOUT, ZMQ_HEARTBEAT_TTL.
You can also have a look at the HeartbeatsTest class. It's a unit test for zmq, not for org.zeromq, but it might shed some light.
The following program expects a heartbeat from a PUB socket once a second (it actually waits 2 seconds, for margin); if it doesn't receive one, it disconnects and reconnects the SUB socket. If I disconnect and reconnect the WiFi on the computer running it (with PUB on a separate computer), I can get the closing of the ZMQ.Context to hang forever. The log also shows that connecting the socket and subscribing somehow appear to succeed even though the network is disconnected (obviously the heartbeat is not received).
This is the log for one run; lines starting with "###" are comments about when I connected or disconnected the WiFi or stopped the program.