Open ghpqans opened 1 year ago
It is correct behavior for zmq_connect() to not emit an error when the destination isn't immediately reachable. The idea is that ZeroMQ will retry behind the scenes until a connection can be made. The zmq_connect docs describe this in more detail.
However, the more important issue here is that you are using ZMQ_PAIR with a TCP transport. This is strongly discouraged due to the purpose and nature of this socket type. From the zmq_socket docs:
While ZMQ_PAIR sockets can be used over transports other than zmq_inproc(7), their inability to auto-reconnect coupled with the fact new incoming connections will be terminated while any previous connections (including ones in a closing state) exist makes them unsuitable for TCP in most cases.
Finally, your call to terminate the context is blocking because the (unsuccessful) connection attempt is a pending operation. You would need to set the ZMQ_LINGER socket option to 0 as described in the PyZMQ docs for term().
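A minimal sketch of that shutdown sequence in PyZMQ might look like the following. The helper name `shutdown` and its `linger_ms` parameter are illustrative and not part of the library; `Socket.setsockopt`, `Socket.close`, and `Context.term` are the actual PyZMQ calls.

```python
import zmq


def shutdown(ctx, sock, linger_ms=0):
    """Close sock and terminate ctx without waiting on undelivered messages.

    linger_ms is a hypothetical parameter of this sketch: 0 discards queued
    messages immediately, a positive value waits at most that many milliseconds.
    """
    # Bound the time close()/term() may spend flushing queued messages.
    sock.setsockopt(zmq.LINGER, linger_ms)
    sock.close()
    # term() returns once every socket of the context has been closed.
    ctx.term()
```

Passing a small positive `linger_ms` instead of 0 gives queued messages a bounded chance to be delivered before they are dropped.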
I understand that reconnection attempts make sense for unstable or unreliable connections. However, if it is clear beforehand that a reconnection will fail once the first connection attempt has failed, how do I deal with that? How do I avoid infinite reconnection attempts? If libzmq silently attempts reconnects behind the scenes without ever stopping, how do I have a chance to programmatically determine that there is a persistent connection problem and react appropriately?
What I'm looking for is a way to get some information about what's happening behind the scenes, as well as some means to stop reconnection attempts after a timeout. Setting ZMQ_LINGER to 0 for messages that are still lingering is maybe a little harsh.
It starts getting a bit messy, but you can use the facilities provided by zmq_socket_monitor. It reports low-level socket events through a separate ZMQ_PAIR.
Fortunately this is somewhat streamlined by PyZMQ's get_monitor_socket and recv_monitor_message APIs. They have an example that uses these to accomplish something similar to your use case.
The problem which triggered opening the issue was a blocking firewall. (Some background info: The firewall is normally configured to allow the port, but the configuration is regularly reset on Windows update. If the client just hangs, I have no clue about what happened. If I get an error message, I can react by allowing the port again.)
So how is the additional PAIR socket supposed to help in solving that issue? How would the socket monitor be able to receive some msg from the server in this case?
Meanwhile I tried connecting to the server using a ZMQ_CLIENT socket. It resulted in the same blocking zmq_ctx.term() as for the ZMQ_PAIR socket. I also tried zmq_ctx.destroy(linger=0), which didn't help either.
So how is the additional PAIR socket supposed to help in solving that issue?
If I understand correctly, you would like to be notified if a destination is unreachable.
The get_monitor_socket/recv_monitor_message APIs notify you of events such as ZMQ_EVENT_CONNECTED.
So you would check recv_monitor_message after a period of time. Take further action if the event isn't there.
How would the socket monitor be able to receive some msg from the server in this case?
The socket monitor is not communicating with the server at all. It is monitoring your own socket and notifying you via an inproc:// socket. PyZMQ helpfully abstracts this away so you can just use their API.
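As a rough sketch of that approach, the helper below attaches a monitor before connecting and treats the absence of a connected event within a timeout as a failure. The function name `connect_or_fail`, the default timeout, the DEALER default socket type, and the choice to raise `TimeoutError` are assumptions made for illustration; `get_monitor_socket`, `recv_monitor_message`, `disable_monitor`, and `zmq.EVENT_CONNECTED` come from PyZMQ.

```python
import zmq
from zmq.utils.monitor import recv_monitor_message


def connect_or_fail(ctx, endpoint, timeout_ms=2000, socket_type=zmq.DEALER):
    """Return a connected socket, or raise TimeoutError if no
    EVENT_CONNECTED is reported within timeout_ms (hypothetical helper)."""
    sock = ctx.socket(socket_type)
    # Assumption of this sketch: drop pending data so close() returns promptly.
    sock.setsockopt(zmq.LINGER, 0)
    # Attach the monitor before connect() so the event cannot be missed.
    monitor = sock.get_monitor_socket(zmq.EVENT_CONNECTED)
    sock.connect(endpoint)

    # poll() returns 0 when the timeout expires without any event.
    connected = bool(monitor.poll(timeout_ms))
    if connected:
        recv_monitor_message(monitor)  # only EVENT_CONNECTED is subscribed

    # Stop monitoring while the socket is still open, then release the monitor.
    sock.disable_monitor()
    monitor.close()

    if not connected:
        sock.close()
        raise TimeoutError(f"no connection to {endpoint} within {timeout_ms} ms")
    return sock
```

For a TCP endpoint the connected event indicates that the underlying TCP connection was established, so a port that is blocked or has no listener should surface as a `TimeoutError` during initialization rather than as a hang at shutdown.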
Issue description
The following code swallows a connection error, if the firewall on the server-side blocks the port the client connects to.
Workaround: Attempt telnet connection before attempting zmq connection on the same endpoint.
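The same kind of pre-check can be done from Python with a plain TCP connection attempt instead of telnet. This is only a sketch using the standard library: the function name `tcp_reachable` and the default timeout are hypothetical, and it assumes an endpoint of the form tcp://host:port.

```python
import socket
from urllib.parse import urlparse


def tcp_reachable(endpoint, timeout_s=2.0):
    """Return True if a plain TCP connection to a tcp://host:port endpoint succeeds."""
    parsed = urlparse(endpoint)
    try:
        # create_connection raises OSError on refusal, timeout, or routing failure.
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

A successful probe only shows that the port accepted a connection at that moment, so it complements rather than replaces monitoring the ZeroMQ socket itself.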
Environment
libzmq version: libzmq5/focal,now 4.3.2-2ubuntu1 amd64
OS: Linux, Ubuntu 20.04.5 LTS
Minimal test code / Steps to reproduce the issue
Python example, but may be any other language.
What's the actual result?
Instead of throwing an exception, the code runs into zmq_ctx.term() and then blocks indefinitely.
What's the expected result?
A way to catch such a connection error during the initialization phase rather than during the shutdown phase.