zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0

libzmq swallows error when attempting connection to blocked port (firewall) #4465

Open ghpqans opened 1 year ago

ghpqans commented 1 year ago

Issue description

Environment

Minimal test code / Steps to reproduce the issue

Python example, but may be any other language.

import zmq

endpoint = "tcp://{IP}:{PORT}"  # <--- The PORT is blocked by a firewall

zmq_ctx = zmq.Context()
sock = zmq_ctx.socket(zmq.PAIR)
sock.connect(endpoint)  # <--- I would expect an exception thrown here, however this is not the case

sock.disconnect(endpoint)
sock.close()
zmq_ctx.term()  # <--- This blocks indefinitely

What's the actual result?

Instead of raising an exception, the code proceeds to zmq_ctx.term() and then blocks indefinitely.

What's the expected result?

A way to catch such a connection error during the initialization phase rather than during the shutdown phase.

boscosiu commented 1 year ago

It is correct behavior for zmq_connect() to not emit an error when the destination isn't immediately reachable. The idea is ZeroMQ will retry behind the scenes until a connection can be made. The zmq_connect docs describe this in more detail.

However, the more important issue here is that you are using ZMQ_PAIR with a TCP transport. This is strongly discouraged due to the purpose and nature of this socket type. From the zmq_socket docs:

> While ZMQ_PAIR sockets can be used over transports other than zmq_inproc(7), their inability to auto-reconnect coupled with the fact new incoming connections will be terminated while any previous connections (including ones in a closing state) exist makes them unsuitable for TCP in most cases.

Finally, your call to terminate the context is blocking because the (unsuccessful) connection attempt is a pending operation. You would need to set the ZMQ_LINGER socket option to 0 as described in the PyZMQ docs for term().
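A minimal sketch of the ZMQ_LINGER suggestion, assuming PyZMQ is installed. The helper name `connect_and_shutdown`, the DEALER socket type, and the endpoint in the usage line are illustrative assumptions, not taken from the thread. The socket gets ZMQ_LINGER set to 0 before connecting, so closing it discards whatever is still pending and term() should return promptly even if the peer was never reachable.

```python
import time
import zmq


def connect_and_shutdown(endpoint: str) -> float:
    """Connect, tear everything down, and return the seconds spent in term()."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.DEALER)  # assumption: any non-PAIR type; DEALER chosen for illustration
    # LINGER=0: on close, drop pending work instead of waiting for delivery.
    sock.setsockopt(zmq.LINGER, 0)
    sock.connect(endpoint)  # returns without error even if nothing is reachable
    sock.close()
    start = time.monotonic()
    ctx.term()  # expected to return quickly because nothing lingers
    return time.monotonic() - start
```

Usage could look like `connect_and_shutdown("tcp://127.0.0.1:5555")`, where the address is a placeholder. Setting the option on the socket before connect() keeps the behavior explicit per socket rather than relying on a context-wide default.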

ghpqans commented 1 year ago

I understand that reconnection attempts make sense for unstable or unreliable connections. However, if it is clear beforehand that a reconnection will fail once the first attempt has failed, how should I deal with that? How do I avoid infinite reconnection attempts? If libzmq silently retries behind the scenes without ever stopping, how do I have a chance to programmatically determine that there is a persistent connection problem and react appropriately? What I'm looking for is a way to get some information about what's happening behind the scenes, as well as some means to stop reconnection attempts after a timeout. Setting ZMQ_LINGER to 0 for messages still lingering around is maybe a little harsh.

boscosiu commented 1 year ago

It starts getting a bit messy, but you can use the facilities provided by zmq_socket_monitor. It reports low level socket events through a separate ZMQ_PAIR.

Fortunately this is somewhat streamlined by PyZMQ's get_monitor_socket and recv_monitor_message APIs. They have an example that uses these to accomplish something similar to your use case.

ghpqans commented 1 year ago

The problem that triggered this issue was a blocking firewall. (Some background: the firewall is normally configured to allow the port, but the configuration is regularly reset by Windows updates. If the client just hangs, I have no clue what happened. If I get an error message, I can react by allowing the port again.) So how is the additional PAIR socket supposed to help in solving that issue? How would the socket monitor be able to receive some msg from the server in this case? Meanwhile, I tried connecting to the server using a ZMQ_CLIENT socket. It resulted in the same blocking zmq_ctx.term() as with the ZMQ_PAIR socket. I also tried zmq_ctx.destroy(linger=0), which didn't help either.

boscosiu commented 1 year ago

> So how is the additional PAIR socket supposed to help in solving that issue?

If I understand correctly, you would like to be notified if a destination is unreachable.

The get_monitor_socket/recv_monitor_message APIs notify you of events such as ZMQ_EVENT_CONNECTED.

So you would check recv_monitor_message after a period of time. Take further action if the event isn't there.
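The check described here can be sketched as follows, assuming PyZMQ is installed. The helper name `wait_for_connect` and the default timeout are hypothetical. The monitor socket is created before connect() so the event cannot be missed, and it is polled for ZMQ_EVENT_CONNECTED for a bounded time. A False return only means no connection was established within the timeout; it does not say why.

```python
import zmq
from zmq.utils.monitor import recv_monitor_message


def wait_for_connect(sock: zmq.Socket, endpoint: str, timeout_ms: int = 3000) -> bool:
    """Return True if `sock` reports EVENT_CONNECTED within `timeout_ms`."""
    # Subscribe only to the "connected" event, before connecting.
    monitor = sock.get_monitor_socket(zmq.EVENT_CONNECTED)
    try:
        sock.connect(endpoint)  # never raises for an unreachable peer
        if monitor.poll(timeout_ms):  # non-zero when an event is waiting
            event = recv_monitor_message(monitor)
            return event["event"] == zmq.EVENT_CONNECTED
        return False  # nothing reported in time: take further action
    finally:
        sock.disable_monitor()
        monitor.close(linger=0)
```

A caller might raise its own exception or log a hint about the firewall when this returns False, then close the socket with LINGER set to 0 so that shutdown does not hang.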

> How would the socket monitor be able to receive some msg from the server in this case?

The socket monitor is not communicating with the server at all. It is monitoring your own socket and notifying you via an inproc:// socket. PyZMQ helpfully abstracts this away so you can just use their API.
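To make the inproc mechanism visible, here is a rough sketch that sets up the monitor by hand instead of going through get_monitor_socket, assuming PyZMQ is installed. The helper name `collect_events` and the address in `MONITOR_ADDR` are hypothetical. It returns the raw event codes seen in a time window, which can be compared against the `zmq.EVENT_*` constants.

```python
import time
import zmq
from zmq.utils.monitor import recv_monitor_message

# Hypothetical name; any inproc address unique within the context should do.
MONITOR_ADDR = "inproc://monitor.example"


def collect_events(sock: zmq.Socket, endpoint: str, duration_s: float = 1.0) -> list:
    """Connect `sock` and return the raw event codes seen within `duration_s`."""
    # libzmq publishes this socket's events on MONITOR_ADDR, inside this process.
    sock.monitor(MONITOR_ADDR, zmq.EVENT_ALL)
    # The receiving end is a PAIR socket in the same context; no network traffic.
    mon = sock.context.socket(zmq.PAIR)
    mon.connect(MONITOR_ADDR)
    events = []
    try:
        sock.connect(endpoint)
        deadline = time.monotonic() + duration_s
        while True:
            remaining_ms = int((deadline - time.monotonic()) * 1000)
            if remaining_ms <= 0:
                break
            if mon.poll(remaining_ms):
                events.append(recv_monitor_message(mon)["event"])
    finally:
        sock.disable_monitor()
        mon.close(linger=0)
    return events
```

Checking `zmq.EVENT_CONNECTED in collect_events(sock, endpoint)` then shows whether a connection was made; which other events appear for a blocked port depends on how the firewall rejects traffic.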