zeromq / pyzmq

PyZMQ: Python bindings for zeromq
http://zguide.zeromq.org/py:all
BSD 3-Clause "New" or "Revised" License

libzmq question: in ROUTER/DEALER mode, restarting only the DEALER causes a Host unreachable error #1899

betterlch closed this issue 10 months ago

betterlch commented 11 months ago

What pyzmq version?

22.3.0

What libzmq version?

4.3.4

Python version (and how it was installed)

Python 3.9.9 installed by pip

OS

CentOS

What happened?

I have a ROUTER sender that binds and one DEALER receiver that connects. Once I restart only the DEALER, messages sent by the ROUTER raise zmq.error.ZMQError: Host unreachable.

How can the ROUTER reach the DEALER again after only the DEALER has been restarted, without restarting the ROUTER and the DEALER at the same time?

Code to reproduce bug

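# Some code I can provide
# (tcp_ip is the shared endpoint string, defined elsewhere)
import zmq

# sender process: the ROUTER binds
ctx = zmq.Context()
send_sock = ctx.socket(zmq.ROUTER)
send_sock.bind(tcp_ip)
send_sock.setsockopt(zmq.ROUTER_MANDATORY, 1)

# receiver process: the DEALER connects
ctx = zmq.Context()
receive_socket = ctx.socket(zmq.DEALER)
receive_socket.connect(tcp_ip)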

Traceback, if applicable

  File "/data/lib64/python3.9/site-packages/zmq/sugar/socket.py", line 595, in send_multipart
    self.send(msg, SNDMORE | flags, copy=copy, track=track)
  File "/data/lib64/python3.9/site-packages/zmq/sugar/socket.py", line 547, in send
    return super(Socket, self).send(data, flags=flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 718, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 765, in zmq.backend.cython.socket.Socket.send
  File "zmq/backend/cython/socket.pyx", line 247, in zmq.backend.cython.socket._send_copy
  File "zmq/backend/cython/socket.pyx", line 242, in zmq.backend.cython.socket._send_copy
  File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Host unreachable

More info

No response

minrk commented 11 months ago

Setting ROUTER_MANDATORY asks for this error to be raised for messages sent to an IDENTITY that is not currently connected, which is what happens when a DEALER disconnects. If you want a DEALER to have a consistent address, so that the reconnected DEALER gets messages intended for the original, set the socket's IDENTITY before connecting:

receive_socket.identity = b'something-stable-but-unique'
receive_socket.connect(...)

I think you'll still see these errors while the restarted socket has not yet reconnected, so you'll probably want your own retry logic at the application level as well, to retry messages failing with EHOSTUNREACH up to some limit or timeout.
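A minimal sketch of that application-level retry, assuming the ROUTER has ROUTER_MANDATORY set. The helper name send_with_retry and its retries/delay defaults are hypothetical choices, not part of pyzmq:

```python
import time

import zmq


def send_with_retry(router, frames, retries=10, delay=0.1):
    """Send a multipart message from a ROUTER_MANDATORY socket.

    frames[0] is the destination identity. Retries while the peer is
    unreachable; returns True once sent, False if retries run out.
    """
    for _ in range(retries):
        try:
            router.send_multipart(frames)
            return True
        except zmq.ZMQError as e:
            if e.errno != zmq.EHOSTUNREACH:
                raise  # a different error: don't hide it
            # The DEALER has not (re)connected yet; wait and try again.
            time.sleep(delay)
    return False
```

A fixed sleep keeps the sketch short; a poller-driven loop or exponential backoff would serve the same purpose.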

betterlch commented 11 months ago

Yes, I forgot to set zmq.IDENTITY on the DEALER. But setting it doesn't resolve the Host unreachable error when only the DEALER is restarted, unless I catch the Host unreachable error in the ROUTER and rebind send_socket. Can't the DEALER reconnect to the ROUTER by itself after a restart?

minrk commented 11 months ago

I'm not sure I quite understand the question, but 'Host unreachable' will be raised if the ROUTER ever tries to send a message while the DEALER is not connected. This is a libzmq behavior question, so you're probably better off asking on libzmq instead of pyzmq, which is specifically about the Python bindings.
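A small sketch that reproduces the error with no DEALER at all, assuming only pyzmq. The identity b"no-such-dealer" is a made-up placeholder that no peer ever connects with, and send_to_missing_peer is a hypothetical helper name:

```python
import zmq


def send_to_missing_peer() -> int:
    """Send to an identity no DEALER uses; return the errno raised (0 if none)."""
    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    router.setsockopt(zmq.ROUTER_MANDATORY, 1)
    try:
        # No DEALER has ever connected, so this identity is unknown to the ROUTER.
        router.send_multipart([b"no-such-dealer", b"hello"])
        return 0
    except zmq.ZMQError as e:
        return e.errno
    finally:
        router.close(linger=0)
        ctx.term()
```

Comparing the result with zmq.EHOSTUNREACH shows the same condition as the traceback above. Without ROUTER_MANDATORY, the ROUTER would silently drop the message instead.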

ROUTER_MANDATORY doesn't behave as I'd expect when an identity is re-used, and I don't know if that's a bug or not and can't really help further (I think there's a bug, but it's not in pyzmq).

I'd recommend avoiding identity re-use, and instead using a separate registration mechanism so your dealer registers itself with a request at the application level and the router uses a {'name': b'identity'} map to route messages.
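A rough sketch of such a registration scheme, assuming a hypothetical [b"REGISTER", name] message that each DEALER sends right after connecting. Registry, handle_incoming and send_to are illustrative names, not pyzmq API. The DEALER does not set IDENTITY, so a restarted DEALER gets a fresh routing id from libzmq and simply registers again:

```python
import zmq

# Hypothetical application-level message: a DEALER announces itself
# by sending [REGISTER, name].
REGISTER = b"REGISTER"


class Registry:
    """ROUTER-side map from application-level name to libzmq routing id."""

    def __init__(self, router):
        self.router = router
        self.peers = {}  # {name: routing_id}

    def handle_incoming(self):
        """Receive one message; record registrations, return anything else."""
        routing_id, *frames = self.router.recv_multipart()
        if len(frames) == 2 and frames[0] == REGISTER:
            # A restarted DEALER has a new routing id; overwrite the old entry.
            self.peers[frames[1]] = routing_id
            return None
        return routing_id, frames

    def send_to(self, name, *frames):
        """Route frames to the DEALER currently registered under name."""
        routing_id = self.peers[name]  # KeyError if it never registered
        self.router.send_multipart([routing_id, *frames])
```

On the DEALER side, registering is one call after connect, e.g. sock.send_multipart([REGISTER, b"worker"]).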

betterlch commented 11 months ago

More detail:

A simple example: after restarting only recv.py, it no longer receives the messages being sent. But netstat -anp | grep 9002 still shows a TCP connection between them.

# send.py
import zmq
import time

ctx = zmq.Context()
send_sock = ctx.socket(zmq.ROUTER)
send_sock.bind('tcp://127.0.0.1:9002')
# send_sock.setsockopt(zmq.ROUTER_MANDATORY, 1)  # if set, will raise "zmq.error.ZMQError: Host unreachable" immediately

while 1:
    send_sock.send_multipart([b'1', b'ok'], zmq.NOBLOCK)
    time.sleep(2)

# recv.py
import zmq

ctx = zmq.Context()
recv_sock = ctx.socket(zmq.DEALER)
recv_sock.setsockopt_string(zmq.IDENTITY, '1')
recv_sock.connect('tcp://127.0.0.1:9002')

while 1:
    print(recv_sock.recv_multipart()[0])

minrk commented 11 months ago

That sounds like a libzmq bug to me, feel free to open an issue there. There's nothing to do about it in the Python bindings.

Andrey36652 commented 10 months ago

@minrk This doesn't seem to be a bug: https://github.com/zeromq/libzmq/issues/4603#issuecomment-1793939782 Can you confirm?

minrk commented 10 months ago

I'm not sure what I'm confirming, if libzmq says it's not a bug, it's not a bug. I'll close here, since the libzmq report has precedence and appears to contain an explanation of how to fix it.
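One libzmq option that targets identity re-use directly is ZMQ_ROUTER_HANDOVER (zmq.ROUTER_HANDOVER in pyzmq). By default, a ROUTER rejects a peer that connects with an identity already in use; with handover set to 1, the new peer takes the identity over and the older connection is dropped. A minimal sketch of the send.py socket with that option, assuming the rest of send.py stays unchanged. make_sender is a hypothetical helper, and this is one option worth trying, not necessarily the fix described in the linked libzmq comment:

```python
import zmq


def make_sender(ctx, endpoint):
    """Create the send.py ROUTER with identity handover enabled."""
    sock = ctx.socket(zmq.ROUTER)
    # A newly connected DEALER that reuses an identity (e.g. b'1' after a
    # restart) replaces the older connection instead of being rejected.
    sock.setsockopt(zmq.ROUTER_HANDOVER, 1)
    # Keep unroutable sends visible as EHOSTUNREACH rather than silent drops.
    sock.setsockopt(zmq.ROUTER_MANDATORY, 1)
    sock.bind(endpoint)
    return sock
```

Even with handover, there is still a window while the DEALER is down, so the sender should keep handling EHOSTUNREACH (or drop ROUTER_MANDATORY and accept silent drops).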