zeromq / pyzmq

PyZMQ: Python bindings for zeromq
http://zguide.zeromq.org/py:all
BSD 3-Clause "New" or "Revised" License

Hope more detailed hint when "Address already in use" error happens in bind or connect #1862

Closed monchin closed 1 year ago

monchin commented 1 year ago

I use zmq a lot in my project, with many processes and many ports. Sometimes my program does not shut down completely, and when I start it again I get an "Address already in use" error. Because I store all the ports in a dict, a bind call looks like this:

skt.bind(f"tcp://*:{port_dict['port_A']}")

In this situation it is very hard to tell which port is in use, even with the traceback. So I have to write

try:
    skt.bind(addr)
except zmq.error.ZMQError as e:
    raise RuntimeError(f"Failed to bind to {addr} because of {str(e)}")

or run lsof -i and check every possible port.
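
The port-by-port check can also be scripted instead of reading lsof output. A minimal sketch, assuming plain TCP ports on the local machine: ports_in_use is a hypothetical helper (not part of pyzmq) that probes with the standard library's socket module, so its answer may differ from libzmq's in edge cases such as differing socket options.

```python
import errno
import socket


def ports_in_use(port_dict, host="127.0.0.1"):
    """Return {name: port} for every port that cannot be bound right now.

    Hypothetical helper for illustration; it probes with a plain TCP socket
    from the standard library, not with a zmq socket.
    """
    busy = {}
    for name, port in port_dict.items():
        # The probe socket is closed at the end of each `with` block,
        # so a free port is released again immediately.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError as e:
                if e.errno == errno.EADDRINUSE:
                    busy[name] = port
                else:
                    # Anything else (e.g. permission denied) is a different problem.
                    raise
    return busy
```

Calling ports_in_use(port_dict) with the dict from the example above would return only the entries whose ports are currently taken, which gives the name of the offending port directly.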

But wrapping every bind/connect call in try/except like this is very verbose. So I wonder whether it would be good to modify bind and connect in zmq/sugar/socket.py > Socket like this:

class Socket(SocketBase, AttributeSetter):
    def bind(self, addr):
        try:
            super().bind(addr)
            ret = self._bind_cm(addr)
        except Exception as e:
            raise RuntimeError(f"Failed to bind to {addr} because of {str(e)}")
        return ret

    def connect(self, addr):
        try:
            super().connect(addr)
            ret = self._connect_cm(addr)
        except Exception as e:
            raise RuntimeError(f"Failed to connect to {addr} because of {str(e)}")
        return ret

minrk commented 1 year ago

Errors like FileNotFound that don't specify the file that wasn't found always bother me! So this makes perfect sense, except that it should be a ZMQError preserving the errno (i.e. change only the message, no other properties of the exception raised).
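
To illustrate why the exception type and errno matter, here is a small sketch. It uses the standard library's OSError as a stand-in for ZMQError so that it runs without pyzmq; that substitution is an assumption, and bind_stub, bind_wrapped and caller_handles_busy_port are hypothetical names. The point is only that callers that branch on the exception type or on errno stop working once the error is replaced by a RuntimeError.

```python
import errno


def bind_stub(addr):
    # Hypothetical stand-in for a failing bind call; not a pyzmq function.
    raise OSError(errno.EADDRINUSE, "Address already in use")


def bind_wrapped(addr):
    # Same shape as the RuntimeError wrapper proposed above.
    try:
        bind_stub(addr)
    except Exception as e:
        raise RuntimeError(f"Failed to bind to {addr} because of {e}") from e


def caller_handles_busy_port(bind, addr):
    """Typical caller: handle 'address in use' specially, re-raise the rest."""
    try:
        bind(addr)
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return "port busy"
        raise
    return "bound"
```

With bind_stub the caller's handler runs as intended. With bind_wrapped the RuntimeError is not an OSError, so it passes straight through the handler, even though the message itself is more helpful.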

monchin commented 1 year ago

Thank you for your kind reply! So how about something

class Socket(SocketBase, AttributeSetter):
    def bind(self, addr):
        try:
            super().bind(addr)
            ret = self._bind_cm(addr)
        except Exception as e:
            raise type(e)(f"Failed to bind to {addr}"
                f" because of {str(e)}") from e
        return ret

like this?

I tried this modification on both win10 and ubuntu 22.04 (wsl2), and the output is

# win10
In [1]: import zmq

In [2]: ctx = zmq.Context()

In [3]: skt1 = ctx.socket(zmq.REP)

In [4]: skt1.bind("tcp://*:9999")
Out[4]: <SocketContext(bind='tcp://*:9999')>

In [5]: skt2 = ctx.socket(zmq.REP)

In [6]: skt2.bind("tcp://*:9999")
---------------------------------------------------------------------------
ZMQError                                  Traceback (most recent call last)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\zmq\sugar\socket.py:215, in Socket.bind(self, addr)
    214 try:
--> 215     super().bind(addr)
    216     ret = self._bind_cm(addr)

File zmq\backend\cython\socket.pyx:540, in zmq.backend.cython.socket.Socket.bind()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\zmq\backend\cython\checkrc.pxd:28, in zmq.backend.cython.checkrc._check_rc()

ZMQError: Address in use

The above exception was the direct cause of the following exception:

ZMQError                                  Traceback (most recent call last)
Cell In [6], line 1
----> 1 skt2.bind("tcp://*:9999")

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\zmq\sugar\socket.py:218, in Socket.bind(self, addr)
    216     ret = self._bind_cm(addr)
    217 except Exception as e:
--> 218     raise type(e)(f"Failed to bind to {addr}"
    219         f" because of {str(e)}") from e
    220 return ret

ZMQError: Failed to bind to tcp://*:9999 because of Address in use

# ubuntu 22.04 (wsl2)
In [1]: import zmq

In [2]: ctx = zmq.Context()

In [3]: skt1 = ctx.socket(zmq.REP)

In [4]: skt1.bind("tcp://*:9999")
Out[4]: <SocketContext(bind='tcp://*:9999')>

In [5]: skt2 = ctx.socket(zmq.REP)

In [6]: skt2.bind("tcp://*:9999")
---------------------------------------------------------------------------
ZMQError                                  Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/zmq/sugar/socket.py:302, in Socket.bind(self, addr)
    301 try:
--> 302     super().bind(addr)
    303     ret = self._bind_cm(addr)

File zmq/backend/cython/socket.pyx:564, in zmq.backend.cython.socket.Socket.bind()

File ~/.local/lib/python3.10/site-packages/zmq/backend/cython/checkrc.pxd:28, in zmq.backend.cython.checkrc._check_rc()

ZMQError: Address already in use

The above exception was the direct cause of the following exception:

ZMQError                                  Traceback (most recent call last)
Cell In[6], line 1
----> 1 skt2.bind("tcp://*:9999")

File ~/.local/lib/python3.10/site-packages/zmq/sugar/socket.py:305, in Socket.bind(self, addr)
    303     ret = self._bind_cm(addr)
    304 except Exception as e:
--> 305     raise type(e)(f"Failed to bind to {addr}"
    306         f" because of {str(e)}") from e
    307 return ret

ZMQError: Failed to bind to tcp://*:9999 because of Address already in use

minrk commented 1 year ago

I think maybe even simpler: append to the existing exception object's message attribute and then re-raise it instead of instantiating a new instance.

monchin commented 1 year ago

I see. So maybe I need to modify socket.pyx:

    def bind(self, addr):
        cdef int rc
        cdef char* c_addr

        _check_closed(self)
        addr_b = addr
        if isinstance(addr, str):
            addr_b = addr.encode('utf-8')
        elif isinstance(addr_b, bytes):
            addr = addr_b.decode('utf-8')

        if not isinstance(addr_b, bytes):
            raise TypeError('expected str, got: %r' % addr)
        c_addr = addr_b
        rc = zmq_bind(self.handle, c_addr)
        if rc != 0:
            if IPC_PATH_MAX_LEN and zmq_errno() == ENAMETOOLONG:
                path = addr.split('://', 1)[-1]
                msg = ('ipc path "{0}" is longer than {1} '
                                'characters (sizeof(sockaddr_un.sun_path)). '
                                'zmq.IPC_PATH_MAX_LEN constant can be used '
                                'to check addr length (if it is defined).'
                                .format(path, IPC_PATH_MAX_LEN))
                raise ZMQError(msg=msg)
            elif zmq_errno() == ENOENT:
                path = addr.split('://', 1)[-1]
                msg = ('No such file or directory for ipc path "{0}".'.format(
                       path))
                raise ZMQError(msg=msg)
            elif zmq_errno() == EADDRINUSE:  # the branch I added
                path = addr.split('://', 1)[-1]
                msg = 'Address in use for ipc path "{0}".'.format(path)
                raise ZMQError(msg=msg)
        while True:
            try:
                _check_rc(rc)
            except InterruptedSystemCall:
                rc = zmq_bind(self.handle, c_addr)
                continue
            else:
                break

But after running python setup.py build_ext, the import failed with: ImportError: cannot import name '_device' from partially initialized module 'zmq.backend.cython' (most likely due to a circular import) (C:\pyzmq\zmq\backend\cython\__init__.py)

I don't have enough time to deal with it now, but I'll try to test it in the near future.

minrk commented 1 year ago

I think you were right to put it in zmq.sugar.socket. I would do the very simplest thing and append to the strerror attribute of the ZMQError and re-raise it:

...
except ZMQError as e:
    # add address to the error message before raising
    e.strerror += f" (addr={addr!r})"
    raise
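
A self-contained sketch of the same pattern, again assuming the standard library's OSError as a stand-in for ZMQError (both expose errno and strerror); bind_stub and bind_with_addr are hypothetical names used only for illustration. The exception object is mutated and re-raised with a bare raise, so its type, errno and original traceback are unchanged.

```python
import errno


def bind_stub(addr):
    # Hypothetical stand-in for the low-level bind; always fails here.
    raise OSError(errno.EADDRINUSE, "Address already in use")


def bind_with_addr(addr):
    try:
        bind_stub(addr)
    except OSError as e:
        # Add the address to the message; errno and type stay as they were.
        e.strerror += f" (addr={addr!r})"
        raise
```

A caller that catches the error from bind_with_addr("tcp://*:9999") still sees e.errno == errno.EADDRINUSE, and e.strerror now ends with (addr='tcp://*:9999').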

The spurious circular import message (I don't understand why Python thinks that's a good hint, but it's wrong) is likely caused by missing Cython files. You can do a full dev install with:

pip install -e .

monchin commented 1 year ago

@minrk Hello, sorry for the late reply. I have solved the build problem (it failed because I ran python setup.py build_ext without --inplace) and tested my code above on win10. The result is

# win10
In [1]: import zmq

In [2]: ctx = zmq.Context()

In [3]: skt1 = ctx.socket(zmq.REP)

In [4]: skt1.bind("tcp://*:9999")
Out[4]: <SocketContext(bind='tcp://*:9999')>

In [5]: skt2 = ctx.socket(zmq.REP)

In [6]: skt2.bind("tcp://*:9999")
---------------------------------------------------------------------------
ZMQError                                  Traceback (most recent call last)
Cell In [6], line 1
----> 1 skt2.bind("tcp://*:9999")

File C:\Project\git\pyzmq\zmq\sugar\socket.py:301, in Socket.bind(self, addr)
    278 def bind(self: T, addr: str) -> _SocketContext[T]:
    279     """s.bind(addr)
    280
    281     Bind the socket to an address.
   (...)
    299
    300     """
--> 301     super().bind(addr)
    302     return self._bind_cm(addr)

File C:\Project\git\pyzmq\zmq\backend\cython\socket.pyx:565, in zmq.backend.cython.socket.Socket.bind()
    563         path = addr.split('://', 1)[-1]
    564         msg = 'Address in use for ipc path "{0}".'.format(path)
--> 565         raise ZMQError(msg=msg)
    566 while True:
    567     try:

ZMQError: Address in use for ipc path "*:9999".

I also tried your code above. It is really simple and gives us exactly what we want:

---------------------------------------------------------------------------
ZMQError                                  Traceback (most recent call last)
Cell In [6], line 1
----> 1 skt2.bind("tcp://*:9999")

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\zmq\sugar\socket.py:302, in Socket.bind(self, addr)
    279 """s.bind(addr)
    280
    281 Bind the socket to an address.
   (...)
    299
    300 """
    301 try:
--> 302     super().bind(addr)
    303     return self._bind_cm(addr)
    304 except ZMQError as e:
    305     # add address to the error message before raising

File zmq\backend\cython\socket.pyx:564, in zmq.backend.cython.socket.Socket.bind()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\zmq\backend\cython\checkrc.pxd:28, in zmq.backend.cython.checkrc._check_rc()

ZMQError: Address in use (addr='tcp://*:9999')

I didn't know this approach before, so thank you for showing it to me. I think you should push your code if you think this change is worth merging.