zmqless / python-zeroless

ZeroMQ for Pythonistas™
http://python-zeroless.readthedocs.io
GNU Lesser General Public License v2.1
142 stars 13 forks

Recover / Reconnect / Restart Server #37

Open fire17 opened 2 months ago

fire17 commented 2 months ago

Hi there. First of all, thanks a lot: I've been using this library for a few years and, for the most part, I love it.

The only problem is that I can't seem to recover from errors. If something goes wrong during the client/server conversation, I have to restart all of the servers and all of the clients.

 ::: NEW QUERY REQUEST ON ROUTER SERVER PIPELINE: Local RID: 6 ::: Query: tell a joke :::
Traceback (most recent call last):
  File "/Users/magic/wholesomegarden/magicllight/magicllight/core/airouter/pipelines/xo_benedict/freshServer.py", line 69, in listen
    for payload in listen_for_request:
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/zeroless/zeroless.py", line 60, in _recv
    frames = sock.recv_multipart()
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/zmq/sugar/socket.py", line 806, in recv_multipart
    parts = [self.recv(flags, copy=copy, track=track)]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "_zmq.py", line 1137, in zmq.backend.cython._zmq.Socket.recv
  File "_zmq.py", line 1172, in zmq.backend.cython._zmq.Socket.recv
  File "_zmq.py", line 1264, in zmq.backend.cython._zmq._recv_copy
  File "_zmq.py", line 1259, in zmq.backend.cython._zmq._recv_copy
  File "_zmq.py", line 160, in zmq.backend.cython._zmq._check_rc
zmq.error.ZMQError: Operation cannot be accomplished in current state
.............. server crashed, needs recovering.........

But no matter what I do, I can't seem to recover: ALL CLIENTS AND SERVERS MUST DIE TO RESTART. I want a simple recovery that doesn't require closing everything.

Again, this library is amazing, but the server/client connections need to be more robust and handle reconnecting automatically.

Please let me know what you think and how we can solve this. Thanks a lot, and all the best!
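For context: in ZeroMQ, "Operation cannot be accomplished in current state" is the EFSM error, which request/reply sockets raise when the strict receive-then-send alternation is broken. The traceback above shows it raised from `recv_multipart`, which on a REP socket means an earlier request was received but never answered. Assuming `server.reply()` wraps a REP socket (not confirmed in this thread), one way to avoid getting into that state is to make sure every request gets exactly one reply, even when the handler throws. A minimal sketch of that idea follows; `serve_forever`, `handler`, and the `b"FAILED"` marker are hypothetical names, not part of zeroless.

```python
import traceback


def serve_forever(listen_for_request, reply, handler, failure_reply=b"FAILED"):
    """Answer every request exactly once, even when the handler fails.

    listen_for_request: iterable yielding request payloads (hypothetical
        stand-in for the generator returned by server.reply()).
    reply: callable that sends one response (stand-in for zeroless' reply).
    handler: application function turning a payload into a response.
    """
    for payload in listen_for_request:
        try:
            result = handler(payload)
        except Exception:
            # Log the failure but still answer, so the socket stays in the
            # "ready to receive" state for the next request.
            traceback.print_exc()
            result = failure_reply
        reply(result)
```

With this shape, a failing `process_payload` produces a FAILED reply for that one client instead of leaving the socket waiting for a send. Whether this covers the failure seen here depends on what actually interrupted the loop, which the log does not show.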

fire17 commented 2 months ago

THE SOLUTION I'M LOOKING FOR

import traceback

reply, listen_for_request = server.reply()

try:
    for payload in listen_for_request:
        res = process_payload(payload)
        reply(res)
except Exception:  # zmq.error.ZMQError derives from Exception
    traceback.print_exc()
    # recover[0] = True
    print(".............. recovering from zmq error .........")

    server.reconnect()           # <-------------- THIS NEEDS TO BE INCLUDED IN THE ZEROLESS LIBRARY
    # This should:
    #   1. Restart the server while avoiding the "already exists on port" error.
    #   2. Handle the stuck client by either:
    #       a. sending the stuck request a FAILED message and letting the client handle it
    #       b. re-calling the function (simple recovery)
    #       c. returning what the failed method already processed (advanced recovery, BETTER SOLUTION)
    #           Explanation: the ZMQ error happens at the end (on reply), which means that the server function
    #           already ran and returned results. So right before sending via reply, that data should be
    #           temporarily saved, and if the reply fails, it is returned to the client immediately after
    #           server recovery.

    print(".............. recovering done .........")
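Option 2c above (save the result before replying, reuse it after recovery) could be sketched roughly as below. This is only an illustration under assumptions: it presumes each request carries an ID (the log mentions a "Local RID"), and that after recovery the client resends the same request, since a recreated socket may not be able to reach the original peer. `ReplyCache` and `answer` are hypothetical names, not zeroless API.

```python
from collections import OrderedDict


class ReplyCache:
    """Keep computed results until their reply has been sent successfully."""

    def __init__(self, handler, max_entries=1024):
        self._handler = handler
        self._results = OrderedDict()  # request_id -> computed result
        self._max_entries = max_entries

    def result_for(self, request_id, payload):
        """Return the saved result for request_id, computing it only once."""
        if request_id in self._results:
            return self._results[request_id]
        result = self._handler(payload)
        self._results[request_id] = result
        # Drop the oldest entries so unanswered requests cannot grow forever.
        while len(self._results) > self._max_entries:
            self._results.popitem(last=False)
        return result

    def acknowledge(self, request_id):
        """Forget a result once its reply went out without an error."""
        self._results.pop(request_id, None)


def answer(cache, request_id, payload, reply):
    """Compute (or reuse) the result, send it, then clear the saved copy."""
    result = cache.result_for(request_id, payload)
    reply(result)  # if this raises, the result stays cached for a retry
    cache.acknowledge(request_id)
```

If `reply` fails, the next call to `answer` with the same request ID sends the saved result without running the handler again, which is the "advanced recovery" behaviour described in the comment. How the server socket itself gets restarted is left open here, since `server.reconnect()` is the requested feature rather than an existing method.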