jasonrbriggs / stomp.py

“stomp.py” is a Python client library for accessing messaging servers (such as ActiveMQ or RabbitMQ) using the STOMP protocol (versions 1.0, 1.1 and 1.2). It can also be run as a standalone, command-line client for testing.
Apache License 2.0
495 stars 167 forks source link

Heartbeats cause on_disconnected to be called multiple times after reconnecting #372

Closed richard3d closed 2 years ago

richard3d commented 2 years ago

We are using stomp.py 7.0 in a project. Recently we have had issues with the broker disconnecting our client app (on_disconnected was not being called) and so we added heartbeats=(5000, 0) param to our connections. The client app is just comprised of three listeners. The on_disconnected handler now gets called if the connection remains idle for too long. Inside of the handler we just call a connect_and_subscribe method that looks like this:

def connect_and_subscribe(conn, destination, id, username, password):
    """
    See: http://jasonrbriggs.github.io/stomp.py/api.html?highlight=heartbeat#dealing-with-disconnects

    Reusable code block to handle connections and subscriptions for a queue.
    """
    conn.connect(username, password, wait=True)
    conn.subscribe(destination=destination, id=id, ack='auto')
    logger.info(f"Connected to {destination} with listener id {id}")

The odd thing is that after reconnecting the connection only lasts as long as the client heartbeat (5 secs) and then on_disconnected is triggered and the app reconnects etc. This behavior repeats for the duration of the client app (when it is not actively receiving messages). Nothing really breaks on our end, except our logs are quite verbose and we would like to understand the root cause of this behavior since it seems unnatural.

richard3d commented 2 years ago

After doing some more testing and research I believe I have found the issue - to reproduce: Create a connection with heartbeats=(5000,0) add a listener to this connection and make sure the listener has the on_disconnected method. The method should call connect_and_subscribe as detailed in the original comment above. Run the app in debug mode and pause it for more than 5 seconds. You will notice that on_disconnected is called for the listener first (which then attempts to reconnect on the same thread), however if you put a break point at HeartbeatListener's on_disconnected method you will see it is called afterwards in a separate thread. This terminates the heartbeat loop and thus causes the broker to disconnect which in turn starts the whole thing over.

richard3d commented 2 years ago

Looking into it even further this seems to be the cause for our case: https://github.com/jasonrbriggs/stomp.py/blob/dev/stomp/transport.py#L248

The sorting causes the internal heartbeat listener disconnect to be called after our listener's disconnect (which tries to reconnect) hence the heartbeat loop terminates after we try to reconnect, which ultimately causes the ping-pong between connecting and disconnecting. Does anyone know the purpose of this sort? Why are listeners processed in this order?

jasonrbriggs commented 2 years ago

The sorting was added because some consumers had a need for ordered message processing. i.e. message enrichment that is applied in layers.

richard3d commented 2 years ago

Thanks @jasonrbriggs, I saw https://github.com/jasonrbriggs/stomp.py/issues/212 and the associated PR today but it still left me wondering a bit...I see I can control order with the name and that fixes our issue. Previously our listeners' names had alphabetical precedence to ('protocol-listener', stomp.connect.StompConnection11), but shouldn't the protocol-listener always be the first to get handled/processed regardless? I may be ignorant to some use-cases out there but to me it seems like not having it that way creates opportunity for a user (like me) to really dork things up :)

jasonrbriggs commented 2 years ago

Yes it probably should be higher up the prio. Still should be a way to insert something before the protocol-listener itself, but agreed that it shouldn't be as easy to mess it up just by adding your own listeners with a name starting with a lower alphabetical letter...

richard3d commented 2 years ago

Maybe the behavior is fine to leave as is, if it's mentioned/highlighted in the docs. If you have an idea of where the info would best be placed (I notice a lot of examples use an empty string for the param to set_listener) I would be more than happy to contribute/help.