jasonrbriggs / stomp.py

“stomp.py” is a Python client library for accessing messaging servers (such as ActiveMQ or RabbitMQ) using the STOMP protocol (versions 1.0, 1.1 and 1.2). It can also be run as a standalone, command-line client for testing.
Apache License 2.0
492 stars 167 forks source link

Race condition in heart beats code when reconnecting #308

Closed grundleborg closed 4 years ago

grundleborg commented 4 years ago

Using code which follows the example here: http://jasonrbriggs.github.io/stomp.py/api.html#dealing-with-disconnects and simulating network loss instead by pulling the computer's network cable, sometimes (intermittently) the heart-beat loop does not start back up on reconnection, meaning next time you pull the network cable it stays stuck forever, not realising that the connection has failed.

It seems like a race condition, as it happens reliably for me until I start adding extra logging lines to stomp.py, at which point it becomes much less reproducible. Tested using current dev branch and also 6.0.0 and reproduces the same in both cases.

grundleborg commented 4 years ago

On closer inspection, it looks like the issue is that the reconnecting ends up happening in the heartbeat listener thread, which means once the connection is re-established the heartbeat main-loop shutdown for the old connection races with the new heartbeat thread setup for the new connection. I think the linked documentation would benefit from a note saying "don't actually do this - calling connection.connect() inside the listener is racy and will fail". Not sure if you'd consider the actual behaviour a bug or behaving as designed though?

aragaer commented 4 years ago

It's not exactly "race condition" - I'm having the same issue even if I wait for some timeout in "on_disconnect" method of my listener. But then I don't really know where I should call connection.connect() if not in on_disconnect.

What I see is that heartbeat thread gets the new "CONNECTED" message, recalculates heart-beats but then it gets the "disconnected" event delivered and this causes heartbeat thread to stop.

jasonrbriggs commented 4 years ago

I think it's definitely a bug. This used to work and doesn't any more -- I even had an example in the docs (http://jasonrbriggs.github.io/stomp.py/api.html#dealing-with-disconnects) and that doesn't function any more. Need to look into it further.

jasonrbriggs commented 4 years ago

Hopefully fixed in 6.1.0 - reopen issue if not.