jasonrbriggs / stomp.py

“stomp.py” is a Python client library for accessing messaging servers (such as ActiveMQ or RabbitMQ) using the STOMP protocol (versions 1.0, 1.1 and 1.2). It can also be run as a standalone, command-line client for testing.
Apache License 2.0
495 stars 167 forks source link

Missing/mishandled heartbeats after a few messages #344

Open sobjerke opened 3 years ago

sobjerke commented 3 years ago

consumer.zip

Server: ActiveMQ (5.16.0) with stomp+nio+ssl transport connector Producer: stomp.py 6.1.0 sending timestamped message every 10 seconds Consumer: stomp.py (Tested both 6.1.0 and Dev)

Typical output: 2021-01-21 14:02:04: received message: "test message at 2021-01-21 14:02:04" 2021-01-21 14:02:10: received heartbeat 2021-01-21 14:02:14: received heartbeat 2021-01-21 14:02:15: received message: "test message at 2021-01-21 14:02:14" Heartbeat timeout: diff_receive=6.000141425989568, time=5618406.807629671, lastrec=5618400.807488245 [Errno 0] Error 2021-01-21 14:02:21: stomp disconnected, try to reconnect 2021-01-21 14:02:21: receiver loop ended 2021-01-21 14:02:21: heartbeat timeout 2021-01-21 14:02:25: received message: "test message at 2021-01-21 14:02:25" 2021-01-21 14:02:29: received heartbeat 2021-01-21 14:02:33: received heartbeat 2021-01-21 14:02:43: receiver loop ended 2021-01-21 14:02:43: stomp disconnected, try to reconnect 2021-01-21 14:02:45: received message: "test message at 2021-01-21 14:02:45" 2021-01-21 14:02:51: received heartbeat 2021-01-21 14:02:55: received message: "test message at 2021-01-21 14:02:55" 2021-01-21 14:03:09: receiver loop ended 2021-01-21 14:03:09: stomp disconnected, try to reconnect

Observations:

sobjerke commented 3 years ago

To eliminate an incorrectly configured ActiveMQ server, I replaced the configuration with the one supplied in the downloaded tar-file and changed the python code to not use SSL. When analyzing the tcp traffic, I was able to see that the first packet after a message, was not a heartbeat as I believed when I looked at the SSL traffic.

The first heartbeat from ActiveMQ after a message can take up to twice the heartbeat interval to arrive. It seems there is a bug in ActiveMQ that was reported in 2013 and still not resolved even if several patches has been proposed: https://issues.apache.org/jira/browse/AMQ-4710

I worked around the ActiveMQ issue by setting heart_beat_receive_scale=3 in stomp.Connection.

However it still seems there is an issue with stomp.py. After a reconnect, the consumer's receiver loop ends every 2 to 3 heartbeats.