Issue with setting connection timeout lower than heartbeat?

jasonrbriggs / stomp.py

“stomp.py” is a Python client library for accessing messaging servers (such as ActiveMQ or RabbitMQ) using the STOMP protocol (versions 1.0, 1.1 and 1.2). It can also be run as a standalone, command-line client for testing.

Apache License 2.0


Hi,

We've seen some weird behaviour with AmazonMQ where suddenly the connection closes, seemingly when there's incoming subscription data, like this:

DEBUG:stomp.py:socket read error
DEBUG:stomp.py:nothing received, raising CCE
INFO:stomp.py:Receiver loop ended

The socket read error is "The read operation timed out".

I can't be 100% certain, but I suspect the transport doesn't expect socket.read() to time out after an idle period. We have the timeout set to 10 seconds but heartbeats at 30 seconds, so read() times out between heartbeats if there's no other traffic, and then we get disconnected. For some reason this only seems to happen when connected over TLS; I can't figure out quite why, except that the read() semantics are slightly different.
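To illustrate the suspected mechanism outside of stomp.py, here is a minimal sketch using only the standard library. It assumes nothing about the transport's internals: it sets a timeout on a plain socket (a local socketpair stands in for the broker connection, and `idle_recv_times_out` is a hypothetical helper name) and shows that recv() raises socket.timeout once the connection has been idle for longer than that timeout.

```python
import socket


def idle_recv_times_out(timeout_s=0.2):
    """Return True if recv() on an idle socket raises socket.timeout."""
    # A connected local pair stands in for the client/broker connection.
    a, b = socket.socketpair()
    try:
        # The timeout applies to every later blocking call, not only connect.
        a.settimeout(timeout_s)
        try:
            a.recv(1024)  # the peer never sends, so this waits out the timeout
        except socket.timeout:
            return True
        return False
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(idle_recv_times_out())
```

With a 10 second timeout and 30 second heartbeats, the same thing would happen on any read that starts during a quiet period of more than 10 seconds.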

I think the introduction of socket.settimeout() in https://github.com/jasonrbriggs/stomp.py/issues/55 might've inadvertently affected the read semantics?

A workaround seems to be setting the heartbeat lower than the timeout to prevent the issue manifesting itself, though in normal operation we'd prefer to have the connect timeout quite short. Should the transport potentially unset the global socket timeout after it's successfully connected?
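The idea of clearing the timeout after connecting can be sketched with plain sockets. This is not stomp.py's code or API, only an illustration of the pattern; `connect_then_clear_timeout` and its parameters are hypothetical names.

```python
import socket


def connect_then_clear_timeout(host, port, connect_timeout=10.0):
    """Connect with a short timeout, then switch to blocking reads."""
    # The timeout bounds the connection attempt itself.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # None puts the socket back into blocking mode, so an idle recv()
    # waits for data instead of raising socket.timeout.
    sock.settimeout(None)
    return sock


if __name__ == "__main__":
    # Throwaway local listener so the sketch runs without a broker.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = connect_then_clear_timeout("127.0.0.1", server.getsockname()[1], 2.0)
    print(client.gettimeout())
    client.close()
    server.close()
```

One trade-off with this approach: without a read timeout, a dead connection would only be noticed through missed heartbeats or a socket error.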

Thanks

I noticed this same problem when running on Amazon EKS (Python 3.8.10) and connecting to Amazon ActiveMQ, but could not replicate it on my own computer. Heartbeat send/receive was 15s. With a 10s timeout I saw this problem; with a 60s timeout I did not.

Not sure what is really happening there. Transport.receive() raises InterruptedException if socket.recv(...) fails with an EAGAIN or EINTR error. This exception is caught and ignored in transport.__read(). If the issue were socket.recv being interrupted by the timeout, I would not expect to see the log messages nbertram posted.

Also the Python documentation for socket.recv says "Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an InterruptedError exception (see PEP 475 for the rationale)."
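One way to check whether a timeout would be treated like EAGAIN or EINTR is to look at the exception an idle recv() actually raises. The sketch below is an illustration only and does not reproduce stomp.py's handling; `classify_recv_error` and `recv_error_on_idle_socket` are hypothetical helpers. It separates a timeout from an errno-based interruption.

```python
import errno
import socket


def classify_recv_error(exc):
    """Label an exception raised by recv() as timeout, interrupted or other."""
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, OSError) and exc.errno in (errno.EAGAIN, errno.EINTR):
        return "interrupted"
    return "other"


def recv_error_on_idle_socket(timeout_s=0.05):
    """Return the exception raised by recv() on an idle socket with a timeout."""
    a, b = socket.socketpair()
    try:
        a.settimeout(timeout_s)
        try:
            a.recv(1024)  # nothing is ever sent by the peer
        except OSError as exc:
            return exc
        return None
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    err = recv_error_on_idle_socket()
    print(classify_recv_error(err), err.errno)
```

If the timeout surfaces as socket.timeout rather than as EAGAIN or EINTR, an errno check for those two codes would not match it, which would be consistent with the error falling through to the "socket read error" log path.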

Issue with setting connection timeout lower than heartbeat? #366