nbertram opened this issue 3 years ago
I noticed the same problem when running on Amazon EKS (Python 3.8.10) and connecting to Amazon ActiveMQ, but could not reproduce it on my own computer. Heartbeat send/receive was 15s. With a 10s timeout I saw the problem; with a 60s timeout I did not notice anything.
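To make the timing concrete, here is a minimal, scaled-down simulation using plain (non-TLS) sockets from the Python standard library; it is not stomp.py code. A peer writes one newline (a STOMP heart-beat is an end-of-line) every heartbeat_s seconds, and the reader calls recv() with a socket timeout of read_timeout_s. The helper name survives_idle_period and the sub-second values are assumptions chosen for illustration. Whenever the read timeout is shorter than the gap between incoming frames, recv() raises socket.timeout before the next heart-beat arrives.

```python
import socket
import threading


def survives_idle_period(read_timeout_s: float, heartbeat_s: float, beats: int = 3) -> bool:
    """Hypothetical helper: return True if the reader sees `beats` heart-beats
    without recv() timing out, False if socket.timeout is raised first."""
    reader, peer = socket.socketpair()
    stop = threading.Event()

    def send_heartbeats() -> None:
        # Event.wait() returns False on timeout, so this loop sends one
        # heart-beat every heartbeat_s seconds until stop is set.
        while not stop.wait(heartbeat_s):
            peer.sendall(b"\n")

    sender = threading.Thread(target=send_heartbeats, daemon=True)
    sender.start()
    reader.settimeout(read_timeout_s)  # applies to every recv() below
    try:
        for _ in range(beats):
            reader.recv(1)  # blocks until a heart-beat arrives or the timeout fires
        return True
    except socket.timeout:
        return False
    finally:
        # Stop and join the sender before closing either end of the pair.
        stop.set()
        sender.join()
        reader.close()
        peer.close()


if __name__ == "__main__":
    print(survives_idle_period(read_timeout_s=0.1, heartbeat_s=0.3))
    print(survives_idle_period(read_timeout_s=1.0, heartbeat_s=0.3))
```

With heartbeat_s=0.3, a read_timeout_s of 0.1 returns False and 1.0 returns True, the same shape as the 15s heartbeat with a 10s versus 60s timeout reported above. The sketch says nothing about why the problem only appeared on EKS.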
I'm not sure what is really happening there. Transport.receive() raises InterruptedException if socket.recv(...) fails with EAGAIN or EINTR, and transport.__read() catches and ignores that exception. If the issue were socket.recv getting interrupted because of the timeout, I would not expect to see the log messages nbertram posted.
Also, the Python documentation for socket.recv says: "Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an InterruptedError exception (see PEP 475 for the rationale)."
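One way to check which exception an idle read actually produces is the minimal sketch below, using only the Python standard library. classify_recv_error is a hypothetical stand-in for the EAGAIN/EINTR check described above, not stomp.py's actual code. On a socket with a timeout set, an idle recv() raises socket.timeout (an alias of TimeoutError since Python 3.10), whose errno is None, so a check that only treats EAGAIN and EINTR as interruptions would classify it as fatal rather than retrying. Over TLS the ssl module raises the same exception type; as far as I know its message there is "The read operation timed out", which matches the error quoted in the issue.

```python
import errno
import socket


def classify_recv_error(exc: OSError) -> str:
    """Hypothetical helper mirroring the check described above:
    EAGAIN/EINTR count as 'interrupted' (retry), anything else is 'fatal'."""
    if exc.errno in (errno.EAGAIN, errno.EINTR):
        return "interrupted"
    return "fatal"


def idle_recv_error(timeout_s: float) -> OSError:
    """Call recv() on an idle socket that has a timeout and return the raised error."""
    a, b = socket.socketpair()
    try:
        a.settimeout(timeout_s)
        try:
            a.recv(1024)  # nothing is ever sent on b, so this waits for the timeout
        except OSError as exc:  # socket.timeout is a subclass of OSError
            return exc
        raise AssertionError("recv() unexpectedly returned data")
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    err = idle_recv_error(0.2)
    print(type(err).__name__, err.errno, classify_recv_error(err))
```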
Hi,
We've seen some odd behaviour with Amazon MQ where the connection suddenly closes, seemingly when there's incoming subscription data, like this:
The socket read error is "The read operation timed out".
I can't be 100% certain, but I suspect the transport doesn't expect socket.read() to time out after an idle period. We have the timeout set to 10 seconds but heartbeating at 30 seconds, so read() times out between heartbeats if there's no other traffic, and then we get disconnected. For some reason this only seems to happen when connected over TLS, though I can't work out why, except that the read() semantics are slightly different.
I think the introduction of socket.settimeout() in https://github.com/jasonrbriggs/stomp.py/issues/55 might have inadvertently affected the read semantics?
A workaround seems to be setting the heartbeat lower than the timeout, which stops the issue from manifesting, though in normal operation we'd prefer to keep the connect timeout quite short. Should the transport potentially clear the socket timeout after it has successfully connected, since settimeout() applies to every subsequent read as well as to the connect?
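The proposal in the last paragraph can be sketched with the standard library alone: bound the TCP connect with a short timeout, then replace that timeout once the connection is up. connect_with_short_timeout and its read_timeout parameter are hypothetical names for illustration, not part of stomp.py. Passing None makes later reads block indefinitely, while a value comfortably above the heart-beat interval keeps some bound on reads.

```python
import socket
from typing import Optional, Tuple


def connect_with_short_timeout(
    address: Tuple[str, int],
    connect_timeout: float,
    read_timeout: Optional[float] = None,
) -> socket.socket:
    """Hypothetical helper: apply connect_timeout to the TCP connect only.

    Once connected, the socket timeout is replaced by read_timeout
    (None means fully blocking), so idle gaps between heart-beats no
    longer make recv() raise socket.timeout.
    """
    # create_connection() sets the timeout on the socket before connecting.
    sock = socket.create_connection(address, timeout=connect_timeout)
    # settimeout() affects every later blocking call on this socket,
    # so swap the short connect timeout for the read timeout here.
    sock.settimeout(read_timeout)
    return sock


if __name__ == "__main__":
    # Local listener standing in for the broker.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(5)
    conn = connect_with_short_timeout(server.getsockname(), connect_timeout=2.0)
    print("timeout after connect:", conn.gettimeout())
    conn.close()
    server.close()
```

Clearing the timeout shifts detection of a dead connection onto heart-beat monitoring; the sketch does not cover wrapping the socket in TLS.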
Thanks