project8 / dripline-python

python implementation of project8/dripline

esr crashing #30

Closed (wcpettus closed 6 years ago)

wcpettus commented 6 years ago

Every time the esr takes data, the service crashes and restarts at the end of the run. I assume something is getting dropped at the service level, because the error comes from on_connection_closed: "Connection closed, reopening in 5 seconds: (0) Not specified".

Possibly related to the excessively long timeout (7+ minutes)?

wcpettus commented 6 years ago

The root of this problem sits at the rabbitmq-pika interface.

When upgrading from Debian Jessie to Stretch, we bumped the system RabbitMQ version from 3.3.5 to 3.6.6. This changed the default "heartbeat" interval from 580 seconds to 60 seconds; miss two heartbeats and the connection is automatically closed. It is simple enough to reproduce: create a connection, sleep for three minutes, and watch the connection disappear from the RabbitMQ web UI.
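The reproduction described above might be sketched roughly as follows. This is a minimal illustration, not the code used in dripline-python: survives_idle and pika_connect are hypothetical names, the broker is assumed to be on localhost, and the heartbeat is left unset on the assumption that the client then follows the broker's default.

```python
import time


def survives_idle(connect, idle_seconds=180, sleep=time.sleep):
    """Open a connection, block without servicing I/O, and report whether it is still open.

    `connect` is any zero-argument callable returning an object that offers
    `process_data_events()` and `is_open`, e.g. a pika BlockingConnection.
    """
    connection = connect()
    # A plain blocking sleep services no socket I/O, so the client sends no heartbeats.
    sleep(idle_seconds)
    try:
        # Touching the connection surfaces a broker-side close, if there was one.
        connection.process_data_events()
    except Exception:
        # pika reports the drop as an AMQP connection error; the broad catch
        # keeps this sketch independent of the pika version in use.
        return False
    return bool(connection.is_open)


def pika_connect(host="localhost"):
    """Hypothetical helper: open a BlockingConnection without setting a heartbeat."""
    import pika  # imported lazily so the module loads without pika installed

    return pika.BlockingConnection(pika.ConnectionParameters(host=host))
```

Calling survives_idle(pika_connect) against a local broker should take three minutes and, if the account above holds, return False under the 60 second default. The connect and sleep arguments are injectable only so the logic can be exercised without a broker.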

There should be two ways to fix this:

Done.

Sidenote: +1 for the dripline standard addressing long-lasting blocking calls; +1 for async and python3 to make this go away