project8 / dripline

Slow controls for medium scale physics experiments based on AMQP centralized messaging
http://www.project8.org/dripline

Socket timeout #137

Closed nsoblath closed 7 years ago

nsoblath commented 9 years ago

After some period of inactivity, message_monitor crashes with an unhandled socket.timeout exception.

I'm not sure what exactly causes this, because I've seen it happen after different amounts of time.

Here's the traceback:

Traceback (most recent call last):
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/virtualenv/dripline/bin/message_monitor", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/dripline/python/bin/message_monitor", line 79, in <module>
    start_monitoring(**vars(kwargs))
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/dripline/python/bin/message_monitor", line 43, in start_monitoring
    monitor.run()
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/dripline/python/dripline/core/service.py", line 360, in run
    self._connection.ioloop.start()
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/virtualenv/dripline/lib/python2.7/site-packages/pika/adapters/select_connection.py", line 138, in start
    self.poller.start()
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/virtualenv/dripline/lib/python2.7/site-packages/pika/adapters/select_connection.py", line 373, in start
    self.poll()
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/virtualenv/dripline/lib/python2.7/site-packages/pika/adapters/select_connection.py", line 398, in poll
    self._handler(self.fileno, events, write_only=write_only)
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/virtualenv/dripline/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 322, in _handle_events
    self._handle_read()
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/virtualenv/dripline/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 343, in _handle_read
    return self._handle_error(error)
  File "/Users/nsoblath/My_Documents/Project_8/DataAnalysis/virtualenv/dripline/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 269, in _handle_error
    raise socket.timeout
socket.timeout

laroque commented 9 years ago

My suspicion is that this comes from a latency issue or something similar. I'm unable to reproduce it locally, so I'll leave it unassigned and marked low_priority. If someone wants to take a look at this, a suggestion follows.

A try/except in service.py around line 360 could catch socket.timeout, attempt a reconnect(), and, if that succeeds, call run() again.

It is worth making sure that if the network is actually down, the program crashes rather than trying to reconnect forever.
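
A minimal sketch of that idea, not the actual dripline code. The helper name `run_with_reconnect` and its `run`, `reconnect`, and `max_reconnects` parameters are hypothetical, and it assumes `reconnect` returns a truthy value on success, which the real `reconnect()` in service.py may not do. The retry budget is bounded so that a network that stays down still ends in a crash.

```python
import socket


def run_with_reconnect(run, reconnect, max_reconnects=1):
    """Call run(); on socket.timeout, try reconnect() a bounded number of times.

    `run` and `reconnect` are caller-supplied callables (hypothetical
    stand-ins for the service's own methods). Returns the number of
    reconnects that were needed before run() finished normally.
    """
    attempts = 0
    while True:
        try:
            run()
            return attempts
        except socket.timeout:
            # Budget exhausted: let the timeout propagate so the program
            # stops instead of retrying forever.
            if attempts >= max_reconnects:
                raise
            attempts += 1
            # Assumption: reconnect() returns False when it fails.
            if not reconnect():
                raise
```

With `max_reconnects=1` this covers a transient glitch with a single retry, while a persistent outage still surfaces as `socket.timeout` to the caller.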

nsoblath commented 9 years ago

I think catching the timeout and quitting would be fine, but trying to reconnect once would be an added convenience for situations where the network issue is transient.

Mainly I figure that catching the exception and quitting in a nicer way would be good.
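
Roughly what "quitting in a nicer way" could look like, as a sketch only. The wrapper name `monitor_main` and the logger name are hypothetical, and `start_monitoring` is passed in as a plain callable rather than imported from the real message_monitor script.

```python
import logging
import socket

logger = logging.getLogger("message_monitor")  # hypothetical logger name


def monitor_main(start_monitoring):
    """Run the monitor and turn a socket timeout into a clean exit code.

    Returns 0 on a normal finish and 1 if the connection timed out,
    so the caller can pass the result to sys.exit().
    """
    try:
        start_monitoring()
    except socket.timeout:
        # Report the problem in one line instead of dumping a traceback.
        logger.error("Connection to the broker timed out; exiting.")
        return 1
    return 0
```

A script entry point could then end with `sys.exit(monitor_main(...))` so that the shell still sees a nonzero status on failure.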

guiguem commented 7 years ago

This issue was moved to project8/dripline-python#4