A problem shows up in glenlivet when a service has trouble binding to a queue and its setup_calls start a scheduler.
In the run method of service.py, a service connects to the broker, then runs setup_calls, and then the ioloop starts. Problems arise if the ioloop has trouble starting. In my case, I was running a test service instance, which blocked queue binding until I stopped it; by the time the queue was bound, the connection had already been closed, which broke the timeout on the scheduler. After a single break in the scheduler timeout, a service will never reschedule, and we don't have a watchdog to catch it.
While running an identically-named instance of a service is the easiest way to block queue binding, it has occasionally happened for other reasons.
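The failure mode described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: FakeConnection is not the real broker connection class, the Service below only mimics the ordering in run, and the assumption that closing a connection discards its pending timeouts is taken from the behaviour reported in this issue, not from the library's documentation.

```python
class FakeConnection:
    """Hypothetical stand-in for a broker connection; not a real library class."""

    def __init__(self):
        self._pending = []  # timeouts registered on this connection

    def add_timeout(self, delay, callback):
        # The delay is ignored here; only the ordering matters.
        self._pending.append(callback)

    def close(self):
        # Assumption: timeouts registered on a closed connection never fire.
        self._pending.clear()

    def run_pending(self):
        # Stand-in for one pass of the ioloop.
        pending, self._pending = self._pending, []
        for callback in pending:
            callback()


class Service:
    def __init__(self):
        self._connection = None
        self.scheduled_runs = 0

    def connect(self):
        self._connection = FakeConnection()

    def _scheduled_task(self):
        # Each run re-arms itself, so one lost timeout ends the chain.
        self.scheduled_runs += 1
        self._connection.add_timeout(60, self._scheduled_task)

    def _do_setup_calls(self):
        self._scheduled_task()

    def run(self):
        self.connect()
        self._do_setup_calls()  # happens once, before the ioloop starts

    def reconnect(self):
        self._connection.close()
        self.connect()  # setup calls are not repeated here


service = Service()
service.run()
service._connection.run_pending()  # scheduler fires and re-arms itself
runs_before = service.scheduled_runs
service.reconnect()                # pending timeout is lost with the old connection
service._connection.run_pending()  # nothing left to fire
runs_after = service.scheduled_runs
print(runs_before, runs_after)
```

In this sketch the run count stops increasing after the reconnect, which matches the "never reschedule" symptom: nothing re-arms the scheduler once its one pending timeout is gone.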
Most Project 8 services are not affected by this bug because many socket connections are exclusive (e.g., laphraoig only allows a single connection on 9221): EthernetProvider can't connect, so the setup_call never happens.
Quick solutions:
Every call of self.connect() should be followed by self._connection.add_timeout(0, self._do_setup_calls) (as in run, so also in reconnect). Not smart, but it should work.
Instead of doing setup_calls before ioloop.start, we could move them into the ioloop so that they happen every time.
Move them deeper in, so that they occur after the point where we expect problems: at the end of on_queue_declareok. For the requests queue we should be OK by then, and for the alerts queue we've passed all queue bindings.
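As a rough sketch of the first quick solution, the pairing of connect() with a zero-delay timeout could be put in one helper that both run and reconnect go through. FakeConnection, run_pending, and _connect_and_schedule_setup are hypothetical names introduced only for illustration; only connect, reconnect, run, _do_setup_calls, and the add_timeout call come from the description above.

```python
class FakeConnection:
    """Hypothetical stand-in for the broker connection; not a real library class."""

    def __init__(self):
        self._pending = []  # callbacks waiting for the ioloop

    def add_timeout(self, delay, callback):
        # Mirrors the call quoted above; the delay is ignored in this sketch.
        self._pending.append(callback)

    def run_pending(self):
        # Stand-in for one pass of the ioloop.
        pending, self._pending = self._pending, []
        for callback in pending:
            callback()


class Service:
    def __init__(self, setup_calls):
        self._setup_calls = setup_calls
        self._connection = None

    def connect(self):
        self._connection = FakeConnection()

    def _do_setup_calls(self):
        for call in self._setup_calls:
            call()

    def _connect_and_schedule_setup(self):
        # Hypothetical helper: every connect() is followed by a zero-delay
        # timeout, so setup calls run inside the ioloop of the new connection.
        self.connect()
        self._connection.add_timeout(0, self._do_setup_calls)

    def run(self):
        self._connect_and_schedule_setup()

    def reconnect(self):
        self._connect_and_schedule_setup()


started = []
service = Service(setup_calls=[lambda: started.append("scheduler")])
service.run()
service._connection.run_pending()  # setup calls run on the first connection
service.reconnect()
service._connection.run_pending()  # and again on the new connection
print(len(started))
```

With this arrangement the setup calls run once per connection rather than once per process. Whether re-running every setup call on reconnect is safe for all services is not established here and would need checking.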