wcpettus closed this issue 7 years ago
Maybe this is in that output, but I'm just going to ask:
Later: the last two lines of the first code block orient you to where in service.py the logger.warning calls are firing:
```
2016-11-19T22:30:47[WARNING ] dripline.core.service(198) -> Channel 1 was closed: (405) RESOURCE_LOCKED - cannot obtain exclusive access to locked queue 'status_multido' in vhost '/'
2016-11-19T22:30:47[WARNING ] dripline.core.service(133) -> Connection closed, reopening in 5 seconds: (0) Not specified
```
So line 133 is in the method on_connection_closed, which has this block:
```python
self._channel = None
if self._closing:
    self._connection.ioloop.stop()
else:
    logger.warning('Connection closed, reopening in 5 seconds: (%s) %s',
                   reply_code, reply_text)
    self._connection.add_timeout(5, self.reconnect)
```
which is why we get into this recursion loop where we keep spawning deeper levels of reconnect. I'm not sure when else this block might get called.
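For context, here is a minimal sketch of the reconnect pattern from pika's old (0.x-era) asynchronous-consumer example, which the quoted block appears to follow; the class and method names below come from that example, not necessarily from dripline's exact code. It shows why each retry deepens the stack:

```python
# Minimal sketch modeled on pika's 0.x asynchronous-consumer example,
# which the quoted block appears to follow; not dripline's exact code.
import pika


class ExampleService(object):
    def __init__(self, amqp_url):
        self._url = amqp_url
        self._closing = False
        self._connection = None
        self._channel = None

    def connect(self):
        return pika.SelectConnection(
            pika.URLParameters(self._url),
            on_open_callback=self.on_connection_open,
            on_close_callback=self.on_connection_closed)

    def on_connection_open(self, unused_connection):
        self._connection.channel(on_open_callback=self.on_channel_open)

    def on_channel_open(self, channel):
        self._channel = channel

    def on_connection_closed(self, connection, reply_code, reply_text):
        self._channel = None
        if self._closing:
            self._connection.ioloop.stop()
        else:
            # the block quoted above: schedule a retry in 5 seconds
            self._connection.add_timeout(5, self.reconnect)

    def reconnect(self):
        # Runs as a timer callback *inside* the old connection's ioloop.
        self._connection.ioloop.stop()
        if not self._closing:
            # A new connection and a new ioloop.start() are entered before the
            # old ioloop.start() frame has returned, so every failed attempt
            # adds another layer to the call stack -- hence the ~2000-line
            # traceback after a few hundred retries.
            self._connection = self.connect()
            self._connection.ioloop.start()

    def run(self):
        self._connection = self.connect()
        self._connection.ioloop.start()
```

Note that reconnect() never unwinds the previous ioloop.start() call; an outer retry loop (reconnect only after ioloop.start() returns) would keep the stack flat.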
Supervisord defaults to startretries=3, if we choose that option.
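For comparison, a hypothetical supervisord program section for this service, with the defaults spelled out; the program name, command, and paths are placeholders, not our actual config:

```ini
; hypothetical entry; program name, command, and paths are placeholders
[program:status_multido]
command=/path/to/run_status_multido.sh
autorestart=true
; give up after 3 consecutive failed starts (supervisord's default)
startretries=3
; the process must stay up this long to count as successfully started
startsecs=1
```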
This issue was moved to project8/dripline-python#9
This is a case study in what happens when an identical service tries to start when it is already running somewhere else. It happened this weekend because the multido (status_multigets.yaml) service was running on my account to patch dead endpoints, and Mathieu tried to restart all the services to roll back releases without killing my instance.
Everything starts fine, but then the service can't use the top-level name "status_multido" because it is already bound to a running service:
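To make the failure mode concrete, here is a minimal sketch with plain pika (illustrative only, not dripline's code; the queue name is taken from the log above): the broker refuses a second declaration of a queue that another connection holds exclusively, which is exactly the 405 RESOURCE_LOCKED in the warning.

```python
# Illustrative reproduction with plain pika, not dripline's code.
import pika

params = pika.ConnectionParameters('localhost')

# First "service": declares the queue exclusively and keeps the connection open.
conn1 = pika.BlockingConnection(params)
conn1.channel().queue_declare(queue='status_multido', exclusive=True)

# Second "service": tries to use the same name while the first is still running.
conn2 = pika.BlockingConnection(params)
try:
    conn2.channel().queue_declare(queue='status_multido', exclusive=True)
except pika.exceptions.ChannelClosed as err:
    # (405) RESOURCE_LOCKED - cannot obtain exclusive access to locked queue ...
    print(err)
```

The second declare fails immediately; it is the automatic reconnect stacked on top of that hard failure that produces the twenty minutes of log noise below.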
The logs then quickly get boring as they repeat every 5 seconds:
and they continue doing this until stuff starts failing:
At this point we are so deep in the recursion loop that it takes 2000 lines to unwind it, and the service crashes:
This isn't an issue per se; it's really a discussion starting point. Automatically trying to restart is a good thing, and eventually crashing is the right behavior. But maybe it shouldn't take 20 minutes and the 240th connection attempt to crash?
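One possible direction, sketched purely as a discussion aid (the attempt counter, the cap of 10, and the backoff ceiling are invented here, not an existing dripline option): count consecutive failed attempts, back off exponentially, and give up long before attempt 240 so a supervisor's startretries policy can take over.

```python
import logging

logger = logging.getLogger(__name__)


class BoundedReconnectMixin(object):
    """Discussion sketch only: bound and back off the reconnect attempts.

    The attribute names, the cap of 10 attempts, and the 60 s ceiling are
    invented for illustration; dripline has no such option in the quoted code.
    """

    max_reconnect_attempts = 10

    def on_connection_closed(self, connection, reply_code, reply_text):
        self._channel = None
        if self._closing:
            self._connection.ioloop.stop()
            return
        self._retry_count = getattr(self, '_retry_count', 0) + 1
        if self._retry_count > self.max_reconnect_attempts:
            # Give up early and let the process exit, so the supervisor's own
            # startretries policy decides whether to try again.
            logger.critical('giving up after %d reconnect attempts: (%s) %s',
                            self._retry_count, reply_code, reply_text)
            self._closing = True
            self._connection.ioloop.stop()
            return
        # Exponential backoff: 5, 10, 20, 40 seconds, capped at one minute.
        delay = min(5 * 2 ** (self._retry_count - 1), 60)
        logger.warning('Connection closed, reopening in %d seconds: (%s) %s',
                       delay, reply_code, reply_text)
        self._connection.add_timeout(delay, self.reconnect)
```

With these numbers the service gives up after roughly seven minutes instead of twenty; a successful reconnect (e.g. in the connection-open callback) should reset the counter.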