laroque opened this issue 8 years ago
I'll note that I mark this as urgent because it does represent a recurring cause of confusion for people. It does, however, represent working with the core parts of dripline, so people may hesitate to tackle it. The workaround listed above solves the problem, but it is still "urgent" in the sense that it really needs to be fixed as part of the next non-minor release.
Currently, if the alerts and requests exchanges don't exist, a dragonfly serve will recreate them at some point, but a non-fatal error pops up for every endpoint given in the config file (each endpoint does a get when it is defined):
```
2016-09-21T17:59:16[DEBUG ] dripline.core.service(563) -> to sensor_value.waffles sending {'value_cal': 4.0, 'value_raw': '4.0'}
2016-09-21T17:59:16[DEBUG ] dripline.core.service(82) -> Getting credentials
2016-09-21T17:59:16[ERROR ] dripline.core.scheduler(120) -> got a: (404, "NOT_FOUND - no exchange 'alerts' in vhost '/'")
2016-09-21T17:59:16[ERROR ] dripline.core.scheduler(121) -> traceback follows:
Traceback (most recent call last):
  File "/Users/guig098/Work/Project8/SourceTree/dripline/python/dripline/core/scheduler.py", line 114, in _log_a_value
    self._conditionally_send(to_send)
  File "/Users/guig098/Work/Project8/SourceTree/dripline/python/dripline/core/scheduler.py", line 103, in _conditionally_send
    self.store_value(to_send, severity=self.alert_routing_key)
  File "/Users/guig098/Work/Project8/SourceTree/dripline/python/dripline/core/service.py", line 566, in send_alert
    self.send_message(target=severity, message=alert, exchange='alerts', ensure_delivery=False)
  File "/Users/guig098/Work/Project8/SourceTree/dripline/python/dripline/core/service.py", line 491, in send_message
    routing_key=result.method.queue,
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 877, in queue_bind
    None, replies)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 1141, in _rpc
    self._wait_on_response(method_frame))
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 1162, in _send_method
    self.connection.process_data_events()
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 240, in process_data_events
    if self._handle_read():
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 348, in _handle_read
    super(BlockingConnection, self)._handle_read()
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 351, in _handle_read
    self._on_data_available(data)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/connection.py", line 1285, in _on_data_available
    self._process_frame(frame_value)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/connection.py", line 1352, in _process_frame
    if self._process_callbacks(frame_value):
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/connection.py", line 1322, in _process_callbacks
    frame_value) # Args
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/callback.py", line 61, in wrapper
    return function(*tuple(args), **kwargs)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/callback.py", line 92, in wrapper
    return function(*args, **kwargs)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/callback.py", line 232, in process
    callback(*args, **keywords)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 1059, in _on_close
    method_frame.method.reply_text)
ChannelClosed: (404, "NOT_FOUND - no exchange 'alerts' in vhost '/'")
```
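For what it's worth, this is easy to reproduce with bare pika, outside of dripline entirely. A minimal sketch (assuming a fresh broker on localhost with default credentials, so no alerts exchange exists yet):

```python
import pika
from pika.exceptions import ChannelClosed

# Connect to a fresh broker on which the 'alerts' exchange was never declared.
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declare a throwaway server-named queue, then try to bind it to the
# missing exchange; the broker closes the channel with the 404 above.
result = channel.queue_declare(queue='', exclusive=True)
try:
    channel.queue_bind(queue=result.method.queue,
                       exchange='alerts',
                       routing_key='#')
except ChannelClosed as err:
    print(err)  # (404, "NOT_FOUND - no exchange 'alerts' in vhost '/'")
```

Adding a channel.exchange_declare(exchange='alerts', exchange_type='topic') before the bind makes the error go away, which is the shape of the fix discussed below.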
This is a bit annoying and has to be fixed. But the required changes mean redefining the order of initialization of the endpoints, channel creation, and so on... which is actually a deep change!
Another possibility is to change open_spimescape so that it inherits from Service, add a Service.__init__ call in the __call__ method, and use Service.run() and Service.stop(), so that the two exchanges get created.
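For concreteness, I read that suggestion as something like the following sketch (the import path and the constructor wiring are my assumptions about the dripline internals, not tested code):

```python
from dripline.core import Service

# Rough sketch only: make open_spimescape a Service subclass so that
# Service.__init__, which sets up the AMQP connection and can declare
# the exchanges, runs before anything tries to use them.
class open_spimescape(Service):
    def __call__(self, *args, **kwargs):
        Service.__init__(self, *args, **kwargs)  # declares requests/alerts
        try:
            Service.run(self)    # start consuming
        finally:
            Service.stop(self)   # clean shutdown either way
```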
I don't think that alternative is a clean solution. You say that these are non-fatal errors; does that mean we currently have a workaround (even if that is just manually creating the required exchanges)?
The right solution, in a somewhat abstract sense, is that any object which interacts with AMQP (generally Service or classes derived from it) should always make the proper exchange declarations prior to trying to interact with those exchanges. If we're getting errors about no exchange '<whatever>', then there is a bug in this logic and we need to re-evaluate the full sequence of actions when an object is created and when the service is started, including the "on success" callbacks pika makes.
This is a rather non-trivial job, so a workaround, even a "bootstrap" script to run on new RabbitMQ instances, is probably a reasonable near-term solution.
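For example, a one-shot bootstrap along these lines (a bare-pika sketch; the host, credentials, and the topic exchange type are my assumptions) could be run against any new RabbitMQ instance:

```python
#!/usr/bin/env python
import pika

# Declare the two standard dripline exchanges on a new broker.
# exchange_declare is idempotent, so rerunning this script is harmless.
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost',
                              credentials=pika.PlainCredentials('guest', 'guest')))
channel = connection.channel()
for exchange in ('requests', 'alerts'):
    channel.exchange_declare(exchange=exchange, exchange_type='topic')
connection.close()
```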
My guess is that these non-fatal errors happen because, when starting a service, it creates the object and tries to create the associated endpoints (such as waffles and peaches) first, and to get/set their values. But if the exchanges are not declared before that, these endpoints cannot be set/get, which gives an error. The next step is that the Service makes its bindings after declaring/setting the exchanges. And life goes on! This is what I read from the log. Indeed, there is an inconsistency in the sequence of actions we are doing: the exchanges should be declared at the very beginning, then the endpoints should be defined and get/set, and the bindings should be made.
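In other words, the startup sequence should look roughly like this (pseudocode; make_endpoint, the config layout, and the attribute names are invented for illustration, not the actual dragonfly API):

```python
def start_service(connection, config):
    channel = connection.channel()
    # 1. declare the exchanges at the very beginning
    for exchange in ('requests', 'alerts'):
        channel.exchange_declare(exchange=exchange, exchange_type='topic')
    # 2. only then define the endpoints and do their initial get/set
    endpoints = [make_endpoint(spec) for spec in config['endpoints']]
    # 3. finally, make the bindings
    for endpoint in endpoints:
        channel.queue_bind(queue=endpoint.queue_name,
                           exchange='requests',
                           routing_key=endpoint.binding_key)
```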
Since this is a non-fatal error, it does not prevent us from starting a DAQ, but I agree that we should try to solve this shortly, because we hate errors.
Indeed, a short-term solution would simply be to dragonfly monitor the two exchanges alerts and requests. We can put them together into a restart_services bash script.
From the C++ side, the service class declares the requests exchange, but doesn't know about the alerts exchange. Of the two classes that inherit from service, hub only uses the requests exchange, and relayer knows about both requests and alerts. This could be changed relatively easily.
As currently implemented, starting a new dripline service does not always ensure that all exchanges are declared prior to binding to them. In particular, on a new RabbitMQ server, dragonfly serve -c <any config> will not declare the alerts exchange. There is a simple workaround (starting and then killing dragonfly monitor, which does declare that exchange). The fix should probably come in two steps, the first one sufficient for a while, the second a cleaner design:
1. Modify Service such that it always declares both the requests and the alerts exchanges, regardless of what binding(s) will eventually happen.
2. Service should declare both queues and should go ahead and make all of the universal bindings (on alerts, bind to everything; on requests, bind to broadcast.# and ...). Service would also need an overridable method to handle request messages and ignore them. This would be in two steps: one method has the standard on_alert signature and would acknowledge and then decode the message into a dripline.core.Message object, which would then be passed to an overridable method that actually implements whatever logic is wanted (probably just a pass in Service); see the sketch after this list.
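A sketch of what that second step might look like (beyond the Service name and the standard pika callback signature, every name here is an assumption about naming, not the real dripline API):

```python
class Service(object):
    def _setup_amqp(self, channel, queue_name):
        # always declare both exchanges, regardless of later bindings
        for exchange in ('requests', 'alerts'):
            channel.exchange_declare(exchange=exchange, exchange_type='topic')
        # universal bindings: everything on alerts, broadcasts on requests
        channel.queue_bind(queue=queue_name, exchange='alerts',
                           routing_key='#')
        channel.queue_bind(queue=queue_name, exchange='requests',
                           routing_key='broadcast.#')

    def on_request(self, channel, method, properties, body):
        # step one: standard pika callback; acknowledge, then decode the
        # payload into a dripline.core.Message (decoding elided here)
        channel.basic_ack(delivery_tag=method.delivery_tag)
        message = self._decode(body)  # stand-in for the Message parsing
        self.handle_request(message)

    def handle_request(self, message):
        # step two: overridable hook; the base class just ignores requests
        pass
```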