project8 / dripline

Slow controls for medium scale physics experiments based on AMQP centralized messaging
http://www.project8.org/dripline
1 stars 0 forks source link

All exchange declarations should happen every time a service starts #160

Open laroque opened 8 years ago

laroque commented 8 years ago

As currently implemented, starting a new dripline service does not always ensure that all exchanges are declared prior to binding to them. In particular, on a new rabbitmq server, dragonfly serve -c <any config> will not declare the alerts exchange. There is a simple workaround (starting and then killing dragonfly monitor, which does declare that exchange).

The fix should probably come in two steps, the first one sufficient for a while, the second a cleaner design:

  1. Modify Service such that it always declares both the requests and the alerts exchange, regardless of what binding(s) will eventually happen.
  2. What should really happen is that Service should declare both queues and should go ahead and make all of the universal bindings (on alerts, bind to everything; on requests, bind to broadcast.# and .#). The service setup subcommand could then create further bindings for associated endpoints if/as needed. For this to work, Service would also need an overridable method to handle request messages and ignore them. This would be in two steps, one method has the standard on_alert signature and would acknowledge and then decode the message into a dripline.core.Message object which would then be passed to an overridable message to actually implement whatever logic is wanted (probably just a pass in Service).
laroque commented 8 years ago

I'll note that I mark this as urgent because it does represent a recurring cause of confusion for people. It does, however, represent working with the core parts of dripline, so people may hesitate to tackle it. The workaround listed above solves the problem, but it is still "urgent" in the sense that it really needs to be fixed as part of the next non-minor release.

guiguem commented 8 years ago

Currently, if the alerts and requests exchanges don't exist, a dragonfly serve will recreate them at some point, but non-fatal errors will be popping for every endpoint (which is get when defined) given in the config file:

2016-09-21T17:59:16[DEBUG   ] dripline.core.service(563) -> to sensor_value.waffles sending {'value_cal': 4.0, 'value_raw': '4.0'}
2016-09-21T17:59:16[DEBUG   ] dripline.core.service(82) -> Getting credentials
2016-09-21T17:59:16[ERROR   ] dripline.core.scheduler(120) -> got a: (404, "NOT_FOUND - no exchange 'alerts' in vhost '/'")
2016-09-21T17:59:16[ERROR   ] dripline.core.scheduler(121) -> traceback follows:
Traceback (most recent call last):
  File "/Users/guig098/Work/Project8/SourceTree/dripline/python/dripline/core/scheduler.py", line 114, in _log_a_value
    self._conditionally_send(to_send)
  File "/Users/guig098/Work/Project8/SourceTree/dripline/python/dripline/core/scheduler.py", line 103, in _conditionally_send
    self.store_value(to_send, severity=self.alert_routing_key)
  File "/Users/guig098/Work/Project8/SourceTree/dripline/python/dripline/core/service.py", line 566, in send_alert
    self.send_message(target=severity, message=alert, exchange='alerts', ensure_delivery=False)
  File "/Users/guig098/Work/Project8/SourceTree/dripline/python/dripline/core/service.py", line 491, in send_message
    routing_key=result.method.queue,
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 877, in queue_bind
    None, replies)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 1141, in _rpc
    self._wait_on_response(method_frame))
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 1162, in _send_method
    self.connection.process_data_events()
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 240, in process_data_events
    if self._handle_read():
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 348, in _handle_read
    super(BlockingConnection, self)._handle_read()
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 351, in _handle_read
    self._on_data_available(data)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/connection.py", line 1285, in _on_data_available
    self._process_frame(frame_value)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/connection.py", line 1352, in _process_frame
    if self._process_callbacks(frame_value):
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/connection.py", line 1322, in _process_callbacks
    frame_value)                 # Args
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/callback.py", line 61, in wrapper
    return function(*tuple(args), **kwargs)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/callback.py", line 92, in wrapper
    return function(*args, **kwargs)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/callback.py", line 232, in process
    callback(*args, **keywords)
  File "/Users/guig098/Work/Project8/python_environments/dragonfly/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 1059, in _on_close
    method_frame.method.reply_text)
ChannelClosed: (404, "NOT_FOUND - no exchange 'alerts' in vhost '/'")

This is a bit annoying and has to be fixed. But the changes required need to redefine the order of initialization of the endpoints, channel creation... which is a deep change actually!

guiguem commented 8 years ago

Another possibility is to change open_spimescape such as it inherits from Service and add in the __call method a Service.__init__, use Service.run() and Service.stop(), so the two exchanges are created.

laroque commented 8 years ago

I don't think that that alternative possibility is a clean solution. You say that there are non-fatal errors, does that mean that we have a workaround currently (even if that is just manually creating the required exchanges)?

The right solution, in an somewhat abstract sense, is that any object which interacts with AMQP (generally Service or classes derived from it) should always make the proper exchange declarations prior to trying to interact with those exchanges. If we're getting errors about no exchange '<whatever>' then there is a bug in this logic and we need to re-evaluate the full sequence of actions when an object is created and when the service is started, including the "on success" callbacks pika makes.

This is a rather non-trivial job, so a workaround, even a "bootstrap" script to run on new rabbitMQ instances is probably a reasonable near term solution.

guiguem commented 8 years ago

My guess is that these non-fatal errors happen because when starting a service, it will create the object and try to create the associated endpoints (such as waffles and peaches) first and get/set there values. But if the exchanges are not declared before that, these endpoints cannot be set/get, giving an error. The next step is that the Service makes its bindings after declaring/setting the exchanges. And life goes on! This is what I read from the log. Indeed, there is a incoherence in the sequence of actions we are doing: the exchange should be declared at the very beginning, then the endpoints should be defined and get/set and the bindings should be done.

Since this is a non-fatal error, this does not prevent us from starting a daq, but I agree that we should try to solve this shortly because we hate errors. Indeed a short-term solution would simply to dragonfly monitor the two exchanges alerts and requests. We can put them together into a restart_services bash script.

nsoblath commented 8 years ago

From the C++ side, the service class declares the requests exchange, but doesn't know about the alerts exchange. Of the two classes that inherit from service, hub only uses the requests exchange, and relayer knows about both requests and alerts. This could be changed relatively easily.