hobbit-project / platform

HOBBIT benchmarking platform
GNU General Public License v2.0

Experiment termination leads to a flood of messages #535

Open MichaelRoeder opened 2 years ago

Description

A huge number of log messages is generated when the platform controller forces an experiment to terminate. This has two drawbacks:

  1. It is harder to find the cause of the termination in the logs because they are filled with exceptions (e.g., RabbitMQ exceptions); even the controller reports errors that are caused by its own behavior.
  2. The ELK stack is forced to process all these log messages.

The majority of these messages could be avoided by changing the behavior of the platform controller.

Reproducibility

Start an experiment. Terminate it. Check the logs.

Expected behavior

The following changes should be implemented:

  1. Mark an experiment as forced to stop. This allows the controller to check whether it makes sense to send additional messages (e.g., the message that a container stopped) to the command queue. The other containers do not have to be informed about the termination since they will be terminated as well.
  2. Terminate containers in the right order. At the moment, it seems like the platform controller starts with the top element of the tree of containers of an experiment. However, in many cases, this container contains the RabbitMQ message broker of the experiment. This leads to a lot of exceptions in all connected containers, which can be easily avoided by changing the order of termination.
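
The two changes above could be sketched roughly as follows. This is only an illustration and not the platform's actual code: `ContainerNode`, `ExperimentTermination`, `markForcedToStop`, `onContainerStopped`, and `terminationOrder` are hypothetical names, the in-memory list merely stands in for the command queue, and the sketch assumes that the container tree really has the RabbitMQ container at its root, as described in point 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of one container in the container tree of an experiment. */
final class ContainerNode {
    final String name;
    final List<ContainerNode> children = new ArrayList<>();

    ContainerNode(String name) {
        this.name = name;
    }

    /** Adds a child container and returns this node to allow chaining. */
    ContainerNode addChild(ContainerNode child) {
        children.add(child);
        return this;
    }
}

/** Hypothetical helper covering both proposed changes. */
final class ExperimentTermination {
    // Change 1: remember that the experiment is being forced to stop.
    private final AtomicBoolean forcedToStop = new AtomicBoolean(false);

    // Stand-in for the command queue (assumption: the real queue is RabbitMQ-backed).
    final List<String> sentMessages = new ArrayList<>();

    void markForcedToStop() {
        forcedToStop.set(true);
    }

    /**
     * Called when a container has stopped. Returns true if a message was
     * "sent", false if it was suppressed because of the forced stop.
     */
    boolean onContainerStopped(String containerName) {
        if (forcedToStop.get()) {
            // All containers are terminated anyway, so nobody needs this message.
            return false;
        }
        sentMessages.add("container stopped: " + containerName);
        return true;
    }

    /**
     * Change 2: post-order traversal, so every child is terminated before its
     * parent and the root (here assumed to hold the broker) comes last.
     */
    static List<String> terminationOrder(ContainerNode root) {
        List<String> order = new ArrayList<>();
        collect(root, order);
        return order;
    }

    private static void collect(ContainerNode node, List<String> order) {
        for (ContainerNode child : node.children) {
            collect(child, order);
        }
        order.add(node.name);
    }
}
```

With a tree whose root is the broker container, `terminationOrder` lists the broker last, so the other containers are already gone when their connection to it disappears. Once `markForcedToStop()` has been called, `onContainerStopped` no longer adds anything to the queue stand-in.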