hobbit-project / platform

HOBBIT benchmarking platform
GNU General Public License v2.0

Orphaned containers because of experiment termination during docker pull #39

Open MichaelRoeder opened 7 years ago

MichaelRoeder commented 7 years ago

Problem

It is possible that an experiment is terminated (in the example, the benchmark controller exits in line 984 and the platform stops the experiment in line 1097) while one of the benchmark or system containers has requested the creation of a new container. If creating this container takes longer (e.g., because the platform has to pull the container's image, as it does in the example with tenforce/virtuoso), the container is created and started after the platform has already finished stopping all containers of the experiment.

This leads to an orphaned container that consumes resources and is not bound to any existing experiment (in the example, it is the container 84d8cd52c91c496dd40d4cff95f78a1877c755107519c35236d07c60c51bb551).

platform controller log of the example

MichaelRoeder commented 7 years ago

Part 1 of the solution

The platform should only accept DOCKER_CONTAINER_START messages if they carry the session of a currently running experiment.
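A minimal sketch of such a check, not the platform's actual code: the names `StartRequest`, `StartRequestGate`, `experiment_started`, `experiment_stopped` and `accepts` are hypothetical and only illustrate the idea of comparing the session of an incoming DOCKER_CONTAINER_START message against the set of running experiments.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class StartRequest:
    """Hypothetical stand-in for a DOCKER_CONTAINER_START message."""
    session_id: str
    image: str


class StartRequestGate:
    """Remembers which experiment sessions are running (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running = set()

    def experiment_started(self, session_id: str) -> None:
        with self._lock:
            self._running.add(session_id)

    def experiment_stopped(self, session_id: str) -> None:
        with self._lock:
            # discard() does not raise if the session is already gone
            self._running.discard(session_id)

    def accepts(self, request: StartRequest) -> bool:
        """True only if the request belongs to a running experiment."""
        with self._lock:
            return request.session_id in self._running


gate = StartRequestGate()
gate.experiment_started("session-1")  # assumed session id
request = StartRequest(session_id="session-1", image="tenforce/virtuoso")

accepted_while_running = gate.accepts(request)
gate.experiment_stopped("session-1")
accepted_after_stop = gate.accepts(request)
```

Note that a check like this only rejects requests that arrive after the experiment has been stopped. A request that was accepted earlier and whose image pull is still in progress, as in the example above, would presumably need additional handling.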

denkv commented 5 years ago

> The platform should only accept DOCKER_CONTAINER_START messages if they carry the session of a currently running experiment.

Looks like at least this part is already done in https://github.com/hobbit-project/platform/commit/e1cf3511a95a0fc31f971ea51f728ba2ec6f902d (not in the master branch yet).