signalfx / maestro-ng

Orchestration of Docker-based, multi-host environments
https://signalfx.com
Apache License 2.0

unable to obtain status of data volume container on start with docker engine 1.4.1 #151

Closed iangkent closed 9 years ago

iangkent commented 9 years ago

I have a data volume container defined which is used by another container on the same ship. I followed the instructions in the maestro-ng docs and source.

http://maestro-ng.readthedocs.org/en/latest/#volume-bindings https://github.com/signalfuse/maestro-ng/blob/master/tests/yaml/test_volumes.yaml

When I run maestro start I see two issues. First, the container that depends on the data container starts before it. Second, maestro is unable to determine the status of the data container when running maestro start.

I am using maestro-ng 0.2.6.2 with docker engine 1.4.1. When I run the same yaml against a ship running docker engine 1.8.1, it works fine.

Here is a sample yaml file to reproduce the issue:

__maestro:
  schema: 2

name: DEV-TEST

ships:

  dm01:
    ip:  ********
    docker_port: 4243

services:

  db-data:
    image: busybox:latest
    lifecycle:
    instances:
      db1-data:
        ship: dm01
        container_volumes:
          - /var/lib/db/data

  db:
    image: busybox:latest
    instances:
      db1:
        ship: dm01
        volumes_from: db1-data
        command: /bin/sh -c "while true; do echo db running; sleep 5; done"

  as:
    image: busybox:latest
    requires: [ db ]
    instances:
      as1:
        ship: dm01
        command: /bin/sh -c "while true; do echo as running; sleep 5; done"

Here is the exception that I am getting:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/maestro/__main__.py", line 173, in execute
    getattr(c, options.command)(**vars(options))
  File "/usr/lib/python2.7/site-packages/maestro/maestro.py", line 299, in start
    auditor=self.auditor).run()
  File "/usr/lib/python2.7/site-packages/maestro/plays/__init__.py", line 148, in run
    self._end()
  File "/usr/lib/python2.7/site-packages/maestro/plays/__init__.py", line 140, in _end
    exceptions.raise_with_tb(self._error)
  File "/usr/lib/python2.7/site-packages/maestro/plays/__init__.py", line 97, in act
    task.run(auditor=self._auditor)
  File "/usr/lib/python2.7/site-packages/maestro/plays/tasks.py", line 91, in run
    exceptions.raise_with_tb()
  File "/usr/lib/python2.7/site-packages/maestro/plays/tasks.py", line 85, in run
    self._run()
  File "/usr/lib/python2.7/site-packages/maestro/plays/tasks.py", line 160, in _run
    '\n'.join(error).strip())
OrchestrationException: Halting start sequence because db1-data (on dm01) failed to start!

mpetazzoni commented 9 years ago

You're right about the data container not being started before the container that uses it. I think Maestro should consider the usage of volumes_from as a dependency between those two services to ensure the proper start order.

I'm not sure what you meant by it works fine when using Docker 1.8.1 though, as I would expect this to fail the same way regardless of the Docker version that you use. Unless you got lucky, since the start order could be arbitrary when no dependencies between services are defined.
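To illustrate the idea of treating volumes_from as an implicit dependency, here is a minimal Python sketch. It is not Maestro's actual implementation: the `start_order` helper is hypothetical, it works on a plain dict shaped like the `services` section of the YAML above, and it assumes `volumes_from` names an instance (container) of another service.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(services):
    """Order services so that explicit `requires` and dependencies
    implied by `volumes_from` are both started first.

    Hypothetical helper for illustration; not part of Maestro.
    """
    # Map each instance (container) name to the service that owns it.
    owner = {}
    for svc, conf in services.items():
        for inst in (conf.get("instances") or {}):
            owner[inst] = svc

    graph = {}
    for svc, conf in services.items():
        deps = set(conf.get("requires") or [])
        for inst_conf in (conf.get("instances") or {}).values():
            sources = (inst_conf or {}).get("volumes_from") or []
            if isinstance(sources, str):
                sources = [sources]
            for name in sources:
                # Assumption: volumes_from refers to a known instance name.
                if name in owner and owner[name] != svc:
                    deps.add(owner[name])
        graph[svc] = deps

    # static_order() yields every node after all of its dependencies.
    return list(TopologicalSorter(graph).static_order())


services = {
    "db-data": {"instances": {"db1-data": {"ship": "dm01"}}},
    "db": {"instances": {"db1": {"ship": "dm01", "volumes_from": "db1-data"}}},
    "as": {"requires": ["db"], "instances": {"as1": {"ship": "dm01"}}},
}
print(start_order(services))
```

With the sample configuration from this issue, the data container's service comes before `db`, which comes before `as`, so the start order no longer depends on luck.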

iangkent commented 9 years ago

I did some more investigation and found that it is a timing issue when checking the status of a data container running on a remote machine. The running-state check succeeds when maestro runs on the same machine as the docker engine hosting the data container. However, when maestro runs on a machine remote to that engine, it fails to obtain the status of the container after start.

https://github.com/signalfuse/maestro-ng/blob/master/maestro/plays/tasks.py#L240

This appears to be a timing/latency issue. If I add a sleep before the check in tasks.py, it fails locally as well.

        self.o.pending('waiting for initialization...')
        time.sleep(0.05)

        def check_running(x):

Maestro should check only for the created state of a data container, since data containers do not run a process.

This is related to issue: https://github.com/signalfuse/maestro-ng/issues/139

Maestro needs to distinguish between a service/daemon container that runs a process and a data container that simply holds a shared volume.
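A rough Python sketch of that distinction, for illustration only. The `is_started` function and its `data_only` flag are hypothetical and not a Maestro option; the input is assumed to be the dict returned by inspecting a container, whose `State` mapping carries a boolean `Running` field.

```python
def is_started(inspect_result, data_only=False):
    """Decide whether a container counts as successfully started.

    inspect_result: dict from inspecting the container, or None if the
                    container does not exist.
    data_only:      hypothetical flag marking a data volume container.
    """
    if inspect_result is None:
        # The container was never created, so it cannot count as started.
        return False
    if data_only:
        # A data container only needs to exist; it runs no process.
        return True
    # A service/daemon container must still have a live process.
    return bool(inspect_result.get("State", {}).get("Running"))
```

Under this scheme a data container that has already exited would pass the check however long maestro takes to query it, which removes the dependence on latency described above.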

The workaround is to run a dummy process in the data container.

  db-data:
    image: busybox:latest
    instances:
      db1-data:
        ship: dm01
        stop_timeout: 0
        container_volumes:
          - /var/lib/db/data
        command: /bin/sh -c "while true; do echo fake running; sleep 600; done"

mpetazzoni commented 9 years ago

Yep, it's a known issue (#139) that Maestro doesn't handle data containers that don't have a running process. I'll close this as a duplicate; progress will be tracked in #139. Thanks for reporting this!