merll / docker-fabric

Integration of Docker deployments into Fabric.
MIT License

Start with Recursive Dependencies #13

Closed: ambsw-technology closed this issue 7 years ago

ambsw-technology commented 7 years ago

While debugging #12, I rolled back to docker-fabric==0.3.10 and docker-map==0.6.6. I ran into the following issue:

docker.errors.APIError: 500 Server Error: Internal Server Error ("{"message":"Cannot link to a non running container: /graylog_map.elasticsearch AS /graylog_map.graylog/elasticsearch"}")

In this case, I have a two-layer dependency: nginx => graylog => elasticsearch & mongo. When I didn't have nginx it worked correctly. When I manually coded the launch order, it worked correctly. I assume, therefore, that 0.3.10 is not recursively checking dependencies in the start logic.
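For reference, a minimal sketch of the map shape that produces this chain (the map and container names match the error above; the image names and link aliases are illustrative, not my actual configuration):

from dockermap.map.container import ContainerMap
from dockermap.map.input import ContainerLink

# Two-layer dependency chain: nginx => graylog => elasticsearch & mongo.
graylog_map = ContainerMap('graylog_map', {
    'elasticsearch': {'image': 'elasticsearch'},
    'mongo': {'image': 'mongo'},
    'graylog': {
        'image': 'graylog2/server',
        'links': [
            ContainerLink('elasticsearch', 'elasticsearch'),
            ContainerLink('mongo', 'mongo'),
        ],
    },
    'nginx': {
        'image': 'nginx',
        'links': [
            ContainerLink('graylog', 'graylog'),
        ],
    },
})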

NOTE: I can't replicate the issue in 0.4.0 until #12 is resolved so it could (at least theoretically) be fixed.

merll commented 7 years ago

Dependency resolution is part of Docker-Map and has always been recursive. However, the consistency check between containers has been improved over time; the changes from 0.6.6 to 0.7.0 fixed some bugs there that were hard to trace, but mostly affected already-running containers. I have had issues similar to what you are describing before. Those turned out to be race conditions and did not show up every time; I had the impression they occurred more often on dependencies with a particularly long startup time. A fairly simple solution would be adding a configurable startup delay to the dependency, which I can add as a feature to Docker-Map. Checking whether the container is running and idle would be more complicated, and might not even be reliable.
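As an interim workaround along those lines, the dependency's state can be polled before the dependent container is started. A minimal sketch (the wait_until_running helper is hypothetical, not part of Docker-Map; it only relies on docker-py's inspect_container):

import time

def wait_until_running(client, container_name, timeout=30.0, interval=0.5):
    # Poll the container state via docker-py until it reports running,
    # or raise if the timeout is exceeded.
    deadline = time.time() + timeout
    while time.time() < deadline:
        state = client.inspect_container(container_name)['State']
        if state.get('Running'):
            return
        time.sleep(interval)
    raise RuntimeError('%s not running after %.1fs' % (container_name, timeout))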

ambsw-technology commented 7 years ago

I definitely need to check again once I resolve the 0.7.0 connection issue. In 0.6.6, I'm not seeing the expected DEBUG messages. Specifically, all it's trying to do is:

There's no effort to start elasticsearch or mongo (the dependencies) first, so it doesn't seem to be a race condition. However, I'll close this until I can replicate it on 0.7.0.

merll commented 7 years ago

Even if you cannot reproduce this in 0.7.0, could you post the contents of the ContainerMap (removing potentially sensitive parts)? I could then check whether something is wrong with the dependency resolution or just with the container status check, and improve the tests as necessary.

ambsw-technology commented 7 years ago

I was travelling all day, so I didn't get a chance to work out all the kinks, but I assume the following is a minimal example:

from dockermap.map.container import ContainerMap
from dockermap.map.input import ContainerLink
from dockerfabric.apiclient import docker_fabric, container_fabric

maps = ContainerMap('test_map', {
    'host_root': '~',
    'u1': {
        'image': 'ubuntu:latest',
    },
    'u2': {
        'image': 'ubuntu:latest',
        'links': [
            ContainerLink("u1", "u1"),
        ],
    },
    'u3': {
        'image': 'ubuntu:latest',
        'links': [
            ContainerLink("u2", "u2"),
        ],
    },
})

docker_client = docker_fabric()
container_fabric_inst = container_fabric(docker_client=docker_client, container_maps=maps)
# Start u3 so that its transitive dependencies (u2, then u1) must be started first
container_fabric_inst.startup('u3')

When I run this on 0.3.10/0.6.6 I get the following exception:

docker.errors.APIError: 500 Server Error: Internal Server Error ("{"message":"invalid hostname format: test_map.u2"}")

When I run it on 0.4.0/0.7.0, I get the error:

docker.errors.DockerException: Invalid bind address protocol: http+docker://localunixsocket

I'm offline most of tomorrow too. If I don't hear that you've gotten past these hiccups, I'll figure out how to clean this up myself.

ambsw-technology commented 7 years ago

Good call in #12. I inherited this code, so I didn't realize docker_client was optional. I'm now getting the same invalid hostname error on both versions.

ambsw-technology commented 7 years ago

I'm moving the invalid hostname error to a separate issue. Obviously, it's blocking progress on replicating the recursive dependency problem here.

ambsw-technology commented 7 years ago

Here's the latest test code, with #12 and #14 resolved:

from dockermap.map.container import ContainerMap
from dockermap.map.input import ContainerLink
from dockerfabric.apiclient import docker_fabric, container_fabric

maps = ContainerMap('test', {
    'host_root': '~',
    'u1': {
        'image': 'ubuntu:latest',
    },
    'u2': {
        'image': 'ubuntu:latest',
        'links': [
            ContainerLink("u1", "u1"),
        ],
    },
    'u3': {
        'image': 'ubuntu:latest',
        'links': [
            ContainerLink("u2", "u2"),
        ],
    },
})

container_fabric_inst = container_fabric(container_maps=maps)
# Start u3 so that its transitive dependencies (u2, then u1) must be started first
container_fabric_inst.startup('u3')

Now I'm able to reproduce the originally reported issue (in a Python shell):

>>> container_fabric_inst.startup('u3')
docker: Creating container 'test.u2' from image 'ubuntu:latest'.
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/mnt/data/.virtualenvs/build_deploy/local/lib/python2.7/site-packages/dockermap/map/client.py", line 227, in startup
    return self.run_actions('startup', container, instances=instances, map_name=map_name, **kwargs)
  File "/mnt/data/.virtualenvs/build_deploy/local/lib/python2.7/site-packages/dockermap/map/client.py", line 113, in run_actions
    results.extend(runner.run_actions(*actions))
  File "/mnt/data/.virtualenvs/build_deploy/local/lib/python2.7/site-packages/dockermap/map/runner/__init__.py", line 94, in run_actions
    res = c_method(action_config, container_name, **action.extra_data)
  File "/mnt/data/.virtualenvs/build_deploy/local/lib/python2.7/site-packages/dockermap/map/runner/base.py", line 60, in create_instance
    return config.client.create_container(**c_kwargs)
  File "/mnt/data/.virtualenvs/build_deploy/local/lib/python2.7/site-packages/dockerfabric/apiclient.py", line 159, in create_container
    return super(DockerFabricClient, self).create_container(image, name=name, **kwargs)
  File "/mnt/data/.virtualenvs/build_deploy/local/lib/python2.7/site-packages/docker/api/container.py", line 135, in create_container
    return self.create_container_from_config(config, name)
  File "/mnt/data/.virtualenvs/build_deploy/local/lib/python2.7/site-packages/docker/api/container.py", line 146, in create_container_from_config
    return self._result(res, True)
  File "/mnt/data/.virtualenvs/build_deploy/local/lib/python2.7/site-packages/docker/client.py", line 178, in _result
    self._raise_for_status(response)
  File "/mnt/data/.virtualenvs/build_deploy/local/lib/python2.7/site-packages/docker/client.py", line 174, in _raise_for_status
    raise errors.APIError(e, response, explanation=explanation)
docker.errors.APIError: 500 Server Error: Internal Server Error ("{"message":"Could not get container for test.u1"}")

... so it's correctly building u2 before u3, but it is not recursively checking the dependencies of u2 (to start u1 first). I'm working around the issue by providing an explicit start order, so it's not urgent for me.
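The workaround amounts to walking the chain bottom-up by hand, so every link target is already running before its dependent container is created:

# Manual workaround: start containers in explicit dependency order.
for name in ('u1', 'u2', 'u3'):
    container_fabric_inst.startup(name)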

merll commented 7 years ago

I have found the problem now. It was not that dependencies were not followed recursively; rather, they were processed in the wrong order. At some point (probably a while before Docker-Map 0.6.6) the code changed between the dependency resolution and the code consuming its output, which introduced this error. As there were only tests for the dependency resolution itself, this went unnoticed for a while; the tests added in Docker-Map 0.7.0 did not check the order. This will be fixed in the next release of Docker-Map, and I have also added tests reproducing this setup.
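The distinction matters for testing: the dependencies were all found, just emitted in the wrong order, so a regression test has to assert the exact order rather than mere membership. A self-contained sketch of that idea (resolve_order below is a stand-in, not Docker-Map's internal API):

def resolve_order(dependencies, target):
    # Stand-in depth-first resolver: dependencies come before dependents.
    order = []
    def visit(name):
        for dep in dependencies.get(name, ()):
            visit(dep)
        if name not in order:
            order.append(name)
    visit(target)
    return order

deps = {'u3': ['u2'], 'u2': ['u1'], 'u1': []}
# Asserting only set membership would have passed despite the bug;
# asserting the exact order catches it.
assert resolve_order(deps, 'u3') == ['u1', 'u2', 'u3']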

merll commented 7 years ago

Fixed with Docker-Map release 0.7.2.