On the off chance there was a bug in Paramiko, I tried upgrading to 2.x with no effect.
If I roll back to 0.3.10 and docker-map 0.6.6, I get the following error in the same place:
docker.errors.APIError: 500 Server Error: Internal Server Error ("{"message":"Cannot link to a non running container: /graylog_map.elasticsearch AS /graylog_map.graylog/elasticsearch"}")
Does the docker-fabric start logic recursively check dependencies? Obviously, it's following the requirements from nginx to graylog. However, graylog depends on additional containers that aren't getting started in the right order.
OK. I'll open a separate issue for the running-container error. I hardcoded the start order and it worked fine on 0.3.10. I upgraded to 0.4.0 and 0.7.0, and the connection issue persists.
I restored some debug logging and here's what I'm seeing. This connection works:
-------------============= API ==============------------------
with get_connection_args:
{u'local_port': 22025, u'base_url': None, u'remote_port': None}
This connection fails:
-------------============= API ==============------------------
with get_connection_args:
{u'local_port': 22025, u'base_url': 'http://127.0.0.1:22026', u'remote_port': None}
... and the failing configuration is being built in the dependency path:
DEBUG:dockermap.map.state.base:Following dependency path for graylog_map.nginx.
DEBUG:dockermap.map.state.base:Dependency path at graylog_map.graylog, instances [None].
-------------============= API ==============------------------
with get_connection_args:
{u'local_port': 22025, u'base_url': 'http://127.0.0.1:22026', u'remote_port': None}
DEBUG:docker.auth.auth:Trying paths: ['/home/<domain>/<name>/.docker/config.json', '/home/<domain>/<name>/.dockercfg']
DEBUG:docker.auth.auth:Found file at path: /home/<domain>/<name>/.docker/config.json
This is actually the same client constructor code that I chased down two days ago, so troubleshooting can resume with my last work on the topic:
OK. It looks like the offender is in dockermap.map.state.base, line 116. This client_config is generated as part of the clients in line 105 or 107; those are generated by the _policy.
I see a policy created in dockermap.map.client on line 102, but I am still trying to chase it through the logic to confirm. Perhaps you know definitively either way (and can shortcut my search).
So the offending client is being created in the else branch (line 107) with policy.__dict__:
{
'_f_resolver': <dockermap.map.policy.dep.ContainerDependencyResolver object at 0x7f65a7178910>,
'_clients': {u'__default__': {'use_host_config': True}}, '_container_names': {}, '_images': {},
'_maps': {'graylog_map': <dockermap.map.container.ContainerMap object at 0x7f65a16dced0>},
'_r_resolver': <dockermap.map.policy.dep.ContainerDependencyResolver object at 0x7f65a2d75690>}
The only thing here that looks like it could even affect a client is use_host_config: True (do you see something else?). I'm still tracking down where that might cause the host to be populated.
Fabric still uses Paramiko 1.x, so upgrading might cause additional issues. (The most recent release of Fabric supports 2.x.)
I have been looking at several parts of the connection creation process, but I am still not sure where the failing configuration comes in. None of Docker-Fabric, Docker-Map, or docker-py appears to inject a base_url starting with http:. On the contrary, all occurrences are replaced with tcp: in docker-py.
Maybe I am misunderstanding some parts of your setup. To reproduce the issue, could you please clarify what Fabric env variables you are setting (e.g. host_string, docker_local_tunnel_local_port, etc.) and how the Docker service is available (e.g. bound to address 127.0.0.1, port 2375, on the SSH host)?
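For illustration, roughly this kind of setup is what I have in mind; the host name and addresses below are placeholders, not values taken from your report:

# fabfile.py (illustrative placeholders only)
from fabric.api import env

env.host_string = 'deploy@build-host'         # SSH host that runs the Docker service
env.docker_local_tunnel_local_port = 22025    # local end of the tunnel, as in your log
# ...with the Docker daemon on that host bound to e.g. 127.0.0.1:2375 or a Unix socket.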
I have now managed to reproduce the issue; it is in fact a side-effect. If the base_url is None, docker-py sets it to a value. When you pass the existing client to a container_fabric call (as you described in #13), this derives a DockerClientConfiguration object from it, which picks up the changed URL.
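To make the mechanism concrete, here is a simplified, hypothetical sketch; the class names are made up and this is not the actual docker-py or Docker-Map code:

# Simplified illustration of the side-effect, not the actual docker-py / Docker-Map code.
class FakeClient(object):
    def __init__(self, base_url=None):
        # docker-py substitutes a concrete address when base_url is None; here the
        # tunneled endpoint from the log above stands in for whatever it picks.
        self.base_url = base_url or 'http://127.0.0.1:22026'

class FakeClientConfiguration(object):
    @classmethod
    def from_client(cls, client):
        cfg = cls()
        cfg.base_url = client.base_url   # copies the already-substituted URL
        return cfg

client = FakeClient()                                 # first client, created with defaults
derived = FakeClientConfiguration.from_client(client)
print(derived.base_url)                               # 'http://127.0.0.1:22026', not None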
As this is intended to be possible, I will try to fix the behavior. In the meantime, you can try two ways to circumvent the issue:
One possibility is to simply skip passing a docker_client to container_fabric; see the sketch below.
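A minimal sketch of that variant (the import path is assumed, and maps again stands for your loaded container map):

# Sketch only: omit docker_client so that container_fabric creates the client
# configuration itself. Import path assumed for docker-fabric.
from dockerfabric.apiclient import container_fabric

container_fabric_inst = container_fabric(container_maps=maps)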
Another way is to create client configurations ahead of time, e.g.:

# Import paths are assumed for docker-fabric / Fabric 1.x; adjust to your installation.
from fabric.api import env
from dockerfabric.apiclient import DockerClientConfiguration, container_fabric

env.docker_clients = {
    '__default__': DockerClientConfiguration(fabric_host='127.0.0.1'),
}
# ...
cc = env.docker_clients['__default__']
docker_client = cc.get_client()   # client instance derived from the configuration
container_fabric_inst = container_fabric(docker_client=cc, container_maps=maps)  # maps: your ContainerMap(s)
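The advantage of the second variant is that the DockerClientConfiguration exists before any client instance does, so a client's tunnel-modified base_url is never copied back into the configuration.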
Got it. So here's my question... how does the first client successfully connect? If I understand you correctly, the second client gets its configuration from the first. When the second client tries to use that configuration, it fails. But somehow the first one connects with a bad configuration?
I appreciate that this could sound counterproductive, but it seems to me that the first connection should fail. That way, the code would be obliged to configure the client in a durably correct way.
The first connection uses default arguments, makes its best guess at the connection type (i.e. socket or TCP), and sets up the appropriate tunnel along with a modified base_url pointing to 127.0.0.1. The second connection picks up the base_url from the existing client instance, but draws the wrong conclusions about the connection type. This occurs on socket as well as TCP connections.
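Roughly speaking, the misclassification works like this (a hypothetical sketch of the guessing logic, not the actual Docker-Fabric code):

# Hypothetical sketch of the connection-type guess; not the real Docker-Fabric logic.
def guess_connection_type(base_url):
    if base_url is None or base_url.startswith('unix:'):
        return 'socket'   # tunnel the local Unix socket (e.g. via socat)
    return 'tcp'          # tunnel a TCP port on the remote host

# First connection: base_url still reflects the user's setting, so the guess
# matches how the Docker service is actually reachable on the remote host.
print(guess_connection_type(None))                      # 'socket'

# Second connection: base_url was already rewritten to the local tunnel endpoint,
# so the guess describes the tunnel rather than the remote service.
print(guess_connection_type('http://127.0.0.1:22026'))  # 'tcp', the wrong premise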
The base_url modification is somewhat inevitable, since it is the purpose of Docker-Fabric that the user does not have to deal with setting up the URL pointing to the correct tunnel. However, ClientConfiguration.from_client should not use the modified values. I will make some changes there.
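What that could look like, as a sketch only (the attribute name for the unmodified URL is hypothetical; this is not the actual change):

# Hypothetical sketch of the intended from_client behavior; not the actual Docker-Map code.
class ClientConfigurationSketch(object):
    base_url = None

    @classmethod
    def from_client(cls, client):
        cfg = cls()
        # Prefer the originally configured URL over the tunnel-rewritten one, so that a
        # derived configuration can set up its own tunnel later. '_original_base_url' is
        # an assumed attribute name for illustration only.
        cfg.base_url = getattr(client, '_original_base_url', None)
        return cfg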
With Docker-Fabric 0.4.1 I changed the process so that the ClientConfiguration is created first, also for the docker_fabric() function. This avoids side-effects of the internal base_url modification. The client instance or client configuration object can now also be re-used in container_fabric().
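For reference, the re-use pattern this enables looks roughly like the following; the import path and the container_maps argument are assumptions carried over from the earlier snippet, with maps again standing for the loaded container map and the Fabric env assumed to be set up already:

# Sketch of re-using one client with Docker-Fabric 0.4.1; import path assumed.
from dockerfabric.apiclient import docker_fabric, container_fabric

client = docker_fabric()                      # the ClientConfiguration is created first now
cf = container_fabric(docker_client=client,   # the same client (or its configuration)
                      container_maps=maps)    # can be re-used without the base_url side-effect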
(To ensure I'm not running into bugs in the CLI code, I switched back to the API. I also rebuilt the virtualenv from scratch so I didn't have lingering code modifications.)
Now I'm seeing the following (sanitized):
As part of my testing, I opened another SSH session on the build machine (while the first was still stuck on the failure) and ran a successful call using the socat port:
I'm not even sure what to check next.