vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic

Containers not able to connect to each other using exposed ports in a network #2295

Closed shadjiiski closed 8 years ago

shadjiiski commented 8 years ago

I am trying to get a simple Wordpress application running while making use of the networking feature. Basically I create a network, then create a mysql container, then create a wordpress container and point it to connect to the previously created mysql container. See Code section for details.

VIC Version: JFrog Bintray builds 4677, 4715

Code

# Create a network:
docker --tls network create -d bridge wpnet

# Then create a mysql database. Note that container port 3306 is mapped to host port 3306.
# This is done in order to use a single mysql container for 2 different (both failing)
# approaches to get the wordpress application running
docker --tls run -itd --name mysql -e MYSQL_ROOT_PASSWORD=pass@word01 --network=wpnet -p 3306:3306 mariadb:10.0.26

# Then create wordpress containers

# Approach 1 - use the docker host IP ($IP) and the mapped host port (3306)
docker --tls run -itd --name wordpress1 -e WORDPRESS_DB_PASSWORD=pass@word01 -e WORDPRESS_DB_HOST=$IP:3306 --network=wpnet -p 8081:80 wordpress:4.3.1

# Approach 2 - use the IP that the mysql container was assigned in the wpnet network
# (172.18.0.2, got this with docker network inspect wpnet) and the port on which mysql is
# listening inside the container (3306 again)
docker --tls run -itd --name wordpress2 -e WORDPRESS_DB_PASSWORD=pass@word01 -e WORDPRESS_DB_HOST=172.18.0.2:3306 --network=wpnet -p 8082:80 wordpress:4.3.1

Expected behavior: Wordpress container successfully connecting to mysql container in both cases and showing the installation wizard in web browser when accessed on $IP:8081 and $IP:8082 respectively.

Actual behavior:

$ docker --tls logs wordpress1
WordPress not found in /var/www/html - copying now...
WARNING: /var/www/html is not empty - press Ctrl+C now if this is an error!
+ ls -A
.wh..wh..opq  index.html
+ sleep 10
Complete! WordPress has been successfully copied to /var/www/html

Warning: mysqli::mysqli(): (HY000/2002): Connection refused in - on line 10

MySQL Connection Error: (2002) Connection refused

Warning: mysqli::mysqli(): (HY000/2002): Connection refused in - on line 10

MySQL Connection Error: (2002) Connection refused

Warning: mysqli::mysqli(): (HY000/2002): Connection refused in - on line 10

MySQL Connection Error: (2002) Connection refused
$ docker --tls logs wordpress2
WordPress not found in /var/www/html - copying now...
WARNING: /var/www/html is not empty - press Ctrl+C now if this is an error!
+ ls -A
.wh..wh..opq  index.html
+ sleep 10
Complete! WordPress has been successfully copied to /var/www/html
[Sat Sep 10 11:00:21.626517 2016] [core:warn] [pid 200:tid 139777220556672] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Sat Sep 10 11:00:21.651821 2016] [core:warn] [pid 200:tid 139777220556672] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
[Sat Sep 10 11:00:21.651868 2016] [core:warn] [pid 200:tid 139777220556672] AH00111: Config variable ${APACHE_LOG_DIR} is not defined
AH00534: apache2: Configuration error: More than one MPM loaded.

Note: Initially, I tried to set the application "the right way", i.e., I was passing -e WORDPRESS_DB_HOST=mysql:3306 when starting the wordpress container. However, it looks like the names of the containers are not resolved currently (see #2294), so I fell back to these approaches.

hmahmood commented 8 years ago

Approach 1 should not work because containers cannot connect to the VCH using its "external" IP address. Additionally, only connections coming in on the VCH's client interface are accepted for port mappings, so no container can use a port mapping via the VCH's IP address.

As for Approach 2, I am not sure if this is a port mapping issue. It looks like an environment variable APACHE_LOG_DIR is not defined; shouldn't that be passed in to docker run via the -e option?

hmahmood commented 8 years ago

It appears that we are not applying at least one of the layers in the image that enables only one mpm module. We are loading both mpm_event and mpm_prefork in VIC, while in regular docker only mpm_prefork gets loaded. This appears to be a bug in our processing of the image layers.

Specifically the command in the layer is

/bin/sh -c a2dismod mpm_event && a2enmod mpm_prefork
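For reference, the deletion mechanism those layers rely on can be sketched in shell. This is a hypothetical illustration of the AUFS-style `.wh.` whiteout convention used in Docker layer tars, not VIC code; the file and directory names are made up for the demo.

```shell
#!/bin/sh
# Hypothetical sketch (not VIC code): how ".wh." whiteout entries in a layer
# tar are meant to delete files that came from lower layers.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Lower layers already extracted into rootfs: a file and a directory.
mkdir -p rootfs/home/testdir
echo hello > rootfs/home/testfile

# Upper layer: whiteout markers for both, plus a new file.
mkdir -p upper/home
touch upper/home/.wh.testfile upper/home/.wh.testdir
echo world > upper/home/extra
tar -C upper -cf layer.tar .

# Apply the layer entry by entry: a ".wh.<name>" marker deletes "<name>"
# from the rootfs instead of being extracted; everything else is extracted.
tar -tf layer.tar | while read -r entry; do
  case "$entry" in
    */) mkdir -p "rootfs/$entry"; continue ;;
  esac
  base=$(basename "$entry")
  case "$base" in
    .wh.*) rm -rf "rootfs/$(dirname "$entry")/${base#.wh.}" ;;
    *)     tar -C rootfs -xf layer.tar "$entry" ;;
  esac
done

ls rootfs/home   # only "extra" should remain
```

Skipping the whiteout branch (or extracting the markers verbatim) reproduces exactly the failure seen here: the deleted `testfile` and `testdir` survive into the container filesystem.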
shadjiiski commented 8 years ago

@hmahmood

Approach 1 should not work because containers cannot connect to the VCH using its "external" IP address. Additionally only connections coming in on the VCH's client interface are accepted for port mapping, so no container can use the port mapping via the VCH's ip address.

While I agree that this is an unusual way to set up an application, I don't think that it should not work. First of all, this actually works on regular docker, so chances are people will have trouble migrating from docker to VIC. Furthermore, it makes no sense to fear containers more than people. What I mean is that if a container can be reached by anyone who has access to that "external" network (the one that is used to connect to the docker API service), why shouldn't it be possible to use the same means of connection from within another container?

As for Approach 2, I am not sure if this is a port mapping issue. It looks like an environment variable APACHE_LOG_DIR is not defined; shouldn't that be passed in to docker run via the -e option?

Judging by your second comment, you have already verified this, but no, there shouldn't be any additional configuration passed to the wordpress container. Both approaches work on regular docker and both yield working wordpress installations that can be accessed on $DOCKER_HOST_IP:8081 and $DOCKER_HOST_IP:8082 respectively.

mdubya66 commented 8 years ago

Required by Admiral

hmahmood commented 8 years ago

@shadjiiski OK that makes sense, and I think it is an easy enough change to VIC to allow this.

Approach 2 though is not a networking issue; it is the way we are processing the image layers. Somehow at least one layer that enables only one apache mpm module is not applied to the container.

hickeng commented 8 years ago

Bumping to P0 only because there isn't a higher priority. Unless I've made a basic mistake somewhere, we are not currently supporting deletion of files from layers:

Dockerfile:

FROM busybox

# create a directory and symlink
RUN mkdir -p /home/testdir && ln -s /home/testdir /home/dirlink

# create a file and symlink
RUN echo hello > /home/testfile && ln -s /home/testfile /home/filelink

# remove the links
RUN rm /home/dirlink /home/filelink

# test file deletion
RUN rm /home/testfile && rmdir /home/testdir

# another layer just in case
RUN echo world > /home/extra

Expected output (from regular docker):

vagrant@devbox:~/issues/2295$ docker run -it localhost:5000/2295-test
/ # cd home
/home # ls -l
total 4
-rw-r--r--    1 root     root             6 Oct  5 15:12 extra
/home #

Failing output (from vic):

docker -H 192.168.78.132:2376 --tls run -it 192.168.78.215:5000/2295-test
/ # cd home
/home # ls -l
total 12
lrwxrwxrwx    1 root     root            13 Oct  5  2016 dirlink -> /home/testdir
-rw-r--r--    1 root     root             6 Oct  5  2016 extra
lrwxrwxrwx    1 root     root            14 Oct  5  2016 filelink -> /home/testfile
drwxr-xr-x    2 root     root          4096 Oct  5  2016 testdir
-rw-r--r--    1 root     root             6 Oct  5  2016 testfile
/home #

Workflow:

vagrant@devbox:~/issues/2295$ docker build -t localhost:5000/2295-test .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM busybox
 ---> 2b8fd9751c4c
Step 2 : RUN mkdir -p /home/testdir && ln -s /home/testdir /home/dirlink
 ---> Using cache
 ---> 9cbe8ea9b110
Step 3 : RUN echo hello > /home/testfile && ln -s /home/testfile /home/filelink
 ---> Using cache
 ---> be2c027083f6
Step 4 : RUN rm /home/dirlink /home/filelink
 ---> Using cache
 ---> d040587b2544
Step 5 : RUN rm /home/testfile && rmdir /home/testdir
 ---> Running in 7a35cfdd13ad
 ---> 8ab0f43d69f8
Removing intermediate container 7a35cfdd13ad
Step 6 : RUN echo world > /home/extra
 ---> Running in 29f2d6e0700f
 ---> 8a990039980e
Removing intermediate container 29f2d6e0700f
Successfully built 8a990039980e
vagrant@devbox:~/issues/2295$ docker push localhost:5000/2295-test
The push refers to a repository [localhost:5000/2295-test]
575a08cecddf: Pushed
8549ba721f0d: Pushed
237ea3eaf63f: Pushed
9dd8de6b590e: Pushed
0f72232577f9: Pushed
8ac8bfaff55a: Pushed
latest: digest: sha256:c026aa8c5cba43b4b013364d36625576b0030ff485ba6589d56ede67ddf6fd4c size: 1562
vagrant@devbox:~/issues/2295$ docker -H 192.168.78.132:2376 --tls pull 192.168.78.215:5000/2295-test
Using default tag: latest
Pulling from 2295-test
8ddc19f16526: Pull complete
a3ed95caeb02: Pull complete
157ffd9c4b8d: Pull complete
f685649134bf: Pull complete
e279ebb81ef7: Pull complete
b826007c5162: Pull complete
0207c8c6b2d4: Pull complete
Digest: sha256:338ddc59583da9820650d65a005c5f9595b8c720fd3dfa060ea460417a5d0762
Status: Downloaded newer image for 2295-test:latest
vagrant@devbox:~/issues/2295$ docker -H 192.168.78.132:2376 --tls run -it 192.168.78.215:5000/2295-test
/ # cd home
/home # ls -l
total 12
lrwxrwxrwx    1 root     root            13 Oct  5  2016 dirlink -> /home/testdir
-rw-r--r--    1 root     root             6 Oct  5  2016 extra
lrwxrwxrwx    1 root     root            14 Oct  5  2016 filelink -> /home/testfile
drwxr-xr-x    2 root     root          4096 Oct  5  2016 testdir
-rw-r--r--    1 root     root             6 Oct  5  2016 testfile

@fdawg4l for initial comment

fdawg4l commented 8 years ago

@hickeng wow, that's really bad. Will look at this asap. We're using docker's tar mechanism with their whiteout support; we're likely not using it correctly. I'll take a look.

Is the result of your Dockerfile somewhere I can pull it from?

hickeng commented 8 years ago

@fdawg4l sorry no, it's all in a local nested env.

I'm running a registry (`docker run -d -p 5000:5000 --restart=always --name registry registry:2`) in my devbox.

Only extra step needed is to reconfigure docker daemon for insecure registry so you can push it (assuming systemd):

echo "
[Service]
Environment=INSECURE_REGISTRY=--insecure-registry=localhost:5000
" > /etc/systemd/system/docker.service.d/insecure.conf
systemctl daemon-reload
systemctl restart docker

We're going to need a regression test for this - we could use a modified form of that Dockerfile that asserts the contents of /home as its CMD. Longer term, my preference would be to use docker export so we can directly compare the end results as tar archives and don't have to comprehensively list all variants of file operations. This would let us use complex hub images as the basis.
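The export-and-compare idea could look roughly like the sketch below. Since this snippet has no daemons to talk to, two locally built trees stand in for the exported container filesystems; in real usage the tars would come from `docker export` against regular docker and the VCH (addresses and image name are placeholders).

```shell
#!/bin/sh
# Sketch of the proposed regression check: diff the sorted entry listings of
# two exported filesystems. Real usage would produce the tars with e.g.
#   docker export $(docker create IMAGE) > expected.tar
#   docker -H $VCH_HOST --tls export $(docker -H $VCH_HOST --tls create IMAGE) > actual.tar
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Stand-ins for the two exports; "actual" simulates the whiteout bug by
# keeping a file that should have been deleted.
mkdir -p expected/home actual/home
echo world > expected/home/extra
echo world > actual/home/extra
echo leftover > actual/home/testfile
tar -C expected -cf expected.tar .
tar -C actual -cf actual.tar .

# Compare sorted entry listings; any difference fails the regression test.
tar -tf expected.tar | sort > expected.txt
tar -tf actual.tar | sort > actual.txt
if diff -u expected.txt actual.txt; then
  echo MATCH
else
  echo MISMATCH
fi
```

Comparing listings already catches undeleted files; comparing the extracted contents byte for byte would additionally catch wrongly applied file data.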

hmahmood commented 8 years ago

@fdawg4l you can use the apache example above.

fdawg4l commented 8 years ago

Simple fix. We were using the wrong archive call.

Verifying the solution and will post a PR asap.

fdawg4l commented 8 years ago

On VIC

/home # ls -l
total 4
-rw-r--r--    1 root     root             6 Oct  5 17:49 extra
/home # 

Looks like that's all that was needed. Pushing PR now.

hmahmood commented 8 years ago

There is still one network-related item that remains.

fdawg4l commented 8 years ago

Weird. I removed the "Fixed" verbiage, but I guess GH ignored that.

fdawg4l commented 8 years ago

Oh, it was still in my commit msg. Apologies.

stuclem commented 8 years ago

This is fixed. Removing kind/note.