nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.

Can't connect to container started with dind #285

Closed: luksfarris closed this issue 3 years ago

luksfarris commented 3 years ago

Hi, thank you for this project; it allowed us to run unprivileged GitLab Docker executors. Although services work perfectly, containers started detached with docker run don't seem to be accessible, which is important for our tests that use Testcontainers. Here's a minimal example of the problem:

docker_gitlab_test:
  stage: test
  image: docker:19.03.13
  tags:
    - docker # this is our docker executor
  services:
    - docker:19.03.12-dind
  script:
    - apk add curl
    - mkdir test
    - echo "hello world" > test/index.html
    - docker run -d -p 8080:80 -v $(pwd)/test:/usr/share/nginx/html:ro --hostname nginx --name nginx nginx
    - curl localhost:8080
  variables:
    DOCKER_DRIVER: overlay2
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""

It fails with curl: (7) Failed to connect to localhost port 8080: Connection refused. We have a setup very similar to this one from your blog running on Ubuntu 20 (sysbox version v0.2.1). Is this behavior expected? Do you know how we can achieve this?

Thank you so much for any information or direction. I'll gladly help provide any information that helps debug this.

ctalledo commented 3 years ago

Hi Lucas,

Thanks for giving Sysbox a shot, glad it's helping your team.

I think the problem you are seeing is due to the networking setup rather than to Sysbox per se.

The nginx container is running inside the docker:19.03.12-dind service container (i.e., nested). Though that inner container maps the "host's" port 8080 -> container port 80, in this case the "host" is the outer service container (docker:19.03.12-dind), and this container does not expose port 8080 to the real host. As a result you can't reach the inner container's port 80 from the real host.

Furthermore, the curl localhost:8080 is executed from within the docker client container (i.e., docker:19.03.13). Since "localhost" there is the client container itself, the request never reaches the other container.

In theory, the solution is to launch the service container such that it exposes port 8080 to the real host, and have the client container access port 8080 on the real host. I say in theory because I am not sure GitLab gives you the option to do that.
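
Outside GitLab, the idea would look something like this (a minimal sketch, assuming Sysbox is installed and registered as the sysbox-runc runtime; the container name and the wait are illustrative):

$ docker run --runtime=sysbox-runc -d --name dind \
    -p 8080:8080 -e DOCKER_TLS_CERTDIR="" docker:19.03.12-dind

$ sleep 5   # give the inner Docker daemon time to start

# Map the inner container's port 80 to port 8080 of the dind container:
$ docker exec dind docker run -d -p 8080:80 nginx

# From the real host, the chain host:8080 -> dind:8080 -> nginx:80 now connects:
$ curl localhost:8080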

ctalledo commented 3 years ago

One way to reproduce the problem without GitLab is to mimic the GitLab setup with something like this:

$ docker network create some-network

Launch the "service" container:

$ docker run --runtime=sysbox-runc \
    --name dind-syscont -d \
    --network some-network --network-alias docker \
    -e DOCKER_TLS_CERTDIR=/certs \
    -v dind-syscont-certs-ca:/certs/ca \
    -v dind-syscont-certs-client:/certs/client \
    docker:dind

Launch the "client" container:

$ docker run -it --rm \
    --network some-network \
    -e DOCKER_TLS_CERTDIR=/certs \
    -v dind-syscont-certs-client:/certs/client:ro \
    docker:latest sh
/ #

This allows you to play around with exposing ports, etc., to see which setups work and which don't.
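
For instance, from the client shell above (a sketch, assuming the inner daemon is up; recent docker client images set the DOCKER_HOST and TLS variables automatically when the client certs are mounted):

/ # docker run -d -p 8080:80 nginx
/ # wget -qO- localhost:8080   # fails: port 8080 is bound inside the dind container, not here
/ # wget -qO- docker:8080      # works: "docker" is the network alias of the dind container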

ctalledo commented 3 years ago

By the way, I am thinking you could solve this in GitLab by not using the "service" container, and instead creating a container image that includes both the docker client and docker daemon in it. This way, both would run within the same (sysbox) container, and thus "curl localhost:8080" would work.
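
As a sketch of that approach, Nestybox publishes sample images with Docker preinstalled; something like the following (the image name and commands are illustrative, not a tested recipe):

$ docker run --runtime=sysbox-runc -it --rm nestybox/alpine-docker sh
/ # dockerd > /var/log/dockerd.log 2>&1 &
/ # docker run -d -p 8080:80 nginx
/ # wget -qO- localhost:8080   # works: client and daemon share the same network namespace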

bsideup commented 3 years ago

FTR, it is possible to override the host that Testcontainers uses to connect to containers. It looks like localhost is being detected, but docker should be used. In this case, you can try setting TESTCONTAINERS_HOST_OVERRIDE=docker to test it.
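
For example, as a sketch, in the job's script before the tests run:

$ export TESTCONTAINERS_HOST_OVERRIDE=docker
$ pytest ...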

luksfarris commented 3 years ago

Thank you very much, Cesar and Sergei, for your comments.

I think the problem you are seeing is due to the networking setup rather than to Sysbox per-se.

You may be correct. I thought it could be related to Sysbox because the same setup worked well with privileged Docker executors. But, as you correctly pointed out, I was wrong to access the container via localhost; Docker publishes the ports on 0.0.0.0 of the host that runs the daemon, which here is the dind service container. I need to read more on Docker networks to try and figure this out.

I am thinking you could solve this in GitLab by not using the "service" container, and instead creating a container image that includes both the docker client and docker daemon in it

I will give this a try!

FTR it is possible to override the host Testcontainers uses to connect to containers.

So using the dind service, testcontainers-python correctly identifies the Docker host and is able to successfully create the containers. The problem I am facing is accessing those containers.

luksfarris commented 3 years ago

Ok, after tons of failed pipelines I managed to find a workaround. Thanks for all the leads @ctalledo. I don't think this was a Sysbox issue; I suspect there's a firewall somewhere blocking it.

I will post my full config here, so maybe it will help someone else who stumbles upon this issue in the future. Our goal was to run a Python + Postgres + Kafka test on GitLab CI, configured with a Docker executor using the Sysbox runtime.

Here's the Gitlab pipeline:

unit_test:
  stage: test
  image: python:3.7 # might be improved by using an image with Python + docker + docker-compose
  tags:
    - docker # the docker executor
  services:
    - docker:19.03.12-dind
  variables:
    DOCKER_DRIVER: overlay2
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
  script:
    - apt-get update
    - apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
    - curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
    - echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
    - apt-get update
    - apt-get install -y docker-ce docker-ce-cli containerd.io
    - curl -L "https://github.com/docker/compose/releases/download/1.29.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
    - chmod +x /usr/local/bin/docker-compose
    - ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY # if you want to start containers from the gitlab registry
    - docker-compose -f docker-compose.gitlab.yml up -d
    - sleep 30 # wait for all the services to start
    - curl docker:9000/broker/1 # at this point all services are available at docker:[PORT]
    - pip install ... && pytest ... # unit test execution 

Btw: I found out that I had to move all the containers declared in the GitLab services section into the docker-compose file, because I could not get them to communicate with the Docker host.

The way I made the containers work was using the host network driver, and using localhost to bridge between services, like in this docker-compose file:

version: '3.5'

services:
  db:
    network_mode: host
    image: timescale/timescaledb:latest-pg11
    environment:
      POSTGRES_DB: postgres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: "password"
      POSTGRES_HOST_AUTH_METHOD: trust

  zookeeper:
    image: zookeeper:3.4.9
    network_mode: host
    environment:
      ZOO_MY_ID: 1
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=localhost:2888:3888

  kafka1:
    image: confluentinc/cp-kafka:5.3.0
    network_mode: host
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://localhost:19091,LISTENER_DOCKER_EXTERNAL://docker:9091
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "localhost:2181"
      KAFKA_BROKER_ID: 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    depends_on:
      - zookeeper

  kafdrop:
    image: obsidiandynamics/kafdrop
    restart: "no"
    network_mode: host
    environment:
      KAFKA_BROKERCONNECT: "localhost:19091"
    depends_on:
      - kafka1

Finally, from the Python code I could connect to the DB with host=docker port=5432 user=postgres password=password database=postgres, and produce messages with:

KafkaProducer(bootstrap_servers=["docker:9091"], value_serializer=..., api_version=(2, 3, 0))
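
Equivalently, the DB connection can be sanity-checked from the job's shell (a sketch, assuming a postgresql client is installed in the image):

$ psql "host=docker port=5432 user=postgres password=password dbname=postgres" -c "SELECT 1;"
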
ctalledo commented 3 years ago

Thanks @luksfarris; so it sounds like you fixed it by running Docker compose inside the "docker:19.03.12-dind" container (the latter deployed with Sysbox). The Docker compose launches the 4 inner containers and these communicate with each other via a host network that lives fully inside the Sysbox container.
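
(One way to confirm that, as a sketch with a hypothetical container name: all the service ports should be bound inside the dind container's network namespace.)

$ docker exec <dind-container> netstat -tln   # expect 5432, 2181, 9091, 19091 and 9000 listening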

Makes sense, but please correct me if I am wrong.

Thanks again for using Sysbox, glad you got this working!

luksfarris commented 3 years ago

Dear Cesar (@ctalledo), sorry for the late reply. I've tried to organize my thoughts into a diagram; I hope it helps explain the problem I was having and how it was addressed.

[diagram "GitlabCI(1)" omitted]

Some test use cases for this: [images omitted]

ctalledo commented 3 years ago

Hi Lucas (@luksfarris),

Thanks for creating the diagram, very useful.

One question remains in my mind: which of these containers, if any, are Sysbox containers?

luksfarris commented 3 years ago

Hi Cesar, sorry it took me so long to get back to you. The only container in this scenario that has the sysbox runtime is the first one, the gitlab executor that starts all the jobs.

ctalledo commented 3 years ago

Thanks @luksfarris.

So this is interesting, because normally you want to use Sysbox to run containers that have the Docker Daemon inside (i.e., the docker:19.03-dind containers). Otherwise, if you don't use Sysbox, they would need to run as "privileged" containers, which is not secure (provides weak isolation between the container and the host).
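
Concretely, the difference is just in how the dind container is launched (sketch):

# Without Sysbox, dind needs the insecure --privileged flag:
$ docker run --privileged -d docker:19.03-dind

# With Sysbox, no --privileged is needed:
$ docker run --runtime=sysbox-runc -d docker:19.03-dind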

Just FYI in case you want to improve your setup further.