Closed luksfarris closed 3 years ago
Hi Lucas,
Thanks for giving Sysbox a shot, glad it's helping your team.
I think the problem you are seeing is due to the networking setup rather than to Sysbox per se.
The nginx container is running inside the `docker:19.03.12-dind` service container (i.e., nested). Though that inner container maps the "host's" port 8080 -> container port 80, in this case the "host" is the outer service container (`docker:19.03.12-dind`), and this container does not expose port 8080 to the real host. As a result, you can't reach the inner container's port 80 from the real host.
Furthermore, the `curl localhost:8080` is executed from within the docker client container (i.e., `docker:19.03.13`). Since "localhost" refers to the client container itself, it can't reach the other container.
In theory, the solution is to launch the service container such that it exposes port 8080 to the real host, and have the client container access port 8080 on the real host. I say in theory because I am not sure GitLab gives you the option to do that.
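To make that "in theory" fix concrete, here is a hypothetical sketch (assuming you can control how the service container is launched, which GitLab may not allow) of publishing the port from the service container to the real host:

```shell
# Hypothetical sketch (assumes Sysbox is installed and a Docker daemon is
# available): publish port 8080 of the dind service container on the real host.
docker run --runtime=sysbox-runc -d \
  --name dind-syscont \
  -p 8080:8080 \
  docker:dind
# Inside dind-syscont, "docker run -p 8080:80 nginx" would then map the inner
# container's port 80 to the service container's port 8080, which is now also
# published on the real host, so "curl localhost:8080" works from the host.
```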
One way to reproduce the problem without GitLab is to mimic the GitLab setup with something like this:
```shell
$ docker network create some-network
```
Launch the "service" container:
```shell
$ docker run --runtime=sysbox-runc \
    --name dind-syscont -d \
    --network some-network --network-alias docker \
    -e DOCKER_TLS_CERTDIR=/certs \
    -v dind-syscont-certs-ca:/certs/ca \
    -v dind-syscont-certs-client:/certs/client \
    docker:dind
```
Launch the "client" container:
```shell
$ docker run -it --rm \
    --network some-network \
    -e DOCKER_TLS_CERTDIR=/certs \
    -v dind-syscont-certs-client:/certs/client:ro \
    docker:latest sh
/ #
```
This allows you to play around with exposing ports, etc., to see what setups work and which don't.
By the way, I am thinking you could solve this in GitLab by not using the "service" container, and instead creating a container image that includes both the docker client and docker daemon in it. This way, both would run within the same (sysbox) container, and thus "curl localhost:8080" would work.
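A minimal sketch of such an image (untested; package names are assumptions) could start from the stock dind image, which already ships both the daemon and the client, and layer the test tooling on top:

```dockerfile
# Hypothetical sketch: docker:19.03-dind already contains dockerd + the docker
# CLI; add Python and curl (Alpine packages) so the CI job, the client, and
# the daemon all live in one (Sysbox) container, making "curl localhost:8080"
# resolve as expected.
FROM docker:19.03-dind
RUN apk add --no-cache python3 py3-pip curl
```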
FTR it is possible to override the host Testcontainers uses to connect to containers. It looks like `localhost` is detected, but `docker` should be used. In this case, you can try setting `TESTCONTAINERS_HOST_OVERRIDE=docker` to test it.
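For testcontainers-python, a hedged sketch of applying that override from within the test code (assuming the Python client honors the same `TESTCONTAINERS_HOST_OVERRIDE` variable as the Java one; check the docs for your version):

```python
import os

# Assumption: this must be set before testcontainers resolves any container
# address, so that it uses the "docker" service alias instead of localhost.
os.environ["TESTCONTAINERS_HOST_OVERRIDE"] = "docker"

print(os.environ["TESTCONTAINERS_HOST_OVERRIDE"])
```

In a GitLab pipeline, the same effect can be had by adding the variable under the job's `variables:` section instead.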
Thank you very much, Cesar and Sergei, for your comments.
> I think the problem you are seeing is due to the networking setup rather than to Sysbox per se.
You may be correct. I thought it could be related to Sysbox because it worked well with privileged Docker executors, though I was wrong (as you pointed out correctly) in accessing it via `localhost`; Docker makes the containers accessible on `0.0.0.0`. I need to read more on Docker networks to figure this out.
> I am thinking you could solve this in GitLab by not using the "service" container, and instead creating a container image that includes both the docker client and docker daemon in it
I will give this a try!
> FTR it is possible to override the host Testcontainers uses to connect to containers.
So using the `dind` service, `testcontainers-python` correctly identifies the Docker host and is able to successfully create the containers. The problem I am facing is accessing those containers.
Ok, after tons of failed pipelines I managed to find a workaround. Thanks for all the leads @ctalledo. I don't think this was a sysbox issue, I actually think there's a firewall somewhere blocking it.
I will post my full config here, so maybe it will help someone else that stumbles upon this issue in the future. Our goal was to run a Python + Postgres + Kafka test on the Gitlab CI that's configured with a (sysbox runtime) Docker executor.
Here's the Gitlab pipeline:
```yaml
unit_test:
  stage: test
  image: python:3.7  # might be improved by using an image with Python + docker + docker-compose
  tags:
    - docker  # the docker executor
  services:
    - docker:19.03.12-dind
  variables:
    DOCKER_DRIVER: overlay2
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
  script:
    - apt-get update
    - apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
    - curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
    - echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
    - apt-get update
    - apt-get install -y docker-ce docker-ce-cli containerd.io
    - curl -L "https://github.com/docker/compose/releases/download/1.29.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
    - chmod +x /usr/local/bin/docker-compose
    - ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY  # if you want to start containers from the gitlab registry
    - docker-compose -f docker-compose.gitlab.yml up -d
    - sleep 30  # wait for all the services to start
    - curl docker:9000/broker/1  # at this point all services are available at docker:[PORT]
    - pip install ... && pytest ...  # unit test execution
```
Btw: I found out that I had to move all my containers declared in the GitLab `services` to the docker-compose file, because I could not get them to communicate with the Docker host.
The way I made the containers work was using the `host` network driver, and using `localhost` to bridge between services, as in this docker-compose file:
```yaml
version: '3.5'
services:
  db:
    network_mode: host
    image: timescale/timescaledb:latest-pg11
    environment:
      POSTGRES_DB: postgres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: "password"
      POSTGRES_HOST_AUTH_METHOD: trust
  zookeeper:
    image: zookeeper:3.4.9
    network_mode: host
    environment:
      ZOO_MY_ID: 1
      ZOO_PORT: 2181
      ZOO_SERVERS: server.1=localhost:2888:3888
  kafka1:
    image: confluentinc/cp-kafka:5.3.0
    network_mode: host
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://localhost:19091,LISTENER_DOCKER_EXTERNAL://docker:9091
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "localhost:2181"
      KAFKA_BROKER_ID: 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    depends_on:
      - zookeeper
  kafdrop:
    image: obsidiandynamics/kafdrop
    restart: "no"
    network_mode: host
    environment:
      KAFKA_BROKERCONNECT: "localhost:19091"
    depends_on:
      - kafka1
```
Finally, from the Python code I could connect to the DB with `host=docker port=5432 user=postgres password=password database=postgres` and produce messages with:

```python
KafkaProducer(bootstrap_servers=["docker:9091"], value_serializer=..., api_version=(2, 3, 0))
```
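For reference, those Postgres settings can also be assembled into a libpq-style connection string (a trivial sketch; psycopg2 accepts either keyword arguments or this string, and the DSN form uses `dbname` rather than `database`):

```python
# Connection settings used in the CI run above; every service is reachable
# via the "docker" hostname exposed by the dind service container.
db_params = {
    "host": "docker",
    "port": 5432,
    "user": "postgres",
    "password": "password",
    "dbname": "postgres",
}
# Build a space-separated "key=value" DSN from the parameter dict.
dsn = " ".join(f"{k}={v}" for k, v in db_params.items())
print(dsn)  # host=docker port=5432 user=postgres password=password dbname=postgres
```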
Thanks @luksfarris; so it sounds like you fixed it by running Docker compose inside the "docker:19.03.12-dind" container (the latter deployed with Sysbox). The Docker compose launches the 4 inner containers and these communicate with each other via a host network that lives fully inside the Sysbox container.
Makes sense, but please correct me if I am wrong.
Thanks again for using Sysbox, glad you got this working!
Dear Cesar @ctalledo sorry for the late reply. I've tried to organize my thoughts into a diagram, I hope it helps explain the problem I was having and how it was addressed.
(Diagram attached, illustrating the container setup and some test use cases for it.)
Hi Lucas (@luksfarris),
Thanks for creating the diagram, very useful.
One question remains in my mind: which of these containers, if any, are Sysbox containers?
Hi Cesar, sorry it took me so long to get back to you. The only container in this scenario that has the sysbox runtime is the first one, the gitlab executor that starts all the jobs.
Thanks @luksfarris.
So this is interesting, because normally you want to use Sysbox to run containers that have the Docker daemon inside (i.e., the `docker:19.03-dind` containers). Otherwise, if you don't use Sysbox, they would need to run as "privileged" containers, which is not secure (it provides weak isolation between the container and the host).
Just FYI in case you want to improve your setup further.
Hi, thank you for this project, it allowed us to run unprivileged GitLab Docker executors. Although `services` work perfectly, containers started detached with `docker run` don't seem to be accessible, and this is important for our tests that use `testcontainers`. Here's a minimal example of the problem:

It fails with `curl: (7) Failed to connect to localhost port 8080: Connection refused`. We have a setup very similar to this one from your blog, running on Ubuntu 20 (Sysbox version `v0.2.1`). Is this behavior expected? Do you know how we can achieve this?

Thank you so much for any information or direction. I'll gladly help provide any information that helps debug this.