microsoft / azure-pipelines-agent

Azure Pipelines Agent 🚀
MIT License

Run container jobs from dockerized agent #2048

Closed dferretti closed 5 years ago

dferretti commented 5 years ago

We run several dockerized agents on our own server, using an image we built with microsoft/vsts-agent:ubuntu-16.04-docker-18.06.1-ce as a base image. I would like to start using container jobs in our new YAML build definitions. Is there any way to do that if the agent itself is running in a container?

Right now when I try, I get this error

Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"/__a/externals/node/bin/node\": stat /__a/externals/node/bin/node: no such file or directory": unknown

I can see from the previous output lines that it tries to create a container with several mounted volumes, including /vsts/agent/externals - but as far as I understand with docker-in-docker you are just writing to the host's /var/run/docker.sock. Works for building images, but for mounted volumes it will look at the host's file system.

Is there any combination of volume mounts I can add to the agent container to get this to work? Or should I just go with installing the agent directly to the host and stop using the dockerized version?

TingluoHuang commented 5 years ago

@dferretti-fig We don't support running a container job when the agent is already inside a container. What's your agent version? The agent should error out and tell you that you are inside a container.

dferretti commented 5 years ago

Thanks for the quick response! The agent is listed in my agent pool as version 2.144.0. I had just re-pulled the base docker image last night so I assume that is what comes installed in that image. If the agent were to auto-update, would the version change in the Capabilities section in AzDO? That's where I found the version number, didn't see version number listed in the logs.

TingluoHuang commented 5 years ago

@dferretti-fig Capabilities in AzDO should show everything correctly. Can you help me print /proc/1/cgroup by running a regular job with a command line task? We use the information in that file to detect whether we are inside a container.

dferretti commented 5 years ago

Sure:

9:perf_event:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
8:memory:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
7:hugetlb:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
6:freezer:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
5:devices:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
4:cpuset:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
3:cpuacct:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
2:cpu:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
1:blkio:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698

We are running the agent in ECS - looks like they changed the cgroup naming convention.
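
For readers following along, here is a minimal sketch of the kind of check being discussed. The agent's real detection code is not shown in this thread; the function name `looks_like_docker` and the rule "a cgroup path starts with /docker/" are assumptions made only for illustration. It shows why output like the above, with an /ecs/ prefix, could slip past such a check.

```python
def looks_like_docker(cgroup_text: str) -> bool:
    """Return True if any cgroup path starts with /docker/ (assumed rule)."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-id:controllers:path".
        parts = line.strip().split(":", 2)
        if len(parts) == 3 and parts[2].startswith("/docker/"):
            return True
    return False


# Sample lines: the first is shortened from the ECS output above,
# the second is a made-up example of a plain Docker cgroup path.
ecs_sample = "8:memory:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f4"
docker_sample = "8:memory:/docker/0754a79ad3f4"

print(looks_like_docker(ecs_sample))     # False
print(looks_like_docker(docker_sample))  # True
```

A heuristic keyed to one naming scheme is easy to evade by accident, which matches what happened here.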

dferretti commented 5 years ago

Found some discussion on this: https://github.com/aws/amazon-ecs-agent/issues/1119 They say the output of /proc/self/cgroup is not 100% reliable, but it looks like changing it is not the norm. So from my end, I understand that container jobs from container agents aren't supported, and it was my setup that kept the agent from warning me about that. Thank you for the info!

astlock commented 5 years ago

> @dferretti-fig We don't support running a container job when the agent is already inside a container. What's your agent version? The agent should error out and tell you that you are inside a container.

Hey @TingluoHuang, do I understand correctly that there is no way to run container jobs on agents in a Kubernetes cluster?

chrispat commented 5 years ago

@astlock Not with the current implementation, which is based on Docker. The only way to replicate what a container job does in combination with service containers in Kubernetes is really a pod, so we would need some other process to create Pods and PVCs in order to get the same behavior.

astlock commented 5 years ago

@chrispat Thank you for the answer. Do you guys have plans to develop a Kubernetes-native agent in the near future?

astlock commented 5 years ago

Friendly ping @chrispat :)

rLinks234 commented 4 years ago

Is running a container job from within a containerized agent going to be allowed in the future? I can't find any explanation of why it's not allowed, but it would be very convenient to have this possibility. Microsoft's own Azure DevOps documentation also seems to point to this being an allowed feature.

To try to get around this issue, I ran the agent through podman (as opposed to docker). I was able to bypass the /proc/1/cgroup heuristic.


In case I can help others who are running into similar issues, here is what I am doing as a workaround:

This gives the following setup:

/etc/systemd/system/vsts.agent.HEBIRobots.Default.{AgentName}.service with contents:

[Unit]
Description=Azure DevOps Agent
Requires=docker.service
After=docker.service

[Service]
Restart=always
# podman stop takes a container name or ID, not an image name, so the container is named explicitly.
ExecStart=/usr/bin/podman run --rm --name {AgentName} -v /var/run/docker.sock:/var/run/docker.sock:Z -v /azp:/azp:Z -e AZP_URL=https://dev.azure.com/{YourDomain} -e AZP_TOKEN={Secret} -e AZP_AGENT_NAME={AgentName} {YourDockerImage}
ExecStop=/usr/bin/podman stop {AgentName}

[Install]
WantedBy=multi-user.target

This starts the Azure DevOps node on boot (if you boot to a desktop like GNOME/KDE/etc), without having to log in as any user.


Notes/Caveats/Limitations

  1. SELinux does not like my config, so I have set enforcement to permissive (setenforce 0). This has security implications. You can probably do this the right way with audit2allow, et al.
  2. Fedora 31 (and probably newer versions in the future) uses cgroups v2 by default. As of 15 Dec 2019, Docker will not work properly with this. You can tell systemd to use the older cgroups layout. An article with instructions can be found here.
  3. I could not start the Azure DevOps podman container ("entry point not found" error) if I created an empty /azp folder on the host machine without the start.sh in there. I had to manually put the start.sh on my host machine to get the entry point visible to the container.

vlad-m-r commented 4 years ago

> Is running a container job from within a containerized agent going to be allowed in the future? I can't find any explanation of why it's not allowed, but it would be very convenient to have this possibility. Microsoft's own Azure DevOps documentation also seems to point to this being an allowed feature.

When a containerized build is started, Azure DevOps performs a series of preparation steps to make the provided container image compatible with Azure Pipelines workloads. You can see those steps in the Initialize containers build stage. When a build container is created (docker create ...), the command attaches a number of volumes to the container to provide the necessary toolset (node etc). The problem is that when a volume is attached, the Docker engine resolves the path on the host filesystem, so the directory and the necessary files must exist on the host. However, since the Azure DevOps agent is itself running as a container, the docker command is executed from within the agent container and tries to attach a directory that does not exist on the host filesystem. As a result, an empty directory is attached to the build container, and when it starts (docker start) it fails to find the tools and the whole thing halts.
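
The path mismatch described above can be illustrated with a small sketch. It assumes a hypothetical table of the agent container's bind mounts (host directory to container directory) and computes the host path the Docker daemon would actually need. The helper name `to_host_path`, the host path /srv/agent-work, and the /vsts/agent/_work mount are assumptions for illustration and do not come from the agent itself.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def to_host_path(container_path: str, bind_mounts: Dict[str, str]) -> Optional[str]:
    """Map a path inside the agent container to the host path backing it.

    bind_mounts maps host directories to the container directories they are
    mounted at. Returns None when no bind mount covers the path, i.e. the
    directory exists only in the container's own filesystem.
    """
    target = PurePosixPath(container_path)
    best: Optional[str] = None
    best_depth = -1
    for host_dir, container_dir in bind_mounts.items():
        mount_point = PurePosixPath(container_dir)
        if target == mount_point or mount_point in target.parents:
            # Prefer the deepest (most specific) mount point that matches.
            depth = len(mount_point.parts)
            if depth > best_depth:
                relative = target.relative_to(mount_point)
                best = str(PurePosixPath(host_dir) / relative)
                best_depth = depth
    return best


# Hypothetical mount table: only the work directory is bind-mounted.
mounts = {"/srv/agent-work": "/vsts/agent/_work"}
print(to_host_path("/vsts/agent/_work/1/s", mounts))
print(to_host_path("/vsts/agent/externals", mounts))
```

In this hypothetical setup the externals directory has no host counterpart, so a `-v` mount of that path would point at a location the host does not have.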

wyarde commented 4 years ago

@vlad-m-r This problem can be largely overcome by using the docker --volumes-from option (see documentation here). As long as the Azure DevOps agent container has all the required destinations mounted as volumes, the agent can use this option to make those volumes available in the job container at the same location.
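
As a rough illustration of this suggestion, the sketch below builds a `docker create` command line that uses `--volumes-from` instead of individual `-v` host-path mounts. The helper name `build_create_command`, the container name `azp-agent`, and the image are hypothetical; this is not how the agent currently builds its command.

```python
import shlex
from typing import List


def build_create_command(agent_container: str, image: str) -> List[str]:
    """Build a `docker create` argv that reuses the agent container's volumes."""
    return [
        "docker", "create",
        # Mount every volume of the agent container at the same locations
        # in the new container, instead of passing host paths with -v.
        "--volumes-from", agent_container,
        image,
    ]


# Hypothetical container name and image, for illustration only.
cmd = build_create_command("azp-agent", "ubuntu:16.04")
print(shlex.join(cmd))
```

This only helps for paths that are actually volumes in the agent container, which is the condition stated above.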

Nirav-Bhadradiya commented 4 years ago

> @dferretti-fig We don't support running a container job when the agent is already inside a container. What's your agent version? The agent should error out and tell you that you are inside a container.

> Hey @TingluoHuang, do I understand correctly that there is no way to run container jobs on agents in a Kubernetes cluster?

I am running my agents on EKS

chrispat commented 4 years ago

If you put your runner in a docker-in-docker container, or maybe in a pod that includes a docker-in-docker container, you should be able to use the container features of the agent.

hez2010 commented 3 years ago

Will this be supported in the future? I think it's quite a useful feature, and GitLab Runner already supports this.

hez2010 commented 2 years ago

I finally got a solution: docker-in-docker. Instead of mounting the host's docker.sock, I chose to run docker inside the container that runs azure-pipelines-agent.

For the Dockerfile, see https://github.com/hez2010/docker-azure-pipelines-agent. You may also use the docker image built by me: ghcr.io/hez2010/docker-azure-pipelines-agent:main To use it, you need to run the azure-pipelines-agent container as privileged, and you also need to set the four environment variables below:

The first run of a container job on each agent will fail with a git clone networking issue, but the issue goes away after one retry.

timblaktu commented 2 years ago

@hez2010 I'm looking over your solution, but I don't see you using --volumes-from anywhere to pass host paths into the nested docker container. Would you explain the magic piece(s) that work around the docker-in-docker issue? Thanks.

EDIT: I do see now in Azure docs on dockerized agents that:

> In order to use Docker from within a Docker container, you bind-mount the Docker socket. Caution: Doing this has serious security implications. The code inside the container can now run as root on your Docker host.

Is that the piece that makes this work?

hez2010 commented 2 years ago

> Is that the piece that makes this work?

No. I run a docker daemon inside the container instead.

ar-ekta-account commented 2 years ago

@hez2010 Using your Dockerfile, I wasn't able to run docker-in-docker. I also tried the docker image you linked and got the same default error: "Container feature is not supported ..."

Kamforka commented 7 months ago

So what's the conclusion here? Do dockerized agents support container jobs or not really?

cjproud commented 5 months ago

Keen to also see any updates on this.

ZEB1CLJ commented 3 months ago

I would also like to know about progress on this feature. It would make an Azure DevOps project more portable, since it would have everything (including the agent) containerized.