Closed dferretti closed 5 years ago
@dferretti-fig we don't support running a container job when the agent is already inside a container. What's your agent version? The agent should error out and tell you that you are inside a container.
Thanks for the quick response! The agent is listed in my agent pool as version 2.144.0. I had just re-pulled the base docker image last night so I assume that is what comes installed in that image. If the agent were to auto-update, would the version change in the Capabilities section in AzDO? That's where I found the version number, didn't see version number listed in the logs.
@dferretti-fig Capabilities in AzDO should show everything correctly. Can you help me print /proc/1/cgroup by running a regular job with a command line task? We use the information in that file to detect whether we are inside a container.
Sure:
9:perf_event:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
8:memory:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
7:hugetlb:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
6:freezer:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
5:devices:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
4:cpuset:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
3:cpuacct:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
2:cpu:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
1:blkio:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698
We are running the agent in ECS - looks like they changed the cgroup naming convention.
Found some discussion on this: https://github.com/aws/amazon-ecs-agent/issues/1119
They are saying the output from /proc/self/cgroup is not 100% reliable, but it looks like changing it is not the norm. So from my end, I understand that container jobs from container agents aren't supported, and it was my setup that caused the agent to not warn me of that.
Thank you for the info!
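To illustrate why the ECS naming can slip past a cgroup-based check, here is a minimal sketch. The agent's actual detection rule is not shown in this thread; the sketch assumes a rule that only recognizes paths whose first segment is "docker". The function name in_known_container, the markers parameter, and the DOCKER_SAMPLE line are hypothetical; ECS_SAMPLE is one line from the output above.

```python
def in_known_container(cgroup_text, markers=("docker",)):
    """Return True if any cgroup path starts with one of the given markers.

    Assumption: this mimics a naive check on /proc/1/cgroup, not the agent's real code.
    """
    for line in cgroup_text.splitlines():
        # Each cgroup v1 line has the form "<id>:<controllers>:<path>".
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        first_segment = parts[2].strip("/").split("/", 1)[0]
        if first_segment in markers:
            return True
    return False


# One line taken from the ECS output posted above.
ECS_SAMPLE = (
    "8:memory:/ecs/6092bfcd-d5a0-46dc-8ae5-0d304565d711/"
    "0754a79ad3f47d74928f725bb61b0fb2fcbfe408a212cc2cd484f522f4dc0698\n"
)
# Hypothetical line in the usual Docker layout, with a made-up container ID.
DOCKER_SAMPLE = "8:memory:/docker/" + "0" * 64 + "\n"

print(in_known_container(DOCKER_SAMPLE))  # recognized
print(in_known_container(ECS_SAMPLE))     # not recognized with the default markers
```

Under that assumption, a check keyed on "docker" alone would treat the ECS task as a bare host unless "ecs" were added to the markers.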
Hey @TingluoHuang, do I understand correctly that there is no possibility to run container jobs in agents on a Kubernetes cluster?
@astlock not with the current implementation, which is based on docker. The only way to replicate in Kubernetes what a container job does in combination with service containers is really a pod, so we would need some other process to create Pods and PVCs in order to get the same behavior.
@chrispat thank you for the answer. Do you guys have plans to extend/develop some kubernetes native agent in the near future?
Friendly ping @chrispat :)
Is running a container job from within a containerized agent going to be allowed in the future? I can't find anywhere as to why it's not allowed, but it would be very convenient to have this possibility. Microsoft's own Azure DevOps documentation also seems to point to this being an allowed feature.
To try and get around this issue, I tried running the agent through podman (as opposed to docker). I was able to bypass the /proc/1/cgroup heuristic.
In case I can help others who are running into similar issues, here is what I am doing as a workaround:
- Run the agent in a podman container!
- Make sure start.sh is not copied into the docker container (Dockerfile)
- Create /azp on the host machine
- Put start.sh in the /azp folder
- Add -v /azp:/azp:Z to the args passed to podman (:Z is SELinux related - specific to RH/Fedora/SELinux distros.)
This gives the following setup:
/etc/systemd/system/vsts.agent.HEBIRobots.Default.{AgentName}.service with contents:
[Unit]
Description=Azure DevOps Agent
Requires=docker.service
After=docker.service
[Service]
Restart=always
# podman stop expects a container name or ID, not an image, so the container gets an explicit (illustrative) name.
ExecStart=/usr/bin/podman run --rm --name azp-agent -v /var/run/docker.sock:/var/run/docker.sock:Z -v /azp:/azp:Z -e AZP_URL=https://dev.azure.com/{YourDomain} -e AZP_TOKEN={Secret} -e AZP_AGENT_NAME={AgentName} {YourDockerImage}
ExecStop=/usr/bin/podman stop azp-agent
[Install]
WantedBy=multi-user.target
This starts the Azure DevOps node on boot (if you boot to a desktop like GNOME/KDE/etc), without having to log in as any user.
Notes/Caveats/Limitations
- I had to put SELinux into permissive mode (setenforce 0). This has security implications. You can probably do this the right way with audit2allow, et al.
- I could not start the podman container ("entry point not found" error) if I created an empty /azp folder on the host machine without the start.sh in there. I had to manually put the start.sh on my host machine to get the entry point visible to the container.
When a containerized build is started, Azure DevOps performs a series of preparation steps to make the provided container image compatible with Azure Pipelines workloads. You can see those steps in the Initialize containers build stage. When a build container is created (docker create ...), the command attaches a number of volumes to the container to provide the necessary toolset (node etc). The problem is that when a volume is attached, the docker engine resolves it against the host filesystem, so the directory path and necessary files must exist on the host filesystem. However, since the Azure DevOps agent is itself running as a container, the docker command is executed from within the agent container and attempts to attach a directory which does not exist on the host filesystem. As a result, an empty directory is attached to the build container, and when it starts (docker start), it fails to find the tools and the whole thing halts.
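The path mismatch described above can be sketched as a small lookup: given the bind mounts of the agent container, which host path (if any) backs a path the agent wants to hand to the job container? This is only an illustration under stated assumptions; host_source_for and the agent_mounts mapping are hypothetical and not part of the agent or of docker.

```python
import posixpath


def host_source_for(path_in_agent, agent_mounts):
    """Map a path inside the agent container to the host path backing it.

    agent_mounts maps a host directory to its mount point inside the agent
    container. Returns None when no bind mount covers the path, which is the
    case where the docker engine would attach an empty host directory.
    """
    path = posixpath.normpath(path_in_agent)
    best = None
    for host_dir, mount_point in agent_mounts.items():
        mount_point = posixpath.normpath(mount_point)
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest, most specific mount point.
            if best is None or len(mount_point) > len(best[1]):
                best = (host_dir, mount_point)
    if best is None:
        return None
    relative = posixpath.relpath(path, best[1])
    return posixpath.normpath(posixpath.join(best[0], relative))


# Only the docker socket is shared, so the externals folder has no host counterpart.
socket_only = {"/var/run/docker.sock": "/var/run/docker.sock"}
print(host_source_for("/vsts/agent/externals", socket_only))
```

When the agent directory is bind-mounted at the same path on both sides (for example /azp to /azp), the lookup returns the path unchanged, which is why same-path mounts make the agent's volume arguments valid on the host.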
@vlad-m-r This problem can be largely overcome by using the docker --volumes-from option (see the Docker documentation). As long as the Azure DevOps agent container has all the required destinations mounted as volumes, this option can be used by the agent to make those volumes available in the job container at the same location.
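As a rough sketch of that suggestion, the job container could be created with the agent container's volumes inherited through --volumes-from. Nothing in this thread says the agent does this today; job_container_create_args and the container name azp-agent are hypothetical, and the image name is only an example.

```python
def job_container_create_args(agent_container, job_image):
    """Build a `docker create` command line that reuses the agent's volumes.

    --volumes-from mounts every volume of the named container into the new
    container at the same paths.
    """
    return ["docker", "create", "--volumes-from", agent_container, job_image]


args = job_container_create_args("azp-agent", "ubuntu:16.04")
print(" ".join(args))
# To actually run it: subprocess.run(args, check=True)
```

This only helps for paths that are volumes of the agent container; files that live in the agent image's own filesystem layer are still not visible to the job container.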
Hey @TingluoHuang, do I understand correctly that there is no possibility to run container jobs in agents on a Kubernetes cluster?
I am running my agents on EKS
If you put your runner in a docker-in-docker container, or maybe in a pod that includes a docker-in-docker container, you should be able to use the container features of the agent.
Will this be supported in the future? I think it's quite a useful feature, and gitlab runner already supports this.
I finally got a solution: docker-in-docker. Instead of mounting the host docker.sock, I chose to run docker inside the docker container that holds azure-pipelines-agent.
For Dockerfile, see https://github.com/hez2010/docker-azure-pipelines-agent. You may also use the docker image built by me: ghcr.io/hez2010/docker-azure-pipelines-agent:main
To use it, you need to run the azure-pipelines-agent container as privileged, and you also need to set the four environment variables below:
- AGENT_PAT: personal access token
- AGENT_POOL: target agent pool name
- AGENT_URL: URL
- AGENT_DOCKER_MTU_VALUE: make sure it is less than the host docker MTU value, or you'll face networking issues
The first run of a container job in each agent will fail with a git clone networking issue, but this issue disappears after one retry.
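The settings above can be put together as a docker run command line. This is a sketch based only on the description in this comment; dind_agent_run_args is a hypothetical helper, and the host MTU is passed in by the caller rather than discovered.

```python
IMAGE = "ghcr.io/hez2010/docker-azure-pipelines-agent:main"


def dind_agent_run_args(pat, pool, url, mtu, host_mtu, image=IMAGE):
    """Build a privileged `docker run` command for the docker-in-docker agent image."""
    # The comment above asks for an MTU below the host docker MTU.
    if mtu >= host_mtu:
        raise ValueError("AGENT_DOCKER_MTU_VALUE must be less than the host docker MTU")
    return [
        "docker", "run", "--privileged",
        "-e", f"AGENT_PAT={pat}",
        "-e", f"AGENT_POOL={pool}",
        "-e", f"AGENT_URL={url}",
        "-e", f"AGENT_DOCKER_MTU_VALUE={mtu}",
        image,
    ]


# Placeholder values in the same {Braces} style used elsewhere in this thread.
print(" ".join(dind_agent_run_args("{Secret}", "Default", "https://dev.azure.com/{YourDomain}", 1400, 1500)))
```

Note that --privileged gives the container broad access to the host, so the security trade-off is similar in kind to mounting the host docker socket.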
@hez2010 I'm looking over your solution but I don't see anywhere you are using --volumes-from to pass host paths into the nested docker container. Would you explain the magic piece(s) that work around the docker-in-docker issue? Thanks.
EDIT: I do see now in Azure docs on dockerized agents that:
In order to use Docker from within a Docker container, you bind-mount the Docker socket. Caution Doing this has serious security implications. The code inside the container can now run as root on your Docker host.
Is that the piece that makes this work?
No. I run a docker daemon inside the container instead.
@hez2010 using your Dockerfile I wasn't able to run docker-in-docker. I also gave it a try with the docker image you linked and got the same default error - "Container feature is not supported ..."
So what's the conclusion here? Do dockerized agents support container jobs or not really?
Keen to also see any updates on this.
I would also like to know about progress on this feature. It would make an Azure DevOps project more portable, since it would have everything (including the agent) containerized.
We run several dockerized agents on our own server, using an image we built with microsoft/vsts-agent:ubuntu-16.04-docker-18.06.1-ce as a base image. I would like to start using container jobs in our new yaml build definitions - is there any way to do that if the agent itself is running in a container?
Right now when I try, I get this error
I can see from the previous output lines that it tries to create a container with several mounted volumes, including /vsts/agent/externals - but as far as I understand with docker-in-docker you are just writing to the host's /var/run/docker.sock. Works for building images, but for mounted volumes it will look at the host's file system.
Is there any combination of volume mounts I can add to the agent container to get this to work? Or should I just go with installing the agent directly to the host and stop using the dockerized version?