microsoft / azure-pipelines-agent

Azure Pipelines Agent 🚀
MIT License

Can't acquire root on common container distros #2043

Open vtbassmatt opened 5 years ago

vtbassmatt commented 5 years ago

Agent Version and Platform

Version of your agent? 2.x series

OS of the machine running the agent? Linux

Azure DevOps Type and Version

any Azure DevOps account

What's not working?

(copied from docs repo: https://github.com/MicrosoftDocs/vsts-docs/issues/2939) - reported by @njsmith: The example here demonstrates using the container: feature with the ubuntu:16.04 image. Which is great! This is exactly what I want to do, though with ubuntu:18.10 to test my software on the latest versions of everything (in particular openssl 1.1.1).

And the container: feature is pretty slick: it goes to a lot of trouble to map things into the container in a clever way, and set up a non-root user to run as, while granting that user sudo permissions, etc.

But... the standard images maintained by dockerhub, like ubuntu:16.04 and ubuntu:18.10 or debian:testing, don't have sudo installed. Which means that if you use them with container:, you actually cannot get root inside the container. It's impossible.

I guess the container: feature is useful for folks who are already maintaining some kind of development-environment images for their own use, but this makes it a complete non-starter for my use case, where I just want to use pipelines normally, but test on a different distro. I guess in theory I could maintain my own image that is just the official ubuntu:18.10 + sudo installed, but there's no way maintaining an image like that is worth it for this.

Instead I've had to give up on using container: and am instead writing things like:

      - bash: |
          set -ex
          sudo docker run -v $PWD:/t ubuntu:rolling /bin/bash -c "set -ex; cd /t; apt update; apt install -y python3.7-dev python3-virtualenv git build-essential; python3.7 -m virtualenv -p python3.7 venv; source venv/bin/activate; source ci/ci.sh"

This is workable, but it's really a shame to lose all the slick container: features just because of this.

It would be really nice if the container: feature could somehow make sure it was possible to get root inside the container. For example, there could be a config key to request running as root, either for the whole job or just for a specific script or bash task. Or the container setup phase could mount in a volume containing a suid sudo or gosu. Or anything, really...


The LinuxBrew folks are also facing a similar challenge. See https://github.com/Linuxbrew/brew/issues/746#issuecomment-452873130

sjackman commented 5 years ago

In our case the default user of the container is linuxbrew with UID 1000. The Docker image does have sudo installed, and the linuxbrew user has passwordless access to sudo. Pipelines attempts to run useradd -m -u 1001 vsts_azpcontainer, which fails because the linuxbrew user does not have permission to run useradd. Running sudo useradd -m -u 1001 vsts_azpcontainer would succeed. I suggest Pipelines run sudo useradd -m -u 1001 vsts_azpcontainer if the current user is non-root and /usr/bin/sudo exists, and otherwise run useradd -m -u 1001 vsts_azpcontainer.
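A minimal sketch of that fallback logic (the real agent implements this in C#, not shell; the UID and sudo path are passed in here purely so the decision is visible):

```shell
#!/bin/sh
# Sketch of the suggested fallback: prefix useradd with sudo only when the
# container's default user is non-root AND sudo is actually present.
# Parameters: current uid, sudo path, new uid, new username.
useradd_cmd() {
  cur_uid="$1"    # uid of the container's default user
  sudo_path="$2"  # expected sudo location, e.g. /usr/bin/sudo
  new_uid="$3"
  new_name="$4"
  if [ "$cur_uid" -ne 0 ] && [ -x "$sudo_path" ]; then
    printf 'sudo useradd -m -u %s %s\n' "$new_uid" "$new_name"
  else
    printf 'useradd -m -u %s %s\n' "$new_uid" "$new_name"
  fi
}

# Non-root default user with sudo available (the Linuxbrew case);
# /bin/sh stands in for the sudo binary so the sketch runs anywhere:
useradd_cmd 1000 /bin/sh 1001 vsts_azpcontainer
# prints: sudo useradd -m -u 1001 vsts_azpcontainer

# Root default user: no prefix needed.
useradd_cmd 0 /bin/sh 1001 vsts_azpcontainer
# prints: useradd -m -u 1001 vsts_azpcontainer
```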

sjackman commented 5 years ago

Most Docker containers have a default user of root. Our Linuxbrew/brew container is a bit unusual in that regard: its default user is linuxbrew (UID 1000). We do this because Homebrew refuses to run as root.

vtbassmatt commented 5 years ago

@TingluoHuang thoughts on how we handle this? I wonder if we could try without sudo (as we currently do), and if that fails, try with sudo?

njsmith commented 5 years ago

There are really two issues here, that are mostly unrelated.

For the problem the LinuxBrew folks are hitting, where the agent initialization is assuming that plain docker exec will have root, I think the solution is just for the agent initialization code to use docker exec -u 0:0 to override any USER directive in the dockerfile. Docker has root to start off with; there's no point in going root -> some other user -> sudo back to root.

For the problem I'm having, where there's no way for my user code to get root without sudo, the best solution I can think of is to add a way to mark particular tasks as being executed as root. Then it would be the agent's job to make this happen. For example, it might use sudo when running in a regular environment, and docker exec -u 0:0 or some other user-switching helper when running in an arbitrary container. Usage might look like:

- bash: "apt update && apt install -y some package"
  runAsRoot: true

iMichka commented 5 years ago

On the other hand, why is azure-pipelines even trying to execute anything inside the running container? I think other CI providers do not do anything like that. There could also be an option to disable that "feature"?

vtbassmatt commented 5 years ago

The feature is explicitly for running build steps inside a container. To do that, we need to make sure the container can read the files mapped into the workspace and, crucially, that the agent can also read anything created by the build step.

There could also be an option to disable that "feature"?

That exists - we don't make anyone use the feature :grin:

sjackman commented 5 years ago

Good point. We're running our first task that creates the artifacts in the Linuxbrew/brew Docker image. @vtbassmatt Once the artifacts are created, can we run the task: PublishBuildArtifacts@1 in a different image? And if so, which image do you recommend? See our usage of task: PublishBuildArtifacts@1 in for example https://github.com/Linuxbrew/homebrew-extra/pull/46/files

vtbassmatt commented 5 years ago

We run the whole job in one container. We investigated doing per-step container support, and even had a working proof of concept at one point, but didn't pursue finishing it.

sjackman commented 5 years ago

Thanks for the explanation. In that case, we do need to run PublishBuildArtifacts@1 in our Linuxbrew/brew Docker image that's being used to create the artifacts.

danwalmsley commented 5 years ago

I also am unable to use this feature because I'm trying to use debootstrap, which can only run as root.

https://unix.stackexchange.com/questions/214828/is-it-possible-to-run-debootstrap-within-a-fakeroot-environment

anyone found a workaround?

njsmith commented 5 years ago

@danwalmsley I'm guessing that any container that's full-featured enough to have debootstrap is also full-featured enough to have sudo :-). So the workaround is to run sudo debootstrap. This issue is specifically about containers that are missing sudo, and you can't install it, because to install sudo you need to use sudo...

danwalmsley commented 5 years ago

@njsmith you were right, I was able to use sudo and it worked. thanks

esteve commented 5 years ago

@danwalmsley if it's of any use, I managed to install sudo by running docker exec -u 0 inside the container:

https://github.com/ApexAI/performance_test/blob/master/azure-pipelines.yml#L9-L17

Containers in Azure are configured so that you can run Docker inside them, so I just exported the Docker executable as a volume and then accessed the running container as root via docker exec. The only requirement is to name the container (by passing --name NAME in options) so you can access it via docker exec. The other thing is to not overwrite the sudo config files that the Azure agent generates, though I think it'd be better if the agent wrote them to a separate file under /etc/sudoers.d/ instead of /etc/sudoers.
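For reference, a sketch of what that setup can look like in azure-pipelines.yml (container name and image are placeholders here; the linked file above is the authoritative version):

```yaml
# Mount the host's docker CLI into the job container as a read-only volume,
# name the container so it can be addressed, then exec back in as root (-u 0).
# This assumes, as described above, that the agent host's Docker daemon is
# reachable from inside the container.
container:
  image: ubuntu:18.10   # any sudo-less image
  options: "--name my-container -v /usr/bin/docker:/usr/bin/docker:ro"

steps:
  - script: |
      docker exec -u 0 my-container sh -c "apt-get update && apt-get install -y sudo"
    displayName: Install sudo inside the job container
```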

fatherlinux commented 5 years ago

Red Hat might consider creating a custom version of Universal Base Image which comes preconfigured for CI/CD pipelines with sudo installed. We are tracking it here, but don't have any plans yet.

Today, I would recommend people to build a small layered image (no matter which Linux distro), tuck it in quay.io or dockerhub, and pull from there. Maintaining a small layered image shouldn't be that difficult.

Also, is it possible to have the first step in Azure Pipelines just install sudo? (I am assuming not.) Sorry, I have never used Azure Pipelines and don't have time to test, but I am the product manager for UBI, so I find this use case interesting from a base image perspective.

To add a hair more background, there is constant tension when building base images (no matter which Linux distro). If we (Red Hat UBI team in this case, but same goes for any base image maintainer/architect) add more packages for every use case on the planet, then base images will eventually be 1GB+. Partners, customers, and cloud providers all need "special" things in their base images, and this use case is so similar.

njsmith commented 5 years ago

Today, I would recommend people to build a small layered image (no matter which Linux distro), tuck it in quay.io or dockerhub, and pull from there.

This is probably where we'll end up eventually if this isn't fixed, but having to create a separate repo and maintain and update custom containers is a lot of extra operational complexity for your average open source project, compared to just writing a few lines of yaml... Suddenly we have to figure out credential handling for our container repository, etc.

Also, is it possible to have the first step in Azure Pipelines just install sudo (I am assuming not).

Azure pipelines already starts by injecting some custom software into the image (a copy of nodejs that it uses to run the pipelines agent, which lets it run further commands inside the agent). If they injected a copy of sudo as well, in some well-known location, that would pretty much solve this for any container image.

pombredanne commented 5 years ago

@njsmith after some trial and error, I got sudo installed in some containers this way: https://dev.azure.com/nexB/license-expression/_build/results?buildId=79 https://github.com/nexB/license-expression/blob/3fe3f9359c34b6e6e31e6b3454e450ca8e9e9d6e/azure-pipelines.yml#L80

This is incredibly hackish, as it involves first running a docker command as root that runs docker-in-docker to install sudo (or something along these lines). Somehow it works, and I am able to get sudo-less containers (such as the official Ubuntu, Debian and CentOS images) to gain sudo access, such that I can then install a Python interpreter and eventually run the tests at last.

It looks like this has been first crafted by @esteve for https://github.com/ApexAI/performance_test/blame/6ae8375fa1e3111cb6fa60bdd1d42b9b9f370372/azure-pipelines.yml#L11

There are variations in https://github.com/quamotion/lz4.nativebinaries/blob/4030ff9d97259b05df84c080d494971b62931363/azure-pipelines.yml#L77 and https://github.com/dlemstra/Magick.Native/blob/3c83b2e7d06ded8f052bb5b282c5a79e27b2d6b7/azure-pipelines.yml

alanwales commented 5 years ago

Are there any plans to address this issue?

I am trying to run Powershell Core which comes with the docker image 'mcr.microsoft.com/dotnet/core/sdk:3.0.100-preview8-buster' but I am also not able to change permissions on /usr/bin/pwsh to allow execution as the user 'vsts_azpcontainer' has too limited permissions and I can't switch to root.

I see a lot of complex workarounds above by some motivated and creative people, but fundamentally it should be simple to execute container workloads like this. If I pull the image locally and run it then everything works as it should but when I run it on Azure Devops nothing works. Isn't this somehow breaking the docker ideal that "it works on my machine" should disappear?

The suggestion above, 'runAsRoot: true', would be great even as an option at the container level for people like me who just want to run some simple dotnet core tasks without spending a lot of time working out which chown, chmod, sudo install, or whatever else is needed to do this.
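To make the ask concrete, a container-level variant might look like this (note: runAsRoot is a hypothetical key proposed in this thread, not a real Azure Pipelines option):

```yaml
resources:
  containers:
    - container: builder
      image: mcr.microsoft.com/dotnet/core/sdk:3.0.100-preview8-buster
      runAsRoot: true   # hypothetical key, not currently supported
```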

pombredanne commented 5 years ago

@alanwales if sudo is not installed in a container, there are not many ways to work around this. The simplest is to use your own images with sudo pre-installed (though at least for me, since the main use case is installing extra packages, it would be simpler to have a custom image with the packages I need directly).

That said, there is no reason that this should be so complicated here.

alanwales commented 5 years ago

Maybe we have a different interpretation of simple but to quote the original poster of this issue “there's no way maintaining an image like that is worth it for this”. I just want to be able to use the officially provided .Net Core sdk Image as it was designed to be used and how it works in docker natively. Now with yaml pipelines widely available I can imagine this issue will be coming up more often so would be great to prioritize it higher or close with an explanation, thanks

ruffsl commented 5 years ago

Looking at the Initialize containers step, what is the reason for Azure to add a new sudo user with the same UID/GID as the user running the Azure agent process? Is the intent to retain file permissions for the _work directory created by the agent before mounting it into the container for the respective job? I'm guessing this is so the agent can always clean the mounted _work directory before the next job?

However, if the user running the Azure agent process has access to the Docker daemon, and is given sudo access inside the container, why couldn't the Azure agent just leave the user it execs into the container as unchanged from the Dockerfile, and instead use sudo chown/rm to persist/clean the agent workspace between jobs? Are the Azure cloud agents using a rootless Docker install, such that the host user may not have sudo? I'm not sure if that kind of setup is permissible, e.g. the agent process being able to mount docker.sock but not have sudo access to clean up leftover workspace volumes.

I really wish this related ticket had some closure; then perhaps this one could be resolved by having the agent target a newer version of Docker, or machines with newer Linux kernels:

Add ability to mount volume as user other than root #2259 https://github.com/moby/moby/issues/2259

ruffsl commented 5 years ago

I think I found the file that relates to most of the issues here. Looking through the comments also explains a few of the blocks I examined earlier today.

https://github.com/microsoft/azure-pipelines-agent/blob/3906428f2f61a869247e070ac1972cc453266994/src/Agent.Worker/ContainerOperationProvider.cs#L45-L47

Ah, yep, the Azure agent is not happy about running container jobs while inside a container itself, although I feel like using Docker volumes as opposed to host mounts would help avoid having to resolve absolute paths to mount volumes from the host filesystem. Then the Azure agent could run as root in its own container, so it could clean the workspace left from any job without changing the container's Dockerfile default user. I guess the agent might also want to peek into the Docker image to determine the default user, so it could prepare the permissions on workspace files such that the default user could use them.

https://github.com/microsoft/azure-pipelines-agent/blob/3906428f2f61a869247e070ac1972cc453266994/src/Agent.Worker/ContainerOperationProvider.cs#L264

Here is where the Temp, Tools, Tasks, and Externals folders are being added as host volumes in the container:

https://github.com/microsoft/azure-pipelines-agent/blob/3906428f2f61a869247e070ac1972cc453266994/src/Agent.Worker/ContainerOperationProvider.cs#L395-L398

I'm not sure docker exec runs as root by default; it's just that the default user for most official images is already root, so many derivative images never change it. I think exec just keeps the user declared by the last USER directive in the image, same as run commands.

ruffsl commented 5 years ago

@TingluoHuang, I guess it's been a while since you added https://github.com/microsoft/azure-pipelines-agent/pull/1005 , but do you think it would be possible to achieve the same workspace setup going on now by first copying the _work workspace from the host to a standalone docker container volume, and then attaching that container volume to the job container, so we can avoid changing the expected default user from the Dockerfile?

https://docs.docker.com/engine/reference/commandline/volume_create/

ruffsl commented 5 years ago

Given the Container Operation Provider tries to reconcile the permissions between the user in the container and the user on the host, a hacky workaround could be to set AGENT_ALLOW_RUNASROOT and launch the agent using the root user on the host. See https://github.com/microsoft/azure-pipelines-agent/pull/1878 , for an example.

Although, while you could escalate this way on a local agent, it doesn't really help if you're using a cloud-hosted agent, where the user hosting the agent is vsts (1001:117) on Azure cloud.

https://dev.azure.com/ApexAI/performance_test/_build/results?buildId=124&view=logs&jobId=85036c68-25ef-5143-bf73-15187243f4ec&taskId=154fe597-c0fe-4278-8bb4-8aab264c0ef8&lineStart=94&lineEnd=95&colStart=1&colEnd=1

vtbassmatt commented 5 years ago

@jtpetty this is a good one to noodle on as we think about how to evolve container support.

ruffsl commented 5 years ago

@vtbassmatt, it looks like the beta for GitHub Actions is also based on the Azure agent runner. Would there be an existing issue or repo where I could suggest changes to the workspace strategy? I can understand how having a reserved directory path simplifies the filesystem mounting on the agent backend, but it's a bit opinionated/constraining on where users can make stateful changes in the container. I.e. it adds a lot of boilerplate shuttling things back and forth from the workspace to elsewhere in the filesystem.

I suppose one could nest logs/builds/caches in the workspace, then symlink to where they are expected in the container filesystem, but that doesn't seem as transparent. It'd be cool to see a pattern like CircleCI's, where uploading assets, submitting test results, and caching directories can be performed anywhere in the container filesystem, not just in the reserved Azure/GitHub workspace folder. https://github.com/microsoft/azure-pipelines-tasks/issues/10870#issuecomment-524692639

vtbassmatt commented 5 years ago

@ruffsl yes, the GitHub runner is based on the Azure Pipelines agent code. It's not a slam-dunk for the agent to back-port runner changes, though. The runner doesn't have to be backwards compatible with existing Azure Pipelines customers. I still hope we can evolve our container support to be a little more industry standard.

mantovani commented 4 years ago

What's the workaround for this ?

sjackman commented 4 years ago

If it's an option for you, you could consider switching to GitHub Actions.

mfkl commented 4 years ago

Could you please provide a workaround for this?

Try to create a user with UID '1001' inside the container.
/usr/bin/docker exec  b76b29190ab25216e4e99fd12ec57375501125c3d19a59c05bffb2d7036e483e bash -c "getent passwd 1001 | cut -d: -f1 "
/usr/bin/docker exec  b76b29190ab25216e4e99fd12ec57375501125c3d19a59c05bffb2d7036e483e useradd -m -u 1001 vsts_azpcontainer
useradd: Permission denied.
useradd: cannot lock /etc/passwd; try again later.
##[error]Docker exec fail with exit code 1

This line should never run by default IMHO, and it should be at the very least configurable. Otherwise you make a whole lot of assumptions on the images of your users, and I don't think that's a good thing.

vtbassmatt commented 4 years ago

@mfkl if we didn't run that line, how would the host access files generated by tasks running in the container? The host is not guaranteed to have root privileges, and some agent services run on the host side.

mfkl commented 4 years ago

how would the host access files generated by tasks running in the container?

Does it really need to for all possible use cases?

I don't know, I'm not familiar with the azure-pipelines-agent code. But I believe other CI systems don't have this requirement enforced on users images.

vtbassmatt commented 4 years ago

Not for all possible use cases, but important/widespread ones: test results upload, pipeline artifacts, and caching.

Other CI systems definitely work differently than Azure Pipelines in this respect. AFAIK no one else tries to abstract the work (task) from the execution environment (container or VM).

mfkl commented 4 years ago

Since standard, popular docker images simply won't work because of this, it might be a good idea to revisit this design choice IMHO.

vtbassmatt commented 4 years ago

Agreed. We're fully booked on other work for a while but this is something I want to revisit.

edmcman commented 4 years ago

@mfkl if we didn't run that line, how would the host access files generated by tasks running in the container? The host is not guaranteed to have root privileges, and some agent services run on the host side.

Is there a reason why you can't use docker cp?

Alternatively, if docker is running as root but the agent is not, could you not chmod or chown the files using docker exec?

Rjevski commented 4 years ago

Are there any updates on this? What's the currently accepted/recommended workaround for a simple use case where I need to run an application's tests in a default Ubuntu container?

At the moment I am doing the same workaround as @esteve above with the following container config:

container:
  image: <your image here, ubuntu:latest for example>
  options:  "--name ci-container -v /usr/bin/docker:/tmp/docker:ro"

And then add this as the first step before you do anything else:

- script: |
    /tmp/docker exec -t -u 0 ci-container \
      sh -c 'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -o Dpkg::Options::="--force-confold" -y install sudo'
  displayName: 'Install sudo in container (thanks Microsoft!)'

This works and subsequent steps can use sudo but it still feels like a terrible hack and something that shouldn't have to be done especially if it breaks the conventions that the majority of Docker Hub images are built around (the assumption that you are already root thus no need for sudo). No other CI service that I'm familiar with requires such workarounds.

xkszltl commented 3 years ago

@Rjevski Welcome to Fedora, where sudo is already there \(^o^)/~

mikeperry-kr commented 3 years ago

I'm using the user namespace feature of Docker in order to run the azure-pipelines-agent as a non-root user. As I understand it, enabling this at the daemon level should help to avoid many of the file permissions issues mentioned in the following comment:

https://github.com/microsoft/azure-pipelines-agent/blob/3906428f2f61a869247e070ac1972cc453266994/src/Agent.Worker/ContainerOperationProvider.cs#L395-L398

With userns-remap enabled on the host docker daemon, I don't believe I need azure-pipelines-agent to create a new user to run as within the container. Running as root within the container should be fine, and any files it leaves behind will be automatically remapped to be owned by non-root host user.
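For context, userns-remap is enabled on the host Docker daemon via /etc/docker/daemon.json; a minimal example (the value "default" tells the daemon to create and use the dockremap user for the UID/GID mapping):

```json
{
  "userns-remap": "default"
}
```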

In its current state, if I configure the container resource as:

resources:
  containers:
    - container: pycontainer
      image: python:3.8

then I get an error running a script task within the container that states:

EACCES: permission denied, open '/__w/_temp/.taskkey'

If I pass --userns host as an option, like so:

resources:
  containers:
  - container: pycontainer
    image: python:3.8
    options: --userns host # this is necessary because we enable userns-remap on the host docker daemon

this time the script succeeds (so long as there's not a lingering /opt/vsts-agent/work/.taskkey directory on the host with incorrect permissions).

So, assuming I don't have any wires crossed here, I think using the userns-remap feature of Docker with options: --userns host passed to the Azure Pipelines container target is a potential workaround.

Although I think we could get rid of the options: --userns host requirement if azure-pipelines-agent detected whether userns-remap is enabled before deciding to create the new container user.

EugenMayer commented 3 years ago

To drop my few lines here: the problem with sudo and its requirement under the hood goes beyond "bummer", since it is actually entirely useless due to https://stackoverflow.com/questions/59544762/how-run-a-azure-container-job-under-a-specific-user-in-the-container

For all of you who struggle to re-build all the usual images you consume from Docker Hub, I have built a template:

https://github.com/EugenMayer/docker-image-azure

All it does is pick the official node:8, node:12 .. php:7.4 .. golang:1.5 .. adoptopenjdk/openjdk8, adoptopenjdk/openjdk11 images and rebuild them with Azure support without otherwise changing them. I do that on a daily basis using an Azure pipeline.

I do this mostly for Debian buster-based images, but you can adjust the template as you wish and pick Fedora/CentOS/Alpine as a base.

It is a huge bummer and waste of resources, but at least it gives this "use the default images" advantage back, even though a bit masked.

github-actions[bot] commented 3 years ago

This issue has had no activity in 180 days. Please comment if it is not actually stale

edmcman commented 3 years ago

Not stale

Nightreaver commented 2 years ago

I can see this issue is still present, and it's really dragging on... I mean, it's actually 2 years old by now. Can we expect any changes in the future?

marc-wilson commented 2 years ago

Still an issue. Would be great if this got some attention.

jhennessey commented 2 years ago

I'm also hoping for a proper fix for this issue...

There was a related discussion in this issue in the dotnet-docker repo where it was decided to not include sudo in the SDK images. The closing comment was:

Closing as we are not going to make this change. Azure DevOps should reconsider their implementation to better support "bring your own image".

It seems many agree with that comment.

anatolybolshakov commented 2 years ago

Hi everyone! We are currently working on higher-priority issues, but will get back to this one once we are able to. This possible enhancement would require additional testing to avoid any possible regressions around it.

xkszltl commented 2 years ago

Hi @anatolybolshakov Could you provide more insight regarding "possible regressions"? What does the solution look like, an auto-injected sudo?

raven-wing commented 2 years ago

Hi everyone! We are currently working on higher-priority issues, but will get back to this one once we are able to. This possible enhancement would require additional testing to avoid any possible regressions around it.

Whoah. I'm impressed that there are issues which are more important than the pipeline not working at all :) (just bumping the thread up to show that it's not stale)

dgokeeffe commented 2 years ago

@anatolybolshakov

I think this is a feature that certainly needs to be prioritized ASAP

anatolybolshakov commented 2 years ago

Hey @dgokeeffe I'm no longer working on this project - cc @alexander-smolyakov @kirill-ivlev for visibility

kamronbatman commented 2 years ago

This was an issue with Fedora (35/36), and it was OK since I wasn't using sudo. But something changed recently and now the errors are fatal, so all of my pipeline builds are broken. Not sure what to do.

https://dev.azure.com/modernuo/modernuo/_build/results?buildId=3501&view=logs&j=5a725d0b-2de3-5d5a-cd56-05b5afe52cda&t=d0e9459a-a89b-48bf-9f9d-66d4b268d541&l=52