vtbassmatt opened this issue 5 years ago
In our case the default user of the container is `linuxbrew` with UID 1000. The Docker image does have `sudo` installed, and the `linuxbrew` user has passwordless access to `sudo`. Pipelines attempts to run `useradd -m -u 1001 vsts_azpcontainer`, which fails because the `linuxbrew` user does not have permission to run `useradd`. Running `sudo useradd -m -u 1001 vsts_azpcontainer` would succeed. I suggest Pipelines run `sudo useradd -m -u 1001 vsts_azpcontainer` if the current user is non-root and `/usr/bin/sudo` exists, and otherwise run `useradd -m -u 1001 vsts_azpcontainer`.
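The proposed fallback could be sketched as shell logic; `choose_useradd_cmd` is a hypothetical helper (the agent itself is implemented in C#), shown only to make the decision explicit:

```shell
# Hypothetical helper: pick the useradd invocation based on whether we are
# root and whether a usable sudo binary exists at the given path.
choose_useradd_cmd() {
  uid="$1"
  sudo_path="$2"
  if [ "$uid" -ne 0 ] && [ -x "$sudo_path" ]; then
    echo "sudo useradd -m -u 1001 vsts_azpcontainer"
  else
    echo "useradd -m -u 1001 vsts_azpcontainer"
  fi
}

# Print the command the agent would choose in the current environment.
choose_useradd_cmd "$(id -u)" /usr/bin/sudo
```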
Most Docker containers have a default user of root. Our Linuxbrew/brew container is a bit unusual in that regard: the default user is `linuxbrew` (UID 1000). We do this because Homebrew refuses to run as root.
@TingluoHuang thoughts on how we handle this? I wonder if we could try without sudo (as we currently do), and if that fails, try with sudo?
There are really two issues here that are mostly unrelated.
For the problem the Linuxbrew folks are hitting, where the agent initialization assumes that a plain `docker exec` will have root, I think the solution is just for the agent initialization code to use `docker exec -u 0:0` to override any `USER` directive in the Dockerfile. Docker has root to start off with; there's no point in going root -> some other user -> sudo back to root.
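As a dry-run sketch (the container name and helper function are hypothetical), the agent's initialization command could be assembled like this, forcing UID/GID 0 regardless of the image's `USER` directive:

```shell
# Hypothetical helper: print the docker exec command the agent could use to
# run its initialization steps as root, overriding the image's USER directive.
build_exec_cmd() {
  container="$1"
  shift
  echo "docker exec -u 0:0 $container $*"
}

# The failing user-creation step from the error logs, forced to run as root:
build_exec_cmd my_job_container useradd -m -u 1001 vsts_azpcontainer
```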
For the problem I'm having, where there's no way for my user code to get root without `sudo`, the best solution I can think of is to add a way to mark particular tasks as being executed as root. Then it would be the agent's job to make this happen. For example, it might use `sudo` when running in a regular environment, and `docker exec -u 0:0` or some other user-switching helper when running in an arbitrary container. Usage might look like:
```yaml
- bash: "apt update && apt install -y some-package"
  runAsRoot: true
```
On the other hand, why is azure-pipelines even trying to execute anything inside the running container? I think other CI providers do not do anything like that. There could also be an option to disable that "feature"?
The feature is explicitly for running build steps inside a container. To do that, we need to make sure the container can read the files mapped into the workspace and, crucially, that the agent can also read anything created by the build step.
There could also be an option to disable that "feature"?
That exists - we don't make anyone use the feature :grin:
Good point. We're running our first task, which creates the artifacts, in the Linuxbrew/brew Docker image. @vtbassmatt Once the artifacts are created, can we run the `PublishBuildArtifacts@1` task in a different image? And if so, which image do you recommend?

See our usage of `task: PublishBuildArtifacts@1` in, for example, https://github.com/Linuxbrew/homebrew-extra/pull/46/files
We run the whole job in one container. We investigated doing per-step container support, and even had a working proof of concept at one point, but didn't pursue finishing it.
Thanks for the explanation. In that case, we do need to run `PublishBuildArtifacts@1` in our Linuxbrew/brew Docker image that's being used to create the artifacts.
I'm also unable to use this feature because I'm trying to use debootstrap, which can only run as root. Has anyone found a workaround?
@danwalmsley I'm guessing that any container that's full-featured enough to have `debootstrap` is also full-featured enough to have `sudo` :-). So the workaround is to run `sudo debootstrap`. This issue is specifically about containers that are missing `sudo`, and you can't install it, because to install `sudo` you need to use `sudo`...
@njsmith you were right, I was able to use sudo and it worked. thanks
@danwalmsley if it's of any use, I managed to install `sudo` by running `docker exec -u 0` inside the container:

https://github.com/ApexAI/performance_test/blob/master/azure-pipelines.yml#L9-L17

Containers in Azure are configured so that you can run Docker inside them, so I just exported the Docker executable as a volume and then accessed the running container as root via `docker exec`. The only requirement is to name the container (by passing `--name NAME` in `options`), so you can access it via `docker exec`. The other thing is to not overwrite the sudo config files that the Azure agent generates, but I think it'd be better if the agent wrote them to a separate file in `/etc/sudoers.d/` instead of editing `/etc/sudoers`.
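For illustration, such a drop-in might look like the fragment below; the filename and rule are assumptions based on the `vsts_azpcontainer` user the agent creates, not the agent's actual output:

```
# /etc/sudoers.d/vsts_azpcontainer (hypothetical drop-in; install with mode 0440)
vsts_azpcontainer ALL=(ALL) NOPASSWD: ALL
```

Keeping `/etc/sudoers` itself untouched means images that ship their own sudoers configuration are less likely to conflict with the agent's changes.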
Red Hat might consider creating a custom version of Universal Base Image which comes pre-configured for CI/CD pipelines with sudo installed. We are tracking it here, but don't have any plans yet.
Today, I would recommend people to build a small layered image (no matter which Linux distro), tuck it in quay.io or dockerhub, and pull from there. Maintaining a small layered image shouldn't be that difficult.
Also, is it possible to have the first step in Azure Pipelines just install sudo? (I am assuming not.) Sorry, I have never used Azure Pipelines and don't have time to test, but I am the product manager for UBI, so I find this use case interesting from a base-image perspective.
To add a hair more background, there is constant tension when building base images (no matter which Linux distro). If we (Red Hat UBI team in this case, but same goes for any base image maintainer/architect) add more packages for every use case on the planet, then base images will eventually be 1GB+. Partners, customers, and cloud providers all need "special" things in their base images, and this use case is so similar.
Today, I would recommend people to build a small layered image (no matter which Linux distro), tuck it in quay.io or dockerhub, and pull from there.
This is probably where we'll end up eventually if this isn't fixed, but having to create a separate repo and maintain and update custom containers is a lot of extra operational complexity for your average open source project, compared to just writing a few lines of yaml... Suddenly we have to figure out credential handling for our container repository, etc.
Also, is it possible to have the first step in Azure Pipelines just install sudo (I am assuming not).
Azure pipelines already starts by injecting some custom software into the image (a copy of nodejs that it uses to run the pipelines agent, which lets it run further commands inside the agent). If they injected a copy of sudo as well, in some well-known location, that would pretty much solve this for any container image.
@njsmith after some trial and error, I got sudo installed in some containers this way: https://dev.azure.com/nexB/license-expression/_build/results?buildId=79 https://github.com/nexB/license-expression/blob/3fe3f9359c34b6e6e31e6b3454e450ca8e9e9d6e/azure-pipelines.yml#L80

This is incredibly hackish, as it involves first running a docker command line as root that runs docker-in-docker to install sudo (or something along those lines). Somehow it works, and I am able to get sudo-less containers (such as the official Ubuntu, Debian and CentOS images) to gain sudo access so that I can then install a Python interpreter and eventually run the tests at last.

It looks like this was first crafted by @esteve for https://github.com/ApexAI/performance_test/blame/6ae8375fa1e3111cb6fa60bdd1d42b9b9f370372/azure-pipelines.yml#L11

There are variations in https://github.com/quamotion/lz4.nativebinaries/blob/4030ff9d97259b05df84c080d494971b62931363/azure-pipelines.yml#L77 and https://github.com/dlemstra/Magick.Native/blob/3c83b2e7d06ded8f052bb5b282c5a79e27b2d6b7/azure-pipelines.yml
Are there any plans to address this issue?
I am trying to run PowerShell Core, which comes with the Docker image 'mcr.microsoft.com/dotnet/core/sdk:3.0.100-preview8-buster', but I am not able to change permissions on /usr/bin/pwsh to allow execution, as the user 'vsts_azpcontainer' has too limited permissions and I can't switch to root.
I see a lot of complex workarounds above by some motivated and creative people, but fundamentally it should be simple to execute container workloads like this. If I pull the image locally and run it then everything works as it should but when I run it on Azure Devops nothing works. Isn't this somehow breaking the docker ideal that "it works on my machine" should disappear?
The suggestion above 'runAsRoot: true' would be great even to have as an option at the container level for people like me who just want to run some simple dotnet core tasks without spending a lot of time working out which chown, chmod, installing sudo or whatever else needs to be done to do this.
@alanwales if sudo is not installed in a container, there are not many ways to work around this. The simplest is to use your own images with sudo pre-installed (though at least for me, since the main use case is to install extra packages, it would be simpler to have custom images with the packages I need directly).
That said, there is no reason that this should be so complicated here.
Maybe we have a different interpretation of simple, but to quote the original poster of this issue: "there's no way maintaining an image like that is worth it for this". I just want to be able to use the officially provided .NET Core SDK image as it was designed to be used and as it works in Docker natively. Now that YAML pipelines are widely available, I can imagine this issue will come up more often, so it would be great to prioritize it higher or close it with an explanation. Thanks.
Looking at the `Initialize containers` step, what is the reason for Azure to add a new sudo user with the same UID/GID as that of the user running the Azure agent process? Is the intent to retain file permissions for the `_work` directory created by the agent before mounting it into the container for the respective job? I'm guessing this is so the agent can always clean the mounted `_work` directory before the next job?

However, if the user running the Azure agent process has access to the Docker daemon, and is given sudo access inside the container, why couldn't the Azure agent just leave the user to exec into the container unchanged from the Dockerfile, and instead use sudo chown/rm to persist/clean the agent workspace between jobs? Are the Azure cloud agents using a rootless Docker install, such that the host user may not have sudo? I'm not sure if that kind of setup is permissible, e.g. the agent process being able to mount docker.sock but not have sudo access to clean up leftover workspace volumes.
I really wish this related ticket had some closure; then perhaps this one could be resolved by having the agent target a newer version of Docker, or machines with newer Linux kernels:

Add ability to mount volume as user other than root (moby/moby#2259): https://github.com/moby/moby/issues/2259
I think I found the file that relates to most of the issues here. Looking through the comments also explains a few blocks I described earlier today.
Ah, yep, the Azure agent is not happy about running container jobs while inside a container itself, although I feel like using Docker volumes as opposed to host mounts would help avoid having to resolve absolute paths to mount volumes from the host filesystem. Then the Azure agent could run as root in its own container, so it could clean the workspace left from any job without changing the container's Dockerfile default user. I guess the agent might also want to peek into the Docker image to determine the default user, so it could prepare the permissions for workspace files so that the default user could use them.
Here is where the Temp, Tools, Tasks, and Externals folders are being added as host volumes in the container:
I'm not sure `docker exec` runs as root by default; it's just that the default user for most official images is already root, so many derivative images never change this. I think exec just keeps the user declared by the last `USER` directive in the Docker image layers, same as run commands.
@TingluoHuang, I guess it's been a while since you added https://github.com/microsoft/azure-pipelines-agent/pull/1005, but do you think it would be possible to achieve the same workspace setup as today by first copying the `_work` workspace from the host to a standalone Docker volume, and then attaching that volume to the job container, so we can avoid changing the expected default user from the Dockerfile?

https://docs.docker.com/engine/reference/commandline/volume_create/
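A dry-run sketch of that flow (the volume name, paths, and helper function are all hypothetical; it only prints the docker commands rather than running them):

```shell
# Hypothetical sketch: print the commands that would copy _work into a named
# volume and attach that volume to the job container, instead of bind-mounting
# the host path and remapping users inside the container.
plan_workspace_volume() {
  host_work="$1"
  volume="$2"
  image="$3"
  echo "docker volume create $volume"
  echo "docker run --rm -v $host_work:/src:ro -v $volume:/dst alpine cp -a /src/. /dst/"
  echo "docker create -v $volume:/__w $image"
}

plan_workspace_volume /opt/vsts-agent/_work azp_work python:3.8
```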
Given the Container Operation Provider tries to reconcile the permissions between the user in the container and the user on the host, a hacky workaround could be to set `AGENT_ALLOW_RUNASROOT` and launch the agent as the root user on the host. See https://github.com/microsoft/azure-pipelines-agent/pull/1878 for an example.

Although, if you're using a cloud-hosted agent rather than a local agent you could escalate, this doesn't really help when the user hosting the agent is `vsts` (1001:117) on Azure cloud.
@jtpetty this is a good one to noodle on as we think about how to evolve container support.
@vtbassmatt, it looks like the beta for GitHub Actions is also based on the Azure agent runner. Would there be an existing issue or repo to suggest changes to the workspace strategy? I can understand how having a reserved directory path simplifies the filesystem mounting on the agent backend, but it's a bit opinionated/constraining on where users can make stateful changes in the container. I.e., it adds a lot of boilerplate shuttling things back and forth between the workspace and elsewhere in the filesystem.
I suppose one could nest logs/builds/caches in the workspace, then symlink to where they are expected in the container filesystem, but that doesn't seem as transparent. It'd be cool to see a pattern similar to CircleCI's, where uploading assets, submitting test results, and caching directories can be performed anywhere in the container filesystem, not just in the reserved Azure/GitHub workspace folder. https://github.com/microsoft/azure-pipelines-tasks/issues/10870#issuecomment-524692639
@ruffsl yes, the GitHub runner is based on the Azure Pipelines agent code. It's not a slam-dunk for the agent to back-port runner changes, though. The runner doesn't have to be backwards compatible with existing Azure Pipelines customers. I still hope we can evolve our container support to be a little more industry standard.
What's the workaround for this ?
If it's an option for you, you could consider switching to GitHub Actions.
Could you please provide a workaround for this?
Try to create a user with UID '1001' inside the container.

```
/usr/bin/docker exec b76b29190ab25216e4e99fd12ec57375501125c3d19a59c05bffb2d7036e483e bash -c "getent passwd 1001 | cut -d: -f1 "
/usr/bin/docker exec b76b29190ab25216e4e99fd12ec57375501125c3d19a59c05bffb2d7036e483e useradd -m -u 1001 vsts_azpcontainer
useradd: Permission denied.
useradd: cannot lock /etc/passwd; try again later.
##[error]Docker exec fail with exit code 1
```
This line should never run by default IMHO, and it should at the very least be configurable. Otherwise you make a whole lot of assumptions about your users' images, and I don't think that's a good thing.
@mfkl if we didn't run that line, how would the host access files generated by tasks running in the container? The host is not guaranteed to have root privileges, and some agent services run on the host side.
how would the host access files generated by tasks running in the container?
Does it really need to for all possible uses cases?
I don't know, I'm not familiar with the azure-pipelines-agent code. But I believe other CI systems don't have this requirement enforced on users images.
Not for all possible use cases, but important/widespread ones: test results upload, pipeline artifacts, and caching.
Other CI systems definitely work differently than Azure Pipelines in this respect. AFAIK no one else tries to abstract the work (task) from the execution environment (container or VM).
Since standard, popular docker images simply won't work because of this, it might be a good idea to revisit this design choice IMHO.
Agreed. We're fully booked on other work for a while but this is something I want to revisit.
@mfkl if we didn't run that line, how would the host access files generated by tasks running in the container? The host is not guaranteed to have root privileges, and some agent services run on the host side.
Is there a reason why you can't use `docker cp`?

Alternatively, if Docker is running as root but the agent is not, could you not `chmod` or `chown` the files using `docker exec`?
Are there any updates on this? What's the currently accepted/recommended workaround for a simple use case where I need to run an application's tests in a default Ubuntu container?
At the moment I am doing the same workaround as @esteve above with the following container config:
```yaml
container:
  image: <your image here, ubuntu:latest for example>
  options: "--name ci-container -v /usr/bin/docker:/tmp/docker:ro"
```

And then add this as the first step before you do anything else:

```yaml
- script: |
    /tmp/docker exec -t -u 0 ci-container \
      sh -c "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -o Dpkg::Options::=\"--force-confold\" -y install sudo"
  displayName: 'Install Sudo in container (thanks Microsoft!)'
```
This works and subsequent steps can use sudo but it still feels like a terrible hack and something that shouldn't have to be done especially if it breaks the conventions that the majority of Docker Hub images are built around (the assumption that you are already root thus no need for sudo). No other CI service that I'm familiar with requires such workarounds.
@Rjevski
Welcome to Fedora, where sudo
is already there \(^o^)/~
I'm using the user namespace feature of Docker in order to run the azure-pipelines-agent as a non-root user. As I understand it, enabling this at the daemon level should help to avoid many of the file permissions issues mentioned in the following comment:
With userns-remap enabled on the host docker daemon, I don't believe I need azure-pipelines-agent to create a new user to run as within the container. Running as root within the container should be fine, and any files it leaves behind will be automatically remapped to be owned by non-root host user.
In its current state, if I configure the container resource as:
```yaml
resources:
  containers:
  - container: pycontainer
    image: python:3.8
```

then I get an error running a script task within the container that states:

```
EACCES: permission denied, open '/__w/_temp/.taskkey'
```
If I pass `--userns host` as an option, like so:

```yaml
resources:
  containers:
  - container: pycontainer
    image: python:3.8
    options: --userns host # this is necessary because we enable userns-remap on the host docker daemon
```

this time the script succeeds (so long as there's not a lingering `/opt/vsts-agent/work/.taskkey` directory on the host with incorrect permissions).
So, assuming I don't have any wires crossed here, I think using the userns-remap feature of Docker with `options: --userns host` passed to the Azure Pipelines container target is a potential workaround.

Although, I think we could get rid of the `options: --userns host` requirement if azure-pipelines-agent detected whether userns-remap is enabled before deciding to create the new container user.
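For anyone trying this, userns-remap is a daemon-level setting rather than a per-container one; a minimal `/etc/docker/daemon.json` enabling it (per the Docker documentation) looks like:

```json
{
  "userns-remap": "default"
}
```

This remaps container root onto an unprivileged subordinate UID range on the host, which is why files written as root inside the container end up owned by a non-root user on the host.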
To drop my few lines here: the problem with `sudo` and its requirement under the hood goes beyond a bummer, since it is actually entirely useless, per https://stackoverflow.com/questions/59544762/how-run-a-azure-container-job-under-a-specific-user-in-the-container
For all of you who struggle to re-build all the usual images you consume from Docker Hub, I have built a template:

https://github.com/EugenMayer/docker-image-azure

All it does is pick the official `node:8`, `node:12`... `php:7.4`... `golang:1.5`... `adoptopenjdk/openjdk8`, `adoptopenjdk/openjdk11` images and build them with Azure support without changing them. I do that on a daily basis using an Azure pipeline.

I do this mostly for Debian buster-based images, but you can adjust the template as you wish and pick Fedora/CentOS/Alpine as a base.

It is a huge bummer and a waste of resources, but at least it gives the "use the default images" advantage back, even if a bit masked.
This issue has had no activity in 180 days. Please comment if it is not actually stale
Not stale
I can see this issue is still present, and it's "good" to see this is really dragging on... I mean, it's actually 2 years old by now. Can we expect any changes in the future?
Still an issue. Would be great if this got some attention.
I'm also hoping for a proper fix for this issue...
There was a related discussion in this issue in the dotnet-docker repo, where it was decided not to include `sudo` in the SDK images. The closing comment was:
Closing as we are not going to make this change. Azure DevOps should reconsider their implementation to better support "bring your own image".
It seems many agree with that comment.
Hi everyone! We are currently working on higher-priority issues, but will get back to this one once we are able to. This possible enhancement would require additional testing to avoid any possible regressions around it.
Hi @anatolybolshakov Could you provide more insight regarding "possible regressions"? What does the solution look like, an auto-injected sudo?
Hi everyone! We are currently working on higher-priority issues, but will get back to this one once we are able to. This possible enhancement would require additional testing to avoid any possible regressions around it.
Whoah. I'm impressed that there are issues which are more important that pipeline not working at all :) (just bumping thread up to show that it's not stale)
@anatolybolshakov
I think this is a feature that certainly needs to be prioritized ASAP
Hey @dgokeeffe I'm no longer working on project - cc @alexander-smolyakov @kirill-ivlev for visibility
This was an issue with Fedora (35/36), and it was OK since I wasn't using sudo. But something changed recently and now errors are fatal, so all of my pipeline builds are broken. Not sure what to do.
Agent Version and Platform
Version of your agent? 2.x series
OS of the machine running the agent? Linux
Azure DevOps Type and Version
any Azure DevOps account
What's not working?
(copied from docs repo: https://github.com/MicrosoftDocs/vsts-docs/issues/2939) - reported by @njsmith:

The example here demonstrates using the `container:` feature with the `ubuntu:16.04` image. Which is great! This is exactly what I want to do, though with `ubuntu:18.10` to test my software on the latest versions of everything (in particular openssl 1.1.1).

And the `container:` feature is pretty slick: it goes to a lot of trouble to map things into the container in a clever way, and set up a non-root user to run as, while granting that user `sudo` permissions, etc.

But... the standard images maintained by Docker Hub, like `ubuntu:16.04` and `ubuntu:18.10` or `debian:testing`, don't have sudo installed. Which means that if you use them with `container:`, you actually cannot get root inside the container. It's impossible.

I guess the `container:` feature is useful for folks who are already maintaining some kind of development-environment image for their own use, but this makes it a complete non-starter for my use case, where I just want to use pipelines normally, but test on a different distro. I guess in theory I could maintain my own image that is just the official `ubuntu:18.10` + `sudo` installed, but there's no way maintaining an image like that is worth it for this.

Instead I've had to give up on using `container:` and am instead writing things like:

This is workable, but it's really a shame to lose all the slick `container:` features just because of this.

It would be really nice if the `container:` feature could somehow make sure it was possible to get root inside the container. For example, there could be a config key to request running as `root`, either for the whole job or just for a specific `script` or `bash` task. Or the container setup phase could mount in a volume containing a suid `sudo` or `gosu`. Or anything, really...

The LinuxBrew folks are also facing a similar challenge. See https://github.com/Linuxbrew/brew/issues/746#issuecomment-452873130