We do the UID update last because that depends on the local machine whereas everything else could be prebuilt in CI.
@Chuxel @joshspicer Can we change the socket's ownership to root:docker and make the regular user part of the docker group? That should keep working after the user's UID and default GID have changed.
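Roughly, a sketch of what the entrypoint would run:

chown root:docker /var/run/docker-host.sock   # socket owned by root:docker
usermod -aG docker vscode                     # membership is stored by name, so it survives UID/GID changes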
Yeah the code in this section attempts to deal with this situation: https://github.com/microsoft/vscode-dev-containers/blob/5623b4a28cdd5cf9bae1b3318d52862657d46b89/script-library/docker-debian.sh#L279
There is an entrypoint script: since the Docker CLI is present but not the engine, a docker-host group is created with the same GID as the mounted socket and the user is added to it, unless that GID is "0"... in which case socat is used as a final fallback.
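In outline, that logic looks something like this (a simplified sketch, not the verbatim script):

SOCKET=/var/run/docker-host.sock
SOCKET_GID=$(stat -c '%g' "$SOCKET")
if [ "$SOCKET_GID" != "0" ]; then
    # create a docker-host group matching the socket's GID and add the user to it
    groupadd --gid "$SOCKET_GID" docker-host || true
    usermod -aG docker-host vscode
else
    # GID 0: proxy the socket via socat so the non-root user can reach it
    socat UNIX-LISTEN:/var/run/docker.sock,fork,mode=660,user=vscode \
        UNIX-CONNECT:"$SOCKET" &
fi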
I am wondering whether the entrypoint script has fired in the situation you're describing. Can you post your entire devcontainer.json and Dockerfile?
Also, note that right now if you are using features, you need to reference the feature even if it has been pre-built. This brings across configuration needed at runtime. That said, we're talking about improving this (e.g. https://github.com/microsoft/dev-container-spec/issues/18)
devcontainer.json
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.209.6/containers/python-3
{
  "name": "Python 3",
  "build": {
    "dockerfile": "Dockerfile",
    "context": "..",
    "args": {
      // Update 'VARIANT' to pick a Python version: 3, 3.10, 3.9, 3.8, 3.7, 3.6
      // Append -bullseye or -buster to pin to an OS version.
      // Use -bullseye variants on local on arm64/Apple Silicon.
      "VARIANT": "3.8",
      // Options
      "NODE_VERSION": "none"
    }
  },
  // Set *default* container specific settings.json values on container create.
  "settings": {
    "python.languageServer": "Pylance",
    "python.linting.enabled": true,
    "python.linting.pylintEnabled": true,
    "python.formatting.provider": "black",
    "python.linting.flake8Enabled": true,
    "python.linting.banditEnabled": true,
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.fixAll": true,
      "source.organizeImports": true
    }
  },
  // Add the IDs of extensions you want installed when the container is created.
  "extensions": [
    "ms-python.python",
    "ms-python.vscode-pylance",
    "psioniq.psi-header",
    "ms-azuretools.vscode-docker",
    "njpwerner.autodocstring",
    "mongodb.mongodb-vscode",
    "eamodio.gitlens"
  ],
  "containerEnv": {
    "PYTHONPATH": "${containerWorkspaceFolder}"
  },
  "postCreateCommand": "/scripts/setup_env.sh",
  "remoteUser": "vscode",
  "features": {
    "docker-from-docker": "latest",
    "git": "os-provided",
    "git-lfs": "latest"
  }
}
Dockerfile
# See here for image contents: https://github.com/microsoft/vscode-dev-containers/tree/v0.209.6/containers/python-3/.devcontainer/base.Dockerfile
# [Choice] Python version (use -bullseye variants on local arm64/Apple Silicon): 3, 3.10, 3.9, 3.8, 3.7, 3.6, 3-bullseye, 3.10-bullseye, 3.9-bullseye, 3.8-bullseye, 3.7-bullseye, 3.6-bullseye, 3-buster, 3.10-buster, 3.9-buster, 3.8-buster, 3.7-buster, 3.6-buster
ARG VARIANT="3.10-bullseye"
FROM mcr.microsoft.com/vscode/devcontainers/python:0-${VARIANT}
# [Choice] Node.js version: none, lts/*, 16, 14, 12, 10
ARG NODE_VERSION="none"
RUN if [ "${NODE_VERSION}" != "none" ]; then su vscode -c "umask 0002 && . /usr/local/share/nvm/nvm.sh && nvm install ${NODE_VERSION} 2>&1"; fi
# [Optional] If your pip requirements rarely change, uncomment this section to add them to the image.
# COPY requirements.txt /tmp/pip-tmp/
# RUN pip3 --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt \
# && rm -rf /tmp/pip-tmp
# [Optional] Uncomment this section to install additional OS packages.
# RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
# && apt-get -y install --no-install-recommends <your-package-list-here>
# [Optional] Uncomment this line to install global node packages.
# RUN su vscode -c "source /usr/local/share/nvm/nvm.sh && npm install -g <your-package-here>" 2>&1
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive && \
    apt-get -y install --no-install-recommends wget apt-transport-https gnupg lsb-release && \
    wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | apt-key add - && \
    echo deb https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main | tee -a /etc/apt/sources.list.d/trivy.list && \
    apt-get update && \
    apt-get -y install trivy
COPY .devcontainer/scripts /scripts
I have also attempted manually setting the entrypoint or even executing the docker-init.sh script from within my pipeline but I didn't seem to notice any difference.
As for your last point, I'm not sure what you mean by "you need to reference the feature even if it has been pre-built". Do I need to manually specify additional LABELs in my Dockerfile?
Trying to understand your scenario a bit better - are you using Jenkins to launch the dev container instead of Remote - Containers (or Codespaces if you're using that)?
I'm not familiar enough with Jenkins to know exactly what it does when spinning up a container from an image, but I can make some educated guesses.
The image alone is not enough to make everything work as expected currently. We are working on the ability to use a CLI to launch with all the needed config (https://github.com/microsoft/dev-container-spec/issues/9). However, today you'd need to manually set the entrypoint if you are using this directly. e.g., from the command line:
docker run -v /var/run/docker.sock:/var/run/docker-host.sock:ro --entrypoint /usr/local/share/docker-init.sh
So, I suspect --entrypoint would need to be specified in your args for the Jenkins config. This could also be the command if you add something after it, e.g. /usr/local/share/docker-init.sh sleep infinity. However, I'm assuming Jenkins uses the command to set itself up.
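E.g., with the command form (the image name is a placeholder):

docker run -d -v /var/run/docker.sock:/var/run/docker-host.sock:ro \
    <image> /usr/local/share/docker-init.sh sleep infinity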
Yes, I am using Jenkins to run the pipeline stages inside of the container. Using the agent block, I can specify a Docker-based build agent using a given image, and all subsequent stages will run inside a container that Jenkins creates from that image.
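For reference, a minimal sketch of what that looks like in the Jenkinsfile (stage and image names are placeholders, not our real config):

pipeline {
    agent {
        docker {
            // placeholder; in our case this is the devcontainer-built image
            image 'image-built-by-devcontainer'
            args '-v /var/run/docker.sock:/var/run/docker-host.sock:ro'
        }
    }
    stages {
        stage('Build') {
            steps {
                sh 'docker ps'
            }
        }
    }
}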
I have extracted this from the Jenkins build logs; this is the command that gets generated to run the container. I've had to redact some things, but I think it should get the idea across.
docker run -t -d -u 1005:1006 -v /var/run/docker.sock:/var/run/docker-host.sock:ro --add-host=host.docker.internal:host-gateway -w /home/jenkins/workspace/my-project_master -v /home/jenkins/workspace/my-project_master:/home/jenkins/workspace/my-project_master:rw,z -v /home/jenkins/workspace/my-project_master_tmp:/home/jenkins/workspace/my-project_master_tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** image-built-by-devcontainer cat
Where all the -e ******** entries are Jenkins injecting credentials as environment variables.
I'll give adding in the entrypoint another go but I seem to remember that not working.
So, I just added back the argument to manually specify the docker-init.sh entrypoint. Still getting the same output as in my first post.
Updated Command
docker run -t -d -u 1005:1006 -v /var/run/docker.sock:/var/run/docker-host.sock:ro --add-host=host.docker.internal:host-gateway --entrypoint /usr/local/share/docker-init.sh -w /home/jenkins/workspace/my-project_master -v /home/jenkins/workspace/my-project_master:/home/jenkins/workspace/my-project_master:rw,z -v /home/jenkins/workspace/my-project_master_tmp:/home/jenkins/workspace/my-project_master_tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** image-built-by-devcontainer cat
+ id
uid=1005(vscode) gid=1006(vscode) groups=1006(vscode)
+ groups
vscode
+ ls -aln /run
total 4
drwxr-xr-x. 3 0 0 73 Mar 22 16:35 .
drwxr-xr-x. 22 0 0 4096 Mar 22 16:35 ..
srw-rw----. 1 0 985 0 Mar 22 16:28 docker-host.sock
lrwxrwxrwx. 1 1000 0 25 Mar 22 16:35 docker.sock -> /var/run/docker-host.sock
drwxrwxrwt. 2 0 0 6 Jan 25 00:00 lock
-rw-rw-r--. 1 0 43 0 Jan 25 00:00 utmp
If docker-init.sh is supposed to be creating a docker group with the matching GID and adding the user to that group, it doesn't look like it is actually working, because the group list still just shows vscode.
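One way to check (a quick sketch; 985 is the socket's GID from the listing above):

stat -c '%g' /var/run/docker-host.sock   # prints the socket's GID (985 above)
getent group 985                         # is there any group with that GID?
id vscode                                # vscode's groups; no docker-host entry here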
Can you start as root and chown 1005:1006 /var/run/docker-host.sock before changing to your regular user using su or runuser?
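For example (a sketch; the image name is a placeholder, and the socket is mounted read-write so the chown can succeed):

docker run --rm -u root \
    -v /var/run/docker.sock:/var/run/docker-host.sock \
    --entrypoint /bin/bash <image> -c '
        chown 1005:1006 /var/run/docker-host.sock   # match the remapped UID:GID
        ln -sf /var/run/docker-host.sock /var/run/docker.sock
        runuser -u vscode -- docker version         # then drop to the regular user
    '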
Okay, there's actually a few things potentially going on here.

However, if you start the container as vscode (-u vscode), you're already logged in as vscode, so the update works, but you're already logged in, so it doesn't take effect. We probably need to look at creating the group, adding the user, and modifying the GID instead.

@chrmarti Does the dev container CLI's build step do GID/UID syncing? I would expect that to be done at the time it's spun up... that's one of the reasons we want the exec step for the dev container CLI.
I would expect...
export DOCKER_BUILDKIT=1
export BUILDKIT_PROGRESS=plain
devcontainer build --image-name $DEVCONTAINER_IMAGE
... to result in an image with vscode's UID being 1000. When I just tried, that seemed to be what happened. When I then run the command above that specifies -u 1005:1006 interactively, I see this:
$ docker run -it --rm -u 1005:1006 test bash
whoami: cannot find name for user ID 1005
I have no name!@baad1d2ab9a4:/$ id
uid=1005 gid=1006 groups=1006
The 1005 user isn't in the sudoers file, so nothing would update anyway.
Correct, devcontainer build doesn't update the UID/GID. That is done as a separate build step before running the container (docker run won't do that, only devcontainer up/run would).
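With that flow, it would look something like the following (the flags are from the newer CLI and may differ by version):

devcontainer up --workspace-folder .             # starts the container and applies the UID/GID sync
devcontainer exec --workspace-folder . docker ps # then run commands inside it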
Ok - I think I have a fix for the other issue queued up.
In Remote - Containers, the container itself runs as root, but you connect as vscode. As a result, the vscode user can be modified in the entrypoint.
Okay, I think this is a key detail I was missing, but it makes sense now in retrospect. With this in mind, I made the following changes and things seem to be working.
docker run -u root:root -v /var/run/docker.sock:/var/run/docker-host.sock --entrypoint /usr/local/share/docker-init.sh ...
Build stages inside the container: there are two options here, neither of which is pretty. Either add runuser -u vscode -- before every command in the pipeline:
steps {
    sh '''
        runuser -u vscode -- id
        runuser -u vscode -- groups
        runuser -u vscode -- ls -aln /run
        runuser -u vscode -- docker ps
    '''
}
Or wrap the commands in a single su call:

steps {
    sh '''
        su vscode -c "id && groups && ls -aln /run && docker ps"
    '''
}
Correct, devcontainer build doesn't update the UID/GID. That is done as a separate build step before running the container (docker run won't do that, only devcontainer up/run would).
Are you sure about that? I see the following in my build output when the devcontainer CLI is building the image:
[2036 ms] Start: Run: docker build -f /tmp/vsch/updateUID.Dockerfile-0.216.0 -t my-proj-4c47088dda57b74ac116db45c885f6cf-features-uid --build-arg BASE_IMAGE=my-proj-4c47088dda57b74ac116db45c885f6cf-features --build-arg REMOTE_USER=vscode --build-arg NEW_UID=1005 --build-arg NEW_GID=1006 --build-arg IMAGE_USER=root /tmp/vsch
#1 [internal] load build definition from updateUID.Dockerfile-0.216.0
#1 sha256:123f672a1ef8a208db94fc48ff56735708e8a64b574c753451d56d16b1126b03
#1 DONE 0.0s
#1 [internal] load build definition from updateUID.Dockerfile-0.216.0
#1 sha256:123f672a1ef8a208db94fc48ff56735708e8a64b574c753451d56d16b1126b03
#1 transferring dockerfile: 1.39kB done
#1 DONE 0.1s
#2 [internal] load .dockerignore
#2 sha256:9c09325d10718ca7b3f3348f660eb645d6208dfcb287cea9ab623610c77c8bff
#2 transferring context: 2B done
#2 DONE 0.1s
#3 [internal] load metadata for docker.io/library/my-proj-4c47088dda57b74ac116db45c885f6cf-features:latest
#3 sha256:39e77b8c190df8d5b7b09007cfe461108db0375e24912ad29cc3f0a11801bce6
#3 DONE 0.0s
#4 [1/2] FROM docker.io/library/my-proj-4c47088dda57b74ac116db45c885f6cf-features
#4 sha256:62cfcb001e1a81a934783fec587968da7bf75d9cdfb33cbed1251106f7e75e97
#4 DONE 0.0s
#5 [2/2] RUN eval $(sed -n "s/vscode:[^:]*:\([^:]*\):\([^:]*\):[^:]*:\([^:]*\).*/OLD_UID=\1;OLD_GID=\2;HOME_FOLDER=\3/p" /etc/passwd); eval $(sed -n "s/\([^:]*\):[^:]*:1005:.*/EXISTING_USER=\1/p" /etc/passwd); eval $(sed -n "s/\([^:]*\):[^:]*:1006:.*/EXISTING_GROUP=\1/p" /etc/group); if [ -z "$OLD_UID" ]; then echo "Remote user not found in /etc/passwd (vscode)."; elif [ "$OLD_UID" = "1005" -a "$OLD_GID" = "1006" ]; then echo "UIDs and GIDs are the same (1005:1006)."; elif [ "$OLD_UID" != "1005" -a -n "$EXISTING_USER" ]; then echo "User with UID exists ($EXISTING_USER=1005)."; elif [ "$OLD_GID" != "1006" -a -n "$EXISTING_GROUP" ]; then echo "Group with GID exists ($EXISTING_GROUP=1006)."; else echo "Updating UID:GID from $OLD_UID:$OLD_GID to 1005:1006."; sed -i -e "s/\(vscode:[^:]*:\)[^:]*:[^:]*/\11005:1006/" /etc/passwd; if [ "$OLD_GID" != "1006" ]; then sed -i -e "s/\([^:]*:[^:]*:\)${OLD_GID}:/\11006:/" /etc/group; fi; chown -R 1005:1006 $HOME_FOLDER; fi;
#5 sha256:09fb5daa204c34e2b5e48ae49ec19fbff7dbc24acaa01e46c59204824eaba6e8
#5 CACHED
#6 exporting to image
#6 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#6 exporting layers done
#6 writing image sha256:29802ed37f3b5f5cd96a581fd8566d801208d18225f07ad6890c6534d5c6b3fe done
#6 naming to docker.io/library/my-proj-4c47088dda57b74ac116db45c885f6cf-features-uid done
#6 DONE 0.0s
[2679 ms] Start: Run: docker tag my-proj-4c47088dda57b74ac116db45c885f6cf-features-uid DEVCONTAINER_IMAGE
Unless I am misunderstanding what you mean here?
I fixed the upstream script to handle this situation.
On the UID - that's coming from the CLI? Odd - @chrmarti was there an earlier version of the CLI that did this?
Possibly, that would have been a bug. The latest doesn't run the UID update during build.
Getting back around to looking at this.
I've updated the devcontainer CLI and no longer see the UID/GID update during build, so it must have worked that way in a previous version.
However, if you start the container as vscode (-u vscode), you're already logged in as vscode, so the update works, but you're already logged in, so it doesn't take effect. We probably need to look at creating the group, adding the user, and modifying the GID instead.
@Chuxel I think I'm understanding now. But even after this fix, I'm not sure all the issues are resolved. If I understand correctly, you would then make sure to start the container with -u vscode, but the volume mount for the workspace is still owned by whatever user is running on the host, so the vscode user won't be able to read the workspace.
Also, it seems that even after this fix you still have to re-connect to the container as vscode, or sudo su vscode for every command, as the GID isn't updating. See below for output.
Docker Run
docker run --rm -it -u vscode -v /var/run/docker.sock:/var/run/docker-host.sock -v /home/jenkins/project:/workspaces/project -w /workspaces/project --entrypoint=/usr/local/share/docker-init.sh test bash
Inside Container
vscode ➜ /workspaces/project $ id
uid=1000(vscode) gid=1000(vscode) groups=1000(vscode),102,998(nvm),999(pipx)
vscode ➜ /workspaces/project $ sudo su vscode -c "id"
uid=1000(vscode) gid=1000(vscode) groups=1000(vscode),985(docker),998(nvm),999(pipx)
vscode ➜ /workspaces/project $ cat /etc/group | grep -i docker
docker:x:985:vscode
vscode ➜ /workspaces/project $ touch test.txt
touch: cannot touch 'test.txt': Permission denied
Alternatively, starting the container detached and reconnecting as vscode does resolve the docker issue, but it still doesn't help the fact that it can't read the workspace files.
Docker Run
docker run --rm -d -u vscode -v /var/run/docker.sock:/var/run/docker-host.sock -v /home/jenkins/project:/workspaces/project -w /workspaces/project --entrypoint=/usr/local/share/docker-init.sh test sleep infinity
Docker Exec
docker exec -it -u vscode devcontainer bash
Inside Container
vscode ➜ /workspaces/project $ id
uid=1000(vscode) gid=1000(vscode) groups=1000(vscode),985(docker),998(nvm),999(pipx)
vscode ➜ /workspaces/project $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d924a2e01bbc test "/usr/local/share/do…" 2 minutes ago Up 2 minutes devcontainer
vscode ➜ /workspaces/project $ ls
ls: cannot open directory '.': Permission denied
Correct. The fix would solve the problem of the docker group getting updated, but the files bind-mounted into the container are still owned by the host's user, while the container's non-root user is 1000:1000. The UID/GID shift happens at the time the container is started, not when the image is built, since each system can be different. The upcoming dev container CLI should do this automatically by allowing you to execute commands inside of it. But in this situation, you'd need to apply the usermod/groupmod manually via a Dockerfile.
That said, you can opt to run inside the container as root, which avoids that problem entirely, though I realize that has security implications for the host. Then again, if you're already running docker as root and there's sudo in this image, it wouldn't be a new risk.
Just to write down what that dockerfile would look like:
FROM <your image name here>
ARG UID=1000
ARG GID=1000
ARG USERNAME=vscode
USER root
RUN groupmod -g ${GID} ${USERNAME} \
    && usermod -u ${UID} -g ${GID} ${USERNAME} \
    && chown -R ${UID}:${GID} /home/${USERNAME}
USER ${USERNAME}
Building the image would then be:
docker build --build-arg UID=$(id -u) --build-arg GID=$(id -g) .
And this would be an additional image build, after the devcontainer build ... is finished?
I.e.
devcontainer build --image-name devcontainer-base
docker build -t devcontainer-final -f .devcontainer/updateUID.dockerfile --build-arg UID=$(id -u) --build-arg GID=$(id -g) --build-arg BASE_IMAGE=devcontainer-base .
Where updateUID.dockerfile is the one you showed above?
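Presumably with the Dockerfile parameterized to accept the base image, something like:

ARG BASE_IMAGE
FROM ${BASE_IMAGE}
ARG UID=1000
ARG GID=1000
ARG USERNAME=vscode
USER root
RUN groupmod -g ${GID} ${USERNAME} \
    && usermod -u ${UID} -g ${GID} ${USERNAME} \
    && chown -R ${UID}:${GID} /home/${USERNAME}
USER ${USERNAME}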
Definitely looking forward to the update to the devcontainer CLI, however. I see there currently exists a devcontainer run command, but it's marked as experimental and I couldn't quite get it to actually run anything.
@mdgilene Yep - that's correct. @chrmarti is working on the CLI as a part of https://github.com/microsoft/dev-container-spec/issues/9 this iteration.
Steps to Reproduce:
I am using the docker-from-docker container feature perfectly fine from my local windows based Docker environment. I am then attempting to utilize the dev-container-cli to build and run my builds within my CI/CD environment (in this case Jenkins).
1. Build the devcontainer with the dev-container-cli. This correctly builds the image and runs the updateUID.Dockerfile build step to adjust the vscode user's UID/GID to match that of the jenkins user launching the container.
2. Launch a Jenkins agent, making sure to mount the docker socket.
3. Attempt any docker command from inside the container to receive a "Permission denied" error.
4. Inspect the current UID/GID and the /run directory to see that /var/run/docker.sock is still owned by user 1000 instead of the updated UID (in this case 1005):

ls -aln /run
total 4
drwxr-xr-x.  3 0    0   73 Mar 16 18:05 .
drwxr-xr-x. 22 0    0 4096 Mar 16 18:13 ..
srw-rw----.  1 0  985    0 Mar 16 17:58 docker-host.sock
lrwxrwxrwx.  1 1000 0   25 Mar 16 18:05 docker.sock -> /var/run/docker-host.sock
drwxrwxrwt.  2 0    0    6 Jan 25 00:00 lock
-rw-rw-r--.  1 0   43    0 Jan 25 00:00 utmp