jfrog / artifactory-docker-examples

Examples for using Artifactory Docker distribution in various environments
https://www.jfrog.com/artifactory/
Apache License 2.0

Artifactory 6.9.1 / 6.10.0 : "failed to register layer" when pulling the new image with user-namespace enabled in Docker conf #153

Closed PtyMatt closed 5 years ago

PtyMatt commented 5 years ago

When user namespaces are enabled in the Docker daemon (version 18.09.4), I get the following error when I pull the new 6.9.1 image:

$ docker pull docker.bintray.io/jfrog/artifactory-oss:6.9.1
6.9.1: Pulling from jfrog/artifactory-oss
d8bafd553c84: Extracting [==================================================>]    655kB/655kB
dfbe64cc5477: Download complete 
75a4eadd0bf1: Download complete 
3e010093287c: Download complete 
1983e3e1971b: Download complete 
66a9c68e4f27: Download complete 
20b8aad0a114: Download complete 
ff4bb0ff0af1: Download complete 
d7d86dc340f9: Download complete 
d793839658a7: Download complete 
870dafe6f5bc: Download complete 
67dd8ec43850: Download complete 
1ed865ebaa4b: Download complete 
174590277c7f: Download complete 
failed to register layer: ApplyLayer exit status 1 stdout:  stderr: Container ID 321851968 cannot be mapped to a host ID

This post fully describes the (probable) problem:

Background

The user namespace (userns) is a feature of the Linux kernel that adds another security layer to Linux containers. The userns allows a host machine to run containers outside its UID/GID namespace. This means all containers can have a root account (UID 0) in their own namespace and run processes without receiving root privileges from the host machine.

When a userns is created, the Linux kernel provides a mapping between the container and the host machine. For example, if you start a container and run a process with UID 0 inside of it, the Linux kernel maps the container’s UID 0 to a non-privileged UID on the host machine. This allows the container to run a process as if it were the root user, while actually being run by the non-root user on the host machine.

Problem

The error is caused by a userns remapping failure. CircleCI runs Docker containers with userns enabled in order to securely run customers’ containers. The host machine is configured with a valid UID/GID for remapping. This UID/GID must be in the range of 0 - 65535.

When Docker starts a container, Docker pulls an image and extracts layers from that image. If a layer contains files with UID/GID outside of the accepted range, Docker cannot successfully remap and fails to start the container.
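As a rough illustration of the remapping described above, the sketch below models a single subordinate ID range as a start and a count, and maps a container ID to a host ID. The function name `map_container_id` and the example range (65536 IDs starting at host ID 231072) are illustrative assumptions; this is a simplified model, not Docker's actual implementation.

```python
from typing import Optional


def map_container_id(container_id: int, host_start: int, count: int) -> Optional[int]:
    """Map a container UID/GID to a host ID, or return None if it is unmappable.

    Simplified model: container IDs 0..count-1 map onto host IDs
    host_start..host_start+count-1 (one contiguous range).
    """
    if 0 <= container_id < count:
        return host_start + container_id
    return None


# Hypothetical range: 65536 IDs starting at host ID 231072.
print(map_container_id(0, 231072, 65536))          # container root -> 231072
print(map_container_id(65535, 231072, 65536))      # last mappable ID -> 296607
print(map_container_id(321851968, 231072, 65536))  # ID from the error -> None
```

Under this model, any file owned by an ID at or above the range length has no host ID to land on, which matches the "cannot be mapped to a host ID" message.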

Solution

To fix this error, you must update the files’ UID/GID and re-create the image.

If you are not the image maintainer, congratulations: it’s not your responsibility. Contact the image maintainer and report the error.

If you are the image maintainer, identify the file with the high UID/GID and correct it.
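One possible way to identify such files is to scan a layer tarball for entries whose UID or GID is above the accepted range. The sketch below uses Python's standard `tarfile` module; the helper names and the in-memory demo layer are hypothetical, and the 65535 limit is the one quoted above.

```python
import io
import tarfile
from typing import List, Tuple


def find_unmappable_entries(tar: tarfile.TarFile, max_id: int = 65535) -> List[Tuple[str, int, int]]:
    """Return (name, uid, gid) for every entry whose UID or GID exceeds max_id."""
    return [
        (member.name, member.uid, member.gid)
        for member in tar.getmembers()
        if member.uid > max_id or member.gid > max_id
    ]


def make_demo_layer() -> bytes:
    """Build a tiny in-memory tar standing in for an image layer (made-up contents)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name, uid in (("etc/passwd", 0), ("java/bin/java", 321851968)):
            info = tarfile.TarInfo(name)  # empty regular file
            info.uid = uid
            info.gid = 0
            tar.addfile(info)
    return buffer.getvalue()


with tarfile.open(fileobj=io.BytesIO(make_demo_layer())) as layer:
    for name, uid, gid in find_unmappable_entries(layer):
        print(f"{name}: uid={uid} gid={gid}")
```

To apply this to a real image you would first need its layer tarballs, for example from the archive that `docker save` writes; the layout of that archive varies between Docker versions, so locating the layers is left out here.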

From this other discussion:

The funny thing is that the final docker image did not contain any file with this ID because I eventually removed the offending files from the image.

So, apparently, if there were any files created with a big user ID in the history of creating the docker image, you are screwed.

No problem with the previous versions.

Thanks for looking into this bug :)

eldada commented 5 years ago

@PtyMatt - I was not able to reproduce this on my Mac or on an Ubuntu VM I have. Could it be something local to your environment? Can you describe it?

PtyMatt commented 5 years ago

Strange, I've tested on Debian 9 and Ubuntu 18.04, same problem.

The filesystem is ext4.

$ sudo cat /etc/docker/daemon.json
{
    "userns-remap": "default"
}

$ grep dockremap /etc/subuid /etc/subgid
/etc/subuid:dockremap:231072:65536
/etc/subgid:dockremap:231072:65536

About the subuid and subgid values: two users had been created before I installed and configured Docker on my Debian machine (so 100000 + 2×65536 = 231072). On my Ubuntu machine there was only one (so the subuid start is 100000 + 1×65536 = 165536). In short, the exact value doesn't matter (it's just > 100000).
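The arithmetic above can be checked with a short sketch. The constants 100000 and 65536 are taken from the comment itself, and the function names are hypothetical. `parse_subid_line` expects a line as it appears in the file (`name:start:count`), without the file name prefix that `grep` adds when given several files.

```python
from typing import Tuple

FIRST_SUBID = 100000  # first start value, as stated in the comment above
RANGE_SIZE = 65536    # number of IDs allocated per user


def expected_subid_start(users_created_before: int) -> int:
    """Start of the range given how many users already received a range."""
    return FIRST_SUBID + users_created_before * RANGE_SIZE


def parse_subid_line(line: str) -> Tuple[str, int, int]:
    """Split a 'name:start:count' line into (name, start, count)."""
    name, start, count = line.strip().split(":")
    return name, int(start), int(count)


name, start, count = parse_subid_line("dockremap:231072:65536")
print(name, start, count)                # dockremap 231072 65536
print(expected_subid_start(2) == start)  # True (the Debian case)
print(expected_subid_start(1))           # 165536 (the Ubuntu case)
```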

$ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 20
Server Version: 18.09.4
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
 userns
Kernel Version: 4.9.0-8-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.43GiB
Name: PtyMatt
ID: EACS:EPQS:Q7H3:MSUY:WYPO:WTRQ:B2U3:PGBD:LAMO:BQNC:DTZV:XQ6I
Docker Root Dir: /var/lib/docker/231072.231072
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

I looked for the Artifactory OSS Dockerfile but could not find it.

hplatou commented 5 years ago

Same problem here running on Azure web app.

PtyMatt commented 5 years ago

@eldada I still have the problem even with the new 6.10.0 version :/

Has any new library been added since v6.9.1 that may cause the remapping failure?

Also, are you sure you had userns-remap set in your Docker configuration when you tested?

PtyMatt commented 5 years ago

Oh, this may explain the problem:

The new Artifactory Docker image (6.9.1) has a new base image, which is not based on Debian (that was coming with the openjdk image). The new base image is made with the google distroless framework. You can find our sources in our jfrog-distroless repository.

Originally posted by @eldada in https://github.com/jfrog/artifactory-docker-examples/issues/154#issuecomment-484495081

eldada commented 5 years ago

@PtyMatt - I did not have userns-remap set. Indeed, the new image is totally different and might be the cause of the issue. Look at the issue https://github.com/GoogleContainerTools/distroless/issues/313 and see if the proposed solution fixes it.

PtyMatt commented 5 years ago

@eldada the issue you mentioned was a good start :)

If you have access to the /etc/subuid and /etc/subgid files on your host machine, you "just" have to increase the range of IDs Linux allows for the remapped Docker user from 65536 to 321851969 (the 321851968 from the error message + 1; you may need to adapt this), and then it works for me. But this is an abnormally huge number, and it may cause trouble if users were created after the remapped Docker user (overlapping IDs).
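The trade-off in this workaround can be sketched as follows: the range length must be the highest container ID plus one, and a range that large may collide with ranges allocated to other users. The function names and the second user's range (starting at 296608, right after the original 65536 IDs) are hypothetical.

```python
def required_count(highest_container_id: int) -> int:
    """Smallest range length that makes highest_container_id mappable."""
    return highest_container_id + 1


def ranges_overlap(start_a: int, count_a: int, start_b: int, count_b: int) -> bool:
    """True if [start_a, start_a+count_a) and [start_b, start_b+count_b) intersect."""
    return start_a < start_b + count_b and start_b < start_a + count_a


DOCKREMAP_START = 231072
OTHER_USER = (296608, 65536)  # hypothetical user created after dockremap

print(required_count(321851968))                              # 321851969
print(ranges_overlap(DOCKREMAP_START, 65536, *OTHER_USER))    # False
print(ranges_overlap(DOCKREMAP_START, required_count(321851968), *OTHER_USER))  # True
```

With the original length of 65536 the two ranges sit next to each other without overlapping; with the enlarged length the remapped range covers the other user's IDs as well.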

This is a fix. But after investigating the root cause by starting a shell inside the running container, I saw this:

artifactory@84ac787f1b78:/$ ls -l
total 92
drwxr-xr-x    1 root     root          4096 Jan  1  1970 bin
drwxr-xr-x    2 root     root          4096 Jan  1  1970 boot
drwxr-xr-x    2 32185196 root         12288 Jan  1  1970 busybox
drwxr-xr-x    5 root     root           360 May  7 15:29 dev
drwxr-xr-x    3 root     root          4096 May  6 13:20 docker
-rwxrwxr-x    1 root     root         18869 May  6 13:20 entrypoint-artifactory.sh
drwxr-xr-x    1 root     root          4096 May  7 15:29 etc
drwx------    1 artifact artifact      4096 Jan  1  1970 home
drwxr-xr-x    3 32185196 root          4096 Jan  1  1970 java
drwxr-xr-x    1 root     root          4096 Jan  1  1970 lib
drwxr-xr-x    2 root     root          4096 Jan  1  1970 lib64
drwxr-xr-x    1 root     root          4096 May  6 13:20 opt
dr-xr-xr-x  309 65534    65534            0 May  7 15:29 proc
drwx------    2 root     root          4096 Jan  1  1970 root
drwxr-xr-x    2 root     root          4096 Jan  1  1970 run
drwxr-xr-x    1 root     root          4096 Jan  1  1970 sbin
dr-xr-xr-x   13 65534    65534            0 May  7 15:29 sys
drwxrwxrwt    1 root     root          4096 May  7 15:29 tmp
drwxr-xr-x    1 root     root          4096 Jan  1  1970 usr
drwxr-xr-x    1 root     root          4096 May  6 13:20 var

32185196 is not a common ID! (It is probably the 321851968 from the error message, truncated by the ls column width.) Two folders and their contents (/java and /busybox) have this ID as owner, but root as group. Adding either a RUN chown -R root /java /busybox or the tar --no-same-owner option where relevant (cf. the link in my first post) should solve the problem on all machines/providers.
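To picture the ownership fix at the archive level, the sketch below rewrites a tarball so that every entry is owned by root (0:0). It is only an illustration of the idea using Python's standard `tarfile` module: it is not the `chown` or `tar --no-same-owner` approach itself, the thread does not show how the image is actually built, and the helper names and demo archive are hypothetical.

```python
import io
import tarfile


def reset_owner_to_root(source: bytes) -> bytes:
    """Return a copy of the tar archive in `source` with every entry owned by 0:0."""
    output = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(source)) as src, \
            tarfile.open(fileobj=output, mode="w", format=tarfile.GNU_FORMAT) as dst:
        for member in src.getmembers():
            member.uid = 0
            member.gid = 0
            member.uname = "root"
            member.gname = "root"
            # Regular files carry data; other entry types are header-only.
            data = src.extractfile(member) if member.isreg() else None
            dst.addfile(member, data)
    return output.getvalue()


def make_demo_archive() -> bytes:
    """Build a one-file archive owned by the high UID from the error message."""
    buffer = io.BytesIO()
    payload = b"#!/bin/sh\n"
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo("busybox/sh")
        info.size = len(payload)
        info.uid = 321851968
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


fixed = reset_owner_to_root(make_demo_archive())
with tarfile.open(fileobj=io.BytesIO(fixed)) as tar:
    for member in tar.getmembers():
        print(member.name, member.uid, member.gid)  # busybox/sh 0 0
```

The ownership has to be corrected in the layer that introduces the files: as quoted earlier in this thread, a high ID anywhere in the image history is enough to break the pull.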

eldada commented 5 years ago

Thanks @PtyMatt . We'll look into this.

elig commented 5 years ago

It looks like the problem was caused by the environment Bazel was running in. The build environment has been fixed, and tests were added to verify that the experimental added tars have root as user and group owner: https://github.com/jfrog/jfrog-distroless/commit/ca6d4685593f31f0b02b4aeda2e1220a60b92ee1

You can verify by trying to pull docker.bintray.io/jfrog/distroless/base/artifactory-java:adoptopenjdk11-16 and we will update our base image in one of the upcoming Artifactory releases.

csullivannet commented 5 years ago

Since this ticket is closed, is there somewhere else we can track this issue being resolved?

eldada commented 5 years ago

A version with the fix is planned to be released in the near future.

PtyMatt commented 5 years ago

Works for me since 0cf829e0f05cdc9a92e62ba413318eb2ecee446d (version 6.10.4).