moby / moby


[go1.6][1.11 regression] user namespaces + btrfs can't run containers #21087

Closed cyphar closed 8 years ago

cyphar commented 8 years ago

This works on both 1.10.2 and 1.10.3-rc2.

% docker version
Client:
 Version:      1.11.0-dev
 API version:  1.23
 Go version:   go1.6
 Git commit:   f70f570
 Built:        Thu Mar 10 12:13:38 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0-dev
 API version:  1.23
 Go version:   go1.6
 Git commit:   f70f570
 Built:        Thu Mar 10 12:13:38 2016
 OS/Arch:      linux/amd64
% docker info
Containers: 4
 Running: 0
 Paused: 0
 Stopped: 4
Images: 7
Server Version: 1.11.0-dev
Storage Driver: btrfs
 Build Version: Btrfs v3.17
 Library Version: 101
Execution Driver: native-0.2
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 4.4.3-1-default
Operating System: openSUSE Tumbleweed (20160307) (x86_64)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.688 GiB
Name: gondor
ID: DDDM:OY6U:WD3X:DVTL:6OXN:JL4L:2YDN:2BAU:6LAZ:WCZA:Z2YJ:4KH5
Docker Root Dir: /var/lib/docker/123456.123456
Debug mode (client): false
Debug mode (server): false
Username: cyphar
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support

List the steps to reproduce the issue:

% docker daemon -s btrfs --userns-remap default
[in another shell]
% docker pull alpine:latest
% docker run -it alpine sh
docker: Error response from daemon: Container command could not be invoked..

The error logs from the Docker daemon are:

ERRO[0109] error locating sandbox id 4ccbd5aebe711239cdbbd7ed9fe4305420fc99238a75ebea32c562c802f6ed34: sandbox 4ccbd5aebe711239cdbbd7ed9fe4305420fc99238a75ebea32c562c802f6ed34 not found 
WARN[0109] failed to cleanup ipc mounts:
failed to umount /var/lib/docker/123456.123456/containers/d2b965c3dba48236adf830a90b956664f17196c3f9464507b172bb9bf600da0a/shm: invalid argument 
ERRO[0109] Error unmounting container d2b965c3dba48236adf830a90b956664f17196c3f9464507b172bb9bf600da0a: not mounted 
ERRO[0109] Handler for POST /v1.23/containers/d2b965c3dba48236adf830a90b956664f17196c3f9464507b172bb9bf600da0a/start returned error: Container command could not be invoked. 

/cc @estesp

HackToday commented 8 years ago

I tried using a loop device to simulate btrfs (creating btrfs pools, etc.) and could not reproduce this issue.

ubuntu@dockerwork:~$ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.10.3
Storage Driver: btrfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 3.13.0-43-generic
Operating System: Ubuntu 14.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.798 GiB
Name: dockerwork
ID: TKJQ:7UVP:PMPC:QIYJ:BYIO:NFMN:D4MI:CVUN:4PNS:IHAT:5FLP:6I73
WARNING: No swap limit support
ubuntu@dockerwork:~$ docker pull alpine:latest
latest: Pulling from library/alpine
4d06f2521e4f: Pull complete 
Digest: sha256:7739b19a213f3a0aa8dacbd5898c8bd467e6eaf71074296a3d75824e76257396
Status: Downloaded newer image for alpine:latest
ubuntu@dockerwork:~$ docker run -it alpine sh
/ # 

Maybe it only happens on a real device?

cyphar commented 8 years ago

Yes, this is with btrfs on my root filesystem, so it's using actual subvolumes.
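
When comparing setups like the two above, it can help to confirm which filesystem backs a given path. The following is a minimal sketch that assumes GNU coreutils `stat`; `fs_type` is a hypothetical helper name, not a Docker command. It reports only the filesystem type, so it cannot tell a loop-backed btrfs image from btrfs on a real device.

```shell
# Hypothetical helper: print the filesystem type backing a path.
# Assumes GNU coreutils stat (-f queries the filesystem, %T is its type name).
fs_type() {
    stat -f -c '%T' "$1"
}

# Example: inspect the root filesystem. On a btrfs root this prints "btrfs".
fs_type /
```

Pointing it at the Docker root directory reported by `docker info` instead of `/` shows what the daemon's storage actually sits on.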

HackToday commented 8 years ago

I don't understand why it only happens on actual subvolumes. It would be interesting to know. I don't have a free machine or disk to test that, so I'll wait for others to respond :)

thaJeztah commented 8 years ago

ping @vbatts, who runs on btrfs :heart:

estesp commented 8 years ago

Something is very strange. I can no longer get userns to run on my master fork (even after making sure I'm up to date with master) with any graphdriver. After a lot of head-scratching, I decided that the "chdir()" failure with access permissions must be related to the /var/lib/docker directory permissions, and sure enough, if I change them on Fedora or Ubuntu to "0711" (you must do this after starting Docker, as it checks and sets the permissions on every start), then everything starts working again.

Why master runs are not hitting this, I have no idea :( Edit: I also haven't had time to bisect why this change has been needed locally since last week (my last master build that I can test against).
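
The two modes discussed here differ only in the group search (execute) bit. On Linux, an access check uses exactly one permission class: owner if the process owns the file, otherwise group if it belongs to the file's group, otherwise other. A process that falls into the group class is therefore denied traversal under 0701 even though "other" is allowed. Whether that is what happens in this bug is an assumption; the thread does not confirm the cause. The sketch below only decodes the bits, and `search_classes` is a hypothetical helper name.

```shell
# Hypothetical helper: list which permission classes may traverse a
# directory with the given octal mode. Pure arithmetic; touches no files.
# The argument must have a leading 0 so the shell parses it as octal.
search_classes() {
    mode=$(($1))
    out=""
    if [ $((mode & 0100)) -ne 0 ]; then out="$out owner"; fi
    if [ $((mode & 0010)) -ne 0 ]; then out="$out group"; fi
    if [ $((mode & 0001)) -ne 0 ]; then out="$out other"; fi
    echo "${out# }"   # strip the leading space
}

search_classes 0701   # prints: owner other
search_classes 0711   # prints: owner group other
```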

thaJeztah commented 8 years ago

Adding this to the milestone, so that we don't lose track

vbatts commented 8 years ago

Hmmm. This seems to work OK with ed6fb41 when compiled with go1.5.3

vbatts@valse ~ (master) $ docker version
Client:
 Version:      1.11.0-dev
 API version:  1.23
 Go version:   go1.5.3
 Git commit:   ed6fb41
 Built:        Sun Mar 13 11:11:07 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0-dev
 API version:  1.23
 Go version:   go1.5.3
 Git commit:   ed6fb41
 Built:        Sun Mar 13 11:11:07 2016
 OS/Arch:      linux/amd64
vbatts@valse ~ (master) $ docker pull alpine:latest
latest: Pulling from library/alpine
4d06f2521e4f: Pull complete 
Digest: sha256:7739b19a213f3a0aa8dacbd5898c8bd467e6eaf71074296a3d75824e76257396
Status: Downloaded newer image for alpine:latest
vbatts@valse ~ (master) $ docker run -it alpine:latest sh
/ # ^D
vbatts@valse ~ (master) $ docker info
Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 1.11.0-dev
Storage Driver: btrfs
 Build Version: Btrfs v4.3.1
 Library Version: 101
Execution Driver: native-0.2
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 4.4.3-300.fc23.x86_64
Operating System: Fedora 23 (Workstation Edition)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 11.43 GiB
Name: valse.usersys.redhat.com
ID: PK4Z:YOAF:KCUV:IHEF:HXW2:QKPI:36RK:JIHI:ERMB:KPG2:LKCN:YIAL
Docker Root Dir: /home/docker/100000.100000
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 14
 Goroutines: 27
 System Time: 2016-03-13T11:17:37.568655036-04:00
 EventsListeners: 0
Username: vbatts
Registry: https://index.docker.io/v1/

But not when compiled with go1.6

vbatts@valse ~ (master) $ docker version
Client:
 Version:      1.11.0-dev
 API version:  1.23
 Go version:   go1.6
 Git commit:   ed6fb41
 Built:        Sun Mar 13 11:18:18 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0-dev
 API version:  1.23
 Go version:   go1.6
 Git commit:   ed6fb41
 Built:        Sun Mar 13 11:18:18 2016
 OS/Arch:      linux/amd64
vbatts@valse ~ (master) $ docker pull alpine:latest
latest: Pulling from library/alpine
4d06f2521e4f: Pull complete 
Digest: sha256:7739b19a213f3a0aa8dacbd5898c8bd467e6eaf71074296a3d75824e76257396
Status: Downloaded newer image for alpine:latest
vbatts@valse ~ (master) $ docker run -it alpine:latest sh
/home/vbatts/src/github.com/docker/docker/bundles/1.11.0-dev/binary/docker-1.11.0-dev: Error response from daemon: Container command could not be invoked..

cyphar commented 8 years ago

@vbatts It might be some stdlib change that's causing this issue then. Or we were depending on undefined behaviour somewhere and Go 1.6 exposes it now.

vbatts commented 8 years ago

Right. To be clear, it is a docker-1.11 and go1.6 issue, because docker-1.10.3 with go1.6 works fine.

estesp commented 8 years ago

Can someone confirm that setting the permissions on /var/lib/docker to 0711 allows containers to start? You will need to run that chmod after Docker has started, or else it will revert the permissions to 0701.
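
For anyone trying this, the workaround could be sketched as below. This is an illustration, not a fix: `apply_workaround` is a hypothetical helper name, `stat -c` assumes GNU coreutils, and on a real system it would need to run as root after the daemon has started. The demo uses a throwaway directory, so it is safe to run anywhere.

```shell
# Hypothetical helper: set mode 0711 on a directory and print the
# resulting octal mode so the change can be verified.
apply_workaround() {
    dir=${1:-/var/lib/docker}
    chmod 0711 "$dir" || return 1
    stat -c '%a' "$dir"   # GNU stat: %a is the access mode in octal
}

# Demo on a temporary directory that starts out at 0701.
demo=$(mktemp -d)
chmod 0701 "$demo"
apply_workaround "$demo"   # prints: 711
rmdir "$demo"
```

On an affected host the call would be `apply_workaround /var/lib/docker`, repeated after every daemon restart because the daemon resets the mode on startup.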

cyphar commented 8 years ago

@estesp Yes, that solves this problem for me. Is that the only change needed? I've opened #21242 which changes the daemon to use 0711. I'm a little confused why this wasn't picked up by the tests.