google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

e2e flake: "Container command not found or does not exist.." #1257

Open timstclair opened 8 years ago

timstclair commented 8 years ago
F0502 15:40:21.498049   21957 runner.go:290] Error 0: error on host e2e-cadvisor-coreos-beta-docker19: command "godep" ["go" "test" "--timeout" "15m0s" "github.com/google/cadvisor/integration/tests/..." "--host" "e2e-cadvisor-coreos-beta-docker19" "--port" "8080" "--ssh-options" "-i /var/lib/jenkins/gce_keys/google_compute_engine -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o CheckHostIP=no -o StrictHostKeyChecking=no"] failed with error: exit status 1 and output: godep: WARNING: Go version (go1.6) & $GO15VENDOREXPERIMENT= wants to enable the vendor experiment, but disabling because a Godep workspace (Godeps/_workspace) exists
--- FAIL: TestDockerContainerSpec (1.27s)
    framework.go:338: Failed to run "sudo" [docker run -d --cpu-shares 2048 --cpuset-cpus 0 --memory 1073741824 --env TEST_VAR=FOO --label bar=baz kubernetes/pause] in "e2e-cadvisor-coreos-beta-docker19" with error: "exit status 127". Stdout: "7fbe2ac936f4037efab0585328b7473a7e86d315f91ee83ded6d967bce3f1b22\n", Stderr: Warning: Permanently added 'e2e-cadvisor-coreos-beta-docker19' (ECDSA) to the list of known hosts.
        docker: Error response from daemon: Container command not found or does not exist..
FAIL
FAIL    github.com/google/cadvisor/integration/tests/api    38.419s
ok      github.com/google/cadvisor/integration/tests/healthz    0.021s
godep: go exit status 1

Sample builds:

timstclair commented 8 years ago

This is happening very frequently. If I ssh into the host and run that command myself, I can reproduce it fairly reliably (roughly 1 of every 2 or 3 tries fails):

stclair@e2e-cadvisor-coreos-beta ~ $ sudo docker run -d --cpu-shares 2048 --cpuset-cpus 0 --memory 1073741824 --env TEST_VAR=FOO --label bar=baz kubernetes/pause
baef8231b979c04e03bb44d21bb85a5a85369771de97eb57034239f88f3a5524
docker: Error response from daemon: Container command not found or does not exist..

I observed that removing the --memory 1073741824 flag seems to prevent the problem (I couldn't reproduce without that flag).
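To put a number on "1 of every 2 or 3 tries", the command can be run in a loop and the non-zero exits counted. This is only a sketch, assuming a POSIX shell on the host; `count_failures` is a hypothetical helper name, not part of cAdvisor or Docker.

```shell
# count_failures TRIES CMD [ARGS...]
# Runs CMD TRIES times, discards its output, and prints how many runs
# exited with a non-zero status. Hypothetical helper for illustration.
count_failures() {
  tries=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}

# Example invocation on the test host (not executed here):
#   count_failures 20 sudo docker run -d --cpu-shares 2048 --cpuset-cpus 0 \
#     --memory 1073741824 --env TEST_VAR=FOO --label bar=baz kubernetes/pause
```

Running it once with and once without `--memory 1073741824` gives a direct comparison. Successful runs leave detached containers behind, so they need to be removed afterwards.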

Setting the memory limit to the minimum (4194304 bytes, i.e. 4 MiB) reliably gives a different error:

stclair@e2e-cadvisor-coreos-beta ~ $ sudo docker run -d --cpu-shares 2048 --cpuset-cpus 0 --memory 4194304 --env TEST_VAR=FOO --label bar=baz kubernetes/pause
6d65e0c7d078990e78440b93944a7c668ac2f2fca649e043624ae19da50fd61a
docker: Error response from daemon: Cannot start container 6d65e0c7d078990e78440b93944a7c668ac2f2fca649e043624ae19da50fd61a: [9] System error: write parent: broken pipe.

Adding 1 byte to that limit (4194305) reproduces the "Container command not found" error:

stclair@e2e-cadvisor-coreos-beta ~ $ sudo docker run -d --cpu-shares 2048 --cpuset-cpus 0 --memory 4194305 --env TEST_VAR=FOO --label bar=baz kubernetes/pause
ae7897e3f5ef100582abf512f08c2b7992ab66f7da5c4c399b2c89899f34055e
docker: Error response from daemon: Container command not found or does not exist..
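To see how the outcome changes between these limits and the original 1073741824, the same command can be swept over several `--memory` values. A rough sketch, assuming a POSIX shell; `sweep_memory` and `run_pause` are hypothetical names, and the limits in the example are just the ones tried above.

```shell
# run_pause MEMORY_BYTES
# The command from the failing test, with only the memory limit varied.
run_pause() {
  sudo docker run -d --cpu-shares 2048 --cpuset-cpus 0 --memory "$1" \
    --env TEST_VAR=FOO --label bar=baz kubernetes/pause
}

# sweep_memory RUNNER LIMIT [LIMIT...]
# Calls RUNNER once per limit and prints "<limit> <exit status>" per line.
sweep_memory() {
  runner=$1
  shift
  for mem in "$@"; do
    status=0
    "$runner" "$mem" >/dev/null 2>&1 || status=$?
    echo "$mem $status"
  done
}

# Example invocation on the test host (not executed here):
#   sweep_memory run_pause 4194304 4194305 1073741824
```

Since the failure at 1073741824 is intermittent, a single pass per limit is not conclusive; repeating the sweep several times gives a better picture.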

This looks like a bug in the Docker version running on this host:

stclair@e2e-cadvisor-coreos-beta ~ $ sudo docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   8acee1b
 Built:        
 OS/Arch:      linux/amd64
Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   8acee1b
 Built:        
 OS/Arch:      linux/amd64

@Random-Liu did you see anything like this when you were testing Docker v1.10?

/cc @yifan-gu @sjpotter

timstclair commented 8 years ago

More details from the docker logs:

$ sudo journalctl -u docker --no-pager
...
May 23 18:56:15 e2e-cadvisor-coreos-beta.c.kubernetes-jenkins.internal dockerd[1335]: time="2016-05-23T18:56:15.323147623Z" level=warning msg="signal: killed"
May 23 18:56:15 e2e-cadvisor-coreos-beta.c.kubernetes-jenkins.internal dockerd[1335]: time="2016-05-23T18:56:15.459091218Z" level=error msg="error locating sandbox id f8ce9a4850614d5bdec61b7d9ba824d52a90df94ea3178d4e7ed72f51f38c23f: sandbox f8ce9a4850614d5bdec61b7d9ba824d52a90df94ea3178d4e7ed72f51f38c23f not found"
May 23 18:56:15 e2e-cadvisor-coreos-beta.c.kubernetes-jenkins.internal dockerd[1335]: time="2016-05-23T18:56:15.459677653Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/9c2855c6d168e5662682d9d2eb858944d1d366010d1ac9667c86ceb8a7209b7b/shm: invalid argument"
May 23 18:56:15 e2e-cadvisor-coreos-beta.c.kubernetes-jenkins.internal dockerd[1335]: time="2016-05-23T18:56:15.459971658Z" level=error msg="Error unmounting container 9c2855c6d168e5662682d9d2eb858944d1d366010d1ac9667c86ceb8a7209b7b: not mounted"
May 23 18:56:15 e2e-cadvisor-coreos-beta.c.kubernetes-jenkins.internal dockerd[1335]: time="2016-05-23T18:56:15.460397447Z" level=error msg="Handler for POST /v1.22/containers/9c2855c6d168e5662682d9d2eb858944d1d366010d1ac9667c86ceb8a7209b7b/start returned error: Container command not found or does not exist."
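The journal output is noisy, so it can help to reduce it to just the error messages. A small sketch, assuming the `level=... msg="..."` line format shown above and a POSIX `sed`; `docker_errors` is a hypothetical helper name.

```shell
# docker_errors
# Reads dockerd log lines on stdin and prints only the msg="..." text of
# lines logged at level=error. Assumes msg is the last field on the line.
docker_errors() {
  sed -n '/level=error/s/.*msg="\(.*\)"$/\1/p'
}

# Example invocation on the test host (not executed here):
#   sudo journalctl -u docker --no-pager | docker_errors
```

This drops the `level=warning` lines, including the "signal: killed" warning that precedes the errors, so the unfiltered log is still worth reading alongside it.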