Closed GabyCT closed 4 years ago
@GabyCT - if you can re-create this (which it sounds like you can), could you enable kata debug in the config, then check the logs for the relevant container ID and see whether any reason pops out for why one seems to be 'stuck'? @sboeuf - any thoughts? Seen this before? (It does not ring a bell with me.) Has anybody got any smart ideas about what might cause this? It seems odd that it is Debian specific, but so it seems...
@grahamwhaley, this is the error for the specific container ID:
time="2019-01-15T16:36:06.435139711Z" level=debug msg="Could not retrieve anything from storage" arch=amd64 command=create container=c9e24c2ab817fefc4a433576fbb9ddf80a734837bd90a25a611fb5eff724c2e2 name=kata-runtime pid=103396 source=virtcontainers subsystem=kata_agent
time="2019-01-15T16:36:06.439823713Z" level=warning msg="fetch sandbox device failed" arch=amd64 command=create container=c9e24c2ab817fefc4a433576fbb9ddf80a734837bd90a25a611fb5eff724c2e2 error="open /run/vc/sbs/c9e24c2ab817fefc4a433576fbb9ddf80a734837bd90a25a611fb5eff724c2e2/devices.json: no such file or directory" name=kata-runtime pid=103396 sandbox=c9e24c2ab817fefc4a433576fbb9ddf80a734837bd90a25a611fb5eff724c2e2 sandboxid=c9e24c2ab817fefc4a433576fbb9ddf80a734837bd90a25a611fb5eff724c2e2 source=virtcontainers subsystem=sandbox
time="2019-01-15T16:36:06.473246026Z" level=debug arch=amd64 command=create container=c9e24c2ab817fefc4a433576fbb9ddf80a734837bd90a25a611fb5eff724c2e2 default-kernel-parameters="tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 debug systemd.show_status=true systemd.log_level=debug" name=kata-runtime pid=103396 source=virtcontainers subsystem=qemu
time="2019-01-15T16:36:33.920014773Z" level=error msg="Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing" arch=amd64 command=create container=c9e24c2ab817fefc4a433576fbb9ddf80a734837bd90a25a611fb5eff724c2e2 name=kata-runtime pid=103396 source=runtime
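To make this kind of log check repeatable, here is a minimal sketch that filters runtime log lines for one container ID and keeps only warnings and errors. It assumes the lines are in the `key=value` / `key="quoted value"` form shown above; the function names `parse_logfmt` and `problems_for` are made up for illustration and are not part of any Kata tool.

```python
import shlex


def parse_logfmt(line):
    """Split one key=value log line into a dict (values may be quoted)."""
    fields = {}
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as unparseable and skip it.
        return fields
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def problems_for(lines, container_id, levels=("warning", "error")):
    """Return (time, level, text) for matching lines of one container."""
    hits = []
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("container") != container_id:
            continue
        if fields.get("level") not in levels:
            continue
        # Some entries carry the detail in "error" rather than "msg".
        text = fields.get("msg") or fields.get("error", "")
        hits.append((fields.get("time"), fields["level"], text))
    return hits
```

Feeding it the saved runtime log and the full container ID from `docker ps -a` should surface lines such as the "transport is closing" error without the surrounding debug noise.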
Ooh. Here is a question (that might be answered in the above system capture, btw): which back-end graph storage driver is Debian using? Maybe that is making the difference. It could be the version of the driver as well (it could be old).
@grahamwhaley, it is using overlay2, and here is the info:
$ modinfo overlay
filename: /lib/modules/4.9.0-8-amd64/kernel/fs/overlayfs/overlay.ko
alias: fs-overlay
license: GPL
description: Overlay filesystem
author: Miklos Szeredi <miklos@szeredi.hu>
depends:
retpoline: Y
intree: Y
vermagic: 4.9.0-8-amd64 SMP mod_unload modversions
parm: check_copy_up:bool
parm: ovl_check_copy_up:Warn on copy-up when causing process also has a R/O fd open
Is this still an issue @GabyCT?
@jodh-intel yes it is.
@jodh-intel, actually it is also present in other tests, like oci_call_test.sh. While running it, we get the following error:
$ docker run --rm --runtime=kata-runtime busybox true
docker: Error response from daemon: OCI runtime start failed: rpc error: code = Unavailable desc = transport is closing: unknown.
This is the same error as in the soak test.
Looking at the kata-runtime log, it seems that the same error is happening as in the soak test:
time="2019-01-31T19:04:24.88422483Z" level=debug msg="Could not retrieve anything from storage" arch=amd64 command=create container=db72deb897d7fbd60cecde518ef4a79a36d918e08e055a44825ea626fdfa39a2 name=kata-runtime pid=125811 source=virtcontainers subsystem=kata_agent
time="2019-01-31T19:04:24.886903537Z" level=warning msg="fetch sandbox device failed" arch=amd64 command=create container=db72deb897d7fbd60cecde518ef4a79a36d918e08e055a44825ea626fdfa39a2 error="open /run/vc/sbs/db72deb897d7fbd60cecde518ef4a79a36d918e08e055a44825ea626fdfa39a2/devices.json: no such file or directory" name=kata-runtime pid=125811 sandbox=db72deb897d7fbd60cecde518ef4a79a36d918e08e055a44825ea626fdfa39a2 sandboxid=db72deb897d7fbd60cecde518ef4a79a36d918e08e055a44825ea626fdfa39a2 source=virtcontainers subsystem=sandbox
time="2019-01-31T19:04:24.927344137Z" level=debug arch=amd64 command=create container=db72deb897d7fbd60cecde518ef4a79a36d918e08e055a44825ea626fdfa39a2 default-kernel-parameters="tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 debug systemd.show_status=true systemd.log_level=debug" name=kata-runtime pid=125811 source=virtcontainers subsystem=qemu
Shim log
time="2019-01-31T18:36:18.277179196Z" level=info msg="copy stdout failed" container=249e847d8d3bc3bb4c46e5e3ad30b46be580d1f4ffa030e4d97a9b833a8a71a5 error="rpc error: code = Unknown desc = EOF" exec-id=249e847d8d3bc3bb4c46e5e3ad30b46be580d1f4ffa030e4d97a9b833a8a71a5 name=kata-shim pid=1 source=shim
time="2019-01-31T18:36:18.292804355Z" level=info msg="copy stderr failed" container=249e847d8d3bc3bb4c46e5e3ad30b46be580d1f4ffa030e4d97a9b833a8a71a5 error="rpc error: code = Unknown desc = EOF" exec-id=249e847d8d3bc3bb4c46e5e3ad30b46be580d1f4ffa030e4d97a9b833a8a71a5 name=kata-shim pid=1 source=shim
Maybe the problem is the number of "physical" CPUs.
Output of "docker info":
...
Architecture: x86_64
CPUs: 2
...
can you try with 4 CPUs?
Great, now I can reproduce this issue on all distros with 2 CPUs:
[Fail] Update CPU constraints Update CPU set [It] cpuset should be equal to 0
/home/x/go/src/github.com/kata-containers/tests/integration/docker/cpu_test.go:376
[Fail] Update CPU constraints Update CPU set [It] cpuset should be equal to 2
/home/x/go/src/github.com/kata-containers/tests/integration/docker/cpu_test.go:376
[Fail] Update CPU constraints Update CPU set [It] cpuset should be equal to 0-1
/home/x/go/src/github.com/kata-containers/tests/integration/docker/cpu_test.go:376
[Fail] Update CPU constraints Update CPU set [It] cpuset should be equal to 0-2
/home/x/go/src/github.com/kata-containers/tests/integration/docker/cpu_test.go:376
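One quick sanity check when comparing 2-CPU and 4-CPU hosts is whether each requested cpuset even fits on the host. The sketch below is only an illustration under that assumption; `parse_cpuset` and `fits_host` are hypothetical helpers, not code from cpu_test.go.

```python
def parse_cpuset(spec):
    """Expand a cpuset string such as "0-2" or "0,2" into a set of CPU indexes."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def fits_host(spec, host_cpus):
    """True if every CPU index in spec exists on a host with host_cpus CPUs."""
    return all(cpu < host_cpus for cpu in parse_cpuset(spec))


if __name__ == "__main__":
    for spec in ("0", "2", "0-1", "0-2"):
        print(spec, fits_host(spec, 2), fits_host(spec, 4))
```

With two host CPUs, "2" and "0-2" refer to a CPU index that does not exist, while "0" and "0-1" do fit. Since all four cases failed in the run above, this check on its own does not account for every failure.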
@GabyCT can we close this?
@devimc this is the error that I am getting
And we are out of memory on container 85 (2041942016 < (2*1024*1024*1024))
Checking 85 containers have all relevant components
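The numbers in that message can be checked by hand: the limit is 2 GiB (2147483648 bytes), and the reported figure is a little under it. A minimal sketch of the comparison follows. It assumes the reported figure is available memory in bytes and, purely for illustration, reads it from the `MemAvailable` field of /proc/meminfo; which field the test script really uses is not shown in this thread.

```python
MIN_FREE_BYTES = 2 * 1024 * 1024 * 1024  # the 2 GiB limit quoted in the message


def mem_available_bytes(meminfo_text):
    """Extract MemAvailable from /proc/meminfo-style text, converted from kB to bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def memory_shortfall(available_bytes, required_bytes=MIN_FREE_BYTES):
    """Bytes missing to reach the limit, or 0 if there is enough memory."""
    return max(0, required_bytes - available_bytes)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        available = mem_available_bytes(f.read())
    print(available, memory_shortfall(available))
```

For the value in the message, `memory_shortfall(2041942016)` is 105541632 bytes, so the host was roughly 100 MiB under the limit when container 85 was launched.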
@GabyCT thanks. @grahamwhaley is working on density stuff; maybe he knows why it is failing.
This isn't a density test - I think this is from the 'soak rm stability test': https://github.com/kata-containers/tests/blob/master/integration/stability/soak_parallel_rm.sh#L40, where we launch (by default) 110 containers, kill them all off, and check there are no 'fragments' left hanging around in the system that should not be (like proxies, sandbox files, mounts etc.). We do that 5 times...
It looks like the Debian run failed to launch all 110 containers, which is probably either:
@grahamwhaley thanks. @GabyCT, can you confirm whether the Debian CI has enough RAM?
Closing this issue, as the CI is running Debian 10.
While running the soak test on Debian 9, we see that some of the launched containers have the Created status instead of Running, which makes the soak test fail.
Here is the information about the setup and the logs:
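To see how many containers are affected, the output of `docker ps -a --format '{{.ID}} {{.Status}}'` can be summarized. The sketch below is a rough helper under that assumption (one "ID status" pair per line); `summarize_states` and `stuck_in_created` are illustrative names, not part of the soak test.

```python
from collections import Counter


def _rows(ps_output):
    """Yield (container_id, status) pairs from 'ID status...' lines."""
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            yield parts[0], parts[1]


def summarize_states(ps_output):
    """Count containers by the first word of their status (Up, Created, Exited...)."""
    return Counter(status.split()[0] for _, status in _rows(ps_output))


def stuck_in_created(ps_output):
    """Return the IDs of containers whose status is still Created."""
    return [cid for cid, status in _rows(ps_output) if status.startswith("Created")]
```

The IDs returned by `stuck_in_created` are the ones worth looking up in the runtime log.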
Runtime config files
Runtime default config files
Runtime config file contents
Config file /etc/kata-containers/configuration.toml not found
Output of "cat "/usr/share/defaults/kata-containers/configuration.toml"":
KSM throttler
version
Output of "/usr/libexec/kata-ksm-throttler/kata-ksm-throttler --version":
systemd service
Output of "systemctl show kata-ksm-throttler":
Image details
Initrd details
No initrd
Logfiles
Runtime logs
Recent runtime problems found in system journal:
Proxy logs
Recent proxy problems found in system journal:
Shim logs
Recent shim problems found in system journal:
Throttler logs
No recent throttler problems found in system journal.
Container manager details
Have docker
Docker
Output of "docker version":
Output of "docker info":
Output of "systemctl show docker":
No kubectl
Packages
Have dpkg
Output of "dpkg -l|egrep "(cc-oci-runtime|cc-runtime|runv|kata-proxy|kata-runtime|kata-shim|kata-ksm-throttler|kata-containers-image|linux-container|qemu-)"":
Have rpm
Output of "rpm -qa|egrep "(cc-oci-runtime|cc-runtime|runv|kata-proxy|kata-runtime|kata-shim|kata-ksm-throttler|kata-containers-image|linux-container|qemu-)"":