google / gvisor

TestMultiContainerEvent test failed #5875

Closed zhlhahaha closed 3 years ago

zhlhahaha commented 3 years ago

Description

When I run bazel test //runsc/container:container_test --test_filter=TestMultiContainerEvent --cache_test_results=no several times, it fails with the following message:

--- FAIL: TestMultiContainerEvent (0.77s)
    multi_container_test.go:1921: Running containerd test-container-IJNV6LZ2XITHL3PAWVVTPG7ISLKGJVXK
    multi_container_test.go:1921: Running containerd test-container-N3S4P2NHO3HF6HZBB7MLQSRCDBRMFUHG
    multi_container_test.go:1921: Running containerd test-container-TYRWTLSFZLLSLDFBCJRDI5RNJC3JK3LG
    multi_container_test.go:1966: Running container should report nonzero CPU usage, but got 0
    multi_container_test.go:1969: Expected container test-container-IJNV6LZ2XITHL3PAWVVTPG7ISLKGJVXK to use more than 0 ns of CPU, but used 0
    multi_container_test.go:1971: Container test-container-IJNV6LZ2XITHL3PAWVVTPG7ISLKGJVXK usage: 0
    multi_container_test.go:1971: Container test-container-N3S4P2NHO3HF6HZBB7MLQSRCDBRMFUHG usage: 90000000

Steps to reproduce

I use the following shell script to detect the failure:

#!/bin/bash
# Re-run the test until the report contains a FAIL line.
while true; do
        bazel test //runsc/container:container_test --test_filter=TestMultiContainerEvent --cache_test_results=no --test_output=streamed > report 2>&1
        # grep -c prints the number of matching lines.
        fail=$(grep -c FAIL report)
        if [ "$fail" -ne 0 ]; then
                break
        fi
done

Would you like to take a look when you have time? @kevinGC

zhlhahaha commented 3 years ago

/assign @kevinGC

zhlhahaha commented 3 years ago

I did some investigation. The CPU usage path comes from the following code: https://github.com/google/gvisor/blob/47bc115158397024841aa3747be7558b2c317cbb/runsc/cgroup/cgroup.go#L298 Both cgroup.name and cgroup.parents contain user.slice, which leads to the wrong CPU usage path.
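To illustrate the kind of path this produces, here is a minimal Go sketch. It is not the gVisor code: `usagePath`, the mount point `/sys/fs/cgroup/cpuacct`, and the example values for the parent path and the name are all assumptions made up for illustration. It only shows that joining two components that each contain user.slice repeats that segment in the result.

```go
package main

import (
	"fmt"
	"path"
)

// usagePath is a hypothetical helper, not gVisor's implementation. It joins
// a controller mount point, a parent cgroup path, and a cgroup name into the
// path of the cgroup v1 cpuacct.usage file.
func usagePath(root, parents, name string) string {
	return path.Join(root, parents, name, "cpuacct.usage")
}

func main() {
	// Both values are invented for this example. Each one already contains
	// user.slice, so the joined path contains the segment twice.
	parents := "/user.slice"
	name := "user.slice/test-container"
	fmt.Println(usagePath("/sys/fs/cgroup/cpuacct", parents, name))
}
```

If no cgroup exists at the doubled path, a reader of that file would have nothing valid to report, which would be consistent with the usage of 0 in the test output above.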

fvoznika commented 3 years ago

cgroup is trying to load information about the current process and not the sandbox process. Thus it gets confused and builds the wrong path. I'm working on a fix...
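As a rough sketch of why the source process matters (this is not the actual fix): on Linux, /proc/self/cgroup describes the calling process and /proc/PID/cgroup describes another process, and the two can list different paths for the same controller. The helper names `cgroupFile` and `controllerPath`, the PID 1234, and the sample file contents below are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupFile returns the procfs file that lists the cgroups of a process.
// A pid <= 0 selects the calling process. Hypothetical helper.
func cgroupFile(pid int) string {
	if pid <= 0 {
		return "/proc/self/cgroup"
	}
	return fmt.Sprintf("/proc/%d/cgroup", pid)
}

// controllerPath scans cgroup v1 style lines ("ID:controllers:path") and
// returns the path recorded for the given controller, if any.
func controllerPath(content, controller string) (string, bool) {
	for _, line := range strings.Split(content, "\n") {
		fields := strings.SplitN(line, ":", 3)
		if len(fields) != 3 {
			continue // skip blank or malformed lines
		}
		for _, c := range strings.Split(fields[1], ",") {
			if c == controller {
				return fields[2], true
			}
		}
	}
	return "", false
}

func main() {
	// Made-up contents: one for the process running the test, one for a sandbox.
	self := "4:cpu,cpuacct:/user.slice\n1:name=systemd:/user.slice/session-1.scope\n"
	sandbox := "4:cpu,cpuacct:/test-container\n"

	selfPath, _ := controllerPath(self, "cpuacct")
	sandboxPath, _ := controllerPath(sandbox, "cpuacct")
	fmt.Println(cgroupFile(0), selfPath)
	fmt.Println(cgroupFile(1234), sandboxPath)
}
```

With these sample inputs, the caller's cpuacct path sits under user.slice while the sandbox's does not, so code that reads the caller's file would start from the wrong base.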