kubernetes / minikube

Run Kubernetes locally
https://minikube.sigs.k8s.io/
Apache License 2.0

runc sometimes fails with "no such file or directory" #17976

Open prezha opened 5 months ago

prezha commented 5 months ago

What Happened?

we see these in some kvm+containerd tests:

example 1:

    I0116 02:30:24.180449  339661 ssh_runner.go:195] Run: sudo runc --root /run/containerd/runc/k8s.io list -f json
...
    stderr:
    time="2024-01-16T02:30:24Z" level=error msg="stat /run/containerd/runc/k8s.io/4744173521ad7687144254eea07ff8aff4223eee041b7cec0b7bda79fc00a1d0: no such file or directory"

example 2:

    I0116 01:58:56.995758  567664 ssh_runner.go:195] Run: sudo runc --root /run/containerd/runc/k8s.io list -f json
...
    stderr:
    time="2024-01-16T01:58:57Z" level=error msg="stat /run/containerd/runc/k8s.io/7010e3c493ae1d4af36adeaef2828f0f9da1b15cdf9f78f705d7c137e0446ba3: no such file or directory"

we use runc v1.1.10, which still has this problem; the fix exists upstream but has not shipped in a release yet, so a future version may solve this issue:

(our) runc v1.1.10: https://github.com/opencontainers/runc/blob/18a0cb0f32bcac2ecc9a10f327d282759c144dab/list.go#L131-L134

            st, err := os.Stat(filepath.Join(absRoot, item.Name()))
            if err != nil {
                fatal(err)
            }

(latest) runc v1.1.11 - unchanged: https://github.com/opencontainers/runc/blob/4bccb38cc9cf198d52bebf2b3a90cd14e7af8c06/list.go#L131-L134

            st, err := os.Stat(filepath.Join(absRoot, item.Name()))
            if err != nil {
                fatal(err)
            }

(current) runc HEAD: https://github.com/opencontainers/runc/blob/0c5a73535503216a8c15a86aa9022d3b0d995994/list.go#L129-L136

        st, err := item.Info()
        if err != nil {
            if errors.Is(err, os.ErrNotExist) {
                // Possible race with runc delete.
                continue
            }
            return nil, err
        }

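so, the failure is caused by a possible race with runc delete: a container's state directory can be removed after runc list reads the root directory but before it stats that entry, and the released versions treat the resulting "no such file or directory" as fatal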

interestingly, this change was merged in PR #3379 by @kolyshkin on 2022-01-27, but it looks like it was never released (see the current "runc v1.1.11" above, released on 2024-01-02)

as a workaround, we could simply retry the runc list call when it fails this way, until the fix is released upstream, and then update to that runc version

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

kolyshkin commented 2 months ago

runc 1.1 backport: https://github.com/opencontainers/runc/pull/4231 (will be released in 1.1.13 if/when we do another 1.1.x release)

prezha commented 2 months ago

/remove-lifecycle stale