google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

FATAL ERROR: error setting up chroot: error remounting chroot in read-only: device or resource busy #10965

Open nt opened 6 days ago

nt commented 6 days ago

Description

Starting a sandbox can randomly fail with FATAL ERROR: error setting up chroot: error remounting chroot in read-only: device or resource busy

Steps to reproduce

Happens in ~0.8% of container start attempts.

runsc version

20240916.0

docker version (if using docker)

No response

uname

No response

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

-> logs in comments

EtiennePerot commented 6 days ago

This appears to be a very old version of gVisor (the logs say VERSION_MISSING, but newer releases have a more detailed error message around this part of the sandbox startup process). Can you update to a newer build?

(Also, from context, it appears you are benchmarking gVisor. Please make sure to read the performance section of the Production Guide as you do this.)

nt commented 3 days ago

runsc.log.20240930-135720.868828.boot.txt runsc.log.20240930-135720.868828.gofer.txt runsc.log.20240930-135720.868828.restore.txt

Thanks for looking, Etienne. Here are some logs with release-20240916.0.

milantracy commented 3 days ago

i don't understand the logs here.

this chroot error happens when a container starts, at https://github.com/google/gvisor/blob/3971ecbc6ccd71c1b1fac08987c20d421b6f60b6/runsc/cmd/chroot.go#L122

from the boot.txt, the container starts with no issue and the application runs.

ayushr2 commented 3 days ago

@nt The attached logs do not show the FATAL ERROR: error setting up chroot: error remounting chroot in read-only: device or resource busy issue. The logs show that the sandbox was running and then received SIGTERM and was killed.

Anyway, @nixprime had a hypothesis about what could be going on. runsc creates a new tmpfs mount at /tmp and then creates the sandbox chroot there. This mount is remounted as read-only once the sandbox chroot is prepared. We hypothesize that, between the time the tmpfs mount is created at /tmp and the time it is remounted, either the Go runtime or some library opens a file descriptor within /tmp. If that descriptor is still open at the time of the remount, the remount fails with EBUSY.

Could you try patching in #10975 and see whether that helps?

nt commented 1 day ago

Hi @ayushr2, thanks for looking. Unfortunately, we can't upgrade gVisor as frequently as we'd like because we care about checkpoint stability. I will make sure to include that patch in our next update.