Open nt opened 6 days ago
This appears to be a very old version of gVisor (the logs say VERSION_MISSING, but newer releases have a more detailed error message around this part of the sandbox startup process). Can you update to a newer build?
(Also, from context, it appears you are benchmarking gVisor. Please make sure to read the performance section of the Production Guide as you do this.)
runsc.log.20240930-135720.868828.boot.txt runsc.log.20240930-135720.868828.gofer.txt runsc.log.20240930-135720.868828.restore.txt
Thanks for looking, Etienne. Here are some logs with release-20240916.0.
I don't understand the logs here.
This chroot error happens when a container starts, at https://github.com/google/gvisor/blob/3971ecbc6ccd71c1b1fac08987c20d421b6f60b6/runsc/cmd/chroot.go#L122
According to boot.txt, the container starts with no issue and the application runs.
@nt The attached logs do not show the "FATAL ERROR: error setting up chroot: error remounting chroot in read-only: device or resource busy" issue. The logs show that the sandbox was running and then received SIGTERM and was killed.
Anyway, @nixprime had a hypothesis about what could be going on. runsc creates a new tmpfs mount at /tmp and then creates the sandbox chroot there. This mount is remounted as read-only once the sandbox chroot is prepared. We hypothesize that, between the time we create the tmpfs mount at /tmp and the time it is remounted, either the Go runtime or some library opens a file descriptor within /tmp. If that descriptor is still open at the time of the remount, the remount fails with EBUSY.
Could you try patching #10975 and giving that a try?
Hi @ayushr2, thanks for looking. Unfortunately we can't upgrade gVisor as frequently as we'd like because we care about checkpoint stability. I will make sure to include that patch in our next update.
Description
Starting a sandbox can randomly fail with
FATAL ERROR: error setting up chroot: error remounting chroot in read-only: device or resource busy
Steps to reproduce
Happens in ~0.8% of container start attempts
runsc version
docker version (if using docker)
No response
uname
No response
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)
-> logs in comments