Sporadic "mount through procfd: no such file or directory: unknown" when dind is used

nudgegoonies commented 3 years ago

We use the latest master build of Sysbox:

sysbox-fs: f3ecad054e319287c9f30a13a976c05ac078226f
sysbox-mgr: 26c90a66f1084fe973d5eba4c9eb93700a3d67e8
sysbox-runc: df952e5276cb6e705e0be331e9a9fe88f372eab8

Since upgrading, we get sporadic errors on gitlab-ci runners that use a private docker-dind 20.10.7. Starting a container occasionally fails with this error message:

OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "proc" to rootfs at "/proc" caused: mount through procfd: no such file or directory: unknown

I cannot find corresponding log entries in sysbox-mgr or sysbox-fs. Occasionally, though, the following appears in the sysbox-fs log:

level=warning msg="Sysbox-fs first child process error status: exit status 1, pid: 22961"
level=info msg="reaper: reaped pid 23032"
level=info msg="reaper: nothing to reap"
level=warning msg="TOCTOU check failed on fd 14573 pid 8511 cntr 656f2d89d27e: req.Id is no longer valid (no such file or directory)"
level=warning msg="Unexpected error during NotifRespond() execution (no such file or directory) on fd 14573 pid 5807"

Since moving to the new Sysbox version, we also see many of these in the sysbox-fs log (about 30 per day):

level=warning msg="TOCTOU check failed on fd 14629 pid 8227 cntr 00ed95b824ab: req.Id is no longer valid (no such file or directory)"
level=warning msg="Unexpected error during NotifRespond() execution (no such file or directory) on fd 14629 pid 810"
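To see whether these warnings cluster on particular containers, the log can be tallied per container ID. The following is a minimal Python sketch, assuming the lines keep the `level=... msg="..."` shape shown above; the function name and the sample list are illustrative and not part of Sysbox.

```python
import re
from collections import Counter

# Matches the TOCTOU warning as it appears in the log excerpts above.
# search() is used so that any prefix (e.g. a timestamp) is tolerated.
TOCTOU_RE = re.compile(
    r'level=warning msg="TOCTOU check failed on fd (?P<fd>\d+) '
    r'pid (?P<pid>\d+) cntr (?P<cntr>[0-9a-f]+):'
)


def count_toctou_by_container(lines):
    """Return a Counter mapping container ID -> number of TOCTOU warnings."""
    counts = Counter()
    for line in lines:
        match = TOCTOU_RE.search(line)
        if match:
            counts[match.group("cntr")] += 1
    return counts


# Sample input copied from the log excerpts in this issue.
sample = [
    'level=warning msg="TOCTOU check failed on fd 14573 pid 8511 cntr 656f2d89d27e: '
    'req.Id is no longer valid (no such file or directory)"',
    'level=warning msg="Unexpected error during NotifRespond() execution '
    '(no such file or directory) on fd 14573 pid 5807"',
    'level=warning msg="TOCTOU check failed on fd 14629 pid 8227 cntr 00ed95b824ab: '
    'req.Id is no longer valid (no such file or directory)"',
]
toctou_counts = count_toctou_by_container(sample)
```

In practice the lines would come from the sysbox-fs log file rather than a hard-coded list; its location depends on the installation.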

ctalledo commented 3 years ago

Hi @nudgegoonies, thanks again for filing the issue.

we get sporadic errors on gitlab-ci runners that use a private docker-dind 20.10.7 [...] mounting "proc" to rootfs at "/proc" caused: mount through procfd: no such file or directory: unknown

Strange, this issue was fixed in Sysbox top-of-tree more than a month ago (https://github.com/nestybox/sysbox/issues/291).

If possible, please let us know how to reproduce.

The sysbox-fs log messages are likely unrelated to the problem you report, but we will keep them in mind as we investigate. They typically occur when Sysbox is intercepting a container's syscall and the container process dies before the syscall completes.
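The race described above can be illustrated with a toy model. This is only a sketch of the general pattern (a pending request becomes invalid when its process exits), not Sysbox's implementation; `ToyNotifier`, `handle`, and the request IDs are hypothetical names. The `ENOENT` error is used because it corresponds to the "no such file or directory" text in the warnings.

```python
import errno


class ToyNotifier:
    """Toy stand-in for a syscall-notification channel (not Sysbox code)."""

    def __init__(self):
        self._pending = {}  # request id -> pid of the process waiting on it

    def receive(self, req_id, pid):
        """Record an intercepted syscall awaiting a response."""
        self._pending[req_id] = pid

    def process_exited(self, pid):
        """Drop every pending request that belonged to the dead process."""
        self._pending = {r: p for r, p in self._pending.items() if p != pid}

    def id_valid(self, req_id):
        return req_id in self._pending

    def respond(self, req_id):
        if req_id not in self._pending:
            raise OSError(errno.ENOENT, "No such file or directory")
        del self._pending[req_id]


def handle(notifier, req_id):
    """Check-then-respond; return a log-style string describing the outcome."""
    if not notifier.id_valid(req_id):
        return "TOCTOU check failed: request no longer valid"
    try:
        notifier.respond(req_id)
    except OSError as err:
        # The process can also die between the check and the response.
        return f"respond failed: {err.strerror}"
    return "ok"


notifier = ToyNotifier()
notifier.receive(req_id=1, pid=100)
notifier.receive(req_id=2, pid=200)
notifier.process_exited(200)  # process dies before its syscall completes
outcome_alive = handle(notifier, 1)
outcome_dead = handle(notifier, 2)
```

In this model the warning is a symptom of the process exiting early, which fits the statement that the messages are probably not the cause of the mount failure.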

nudgegoonies commented 3 years ago

Did not happen anymore over the last 4 days. I also generated heavy docker build and run load on a runner. No reboot was done. Could it be that the daily docker system prune cron fixed it?

ctalledo commented 3 years ago

Did not happen anymore over the last 4 days. I also generated heavy docker build and run load on a runner. No reboot was done. Could it be that the daily docker system prune cron fixed it?

Thanks for the update. I doubt the docker system prune would have fixed it, since the "mount through procfd" problem occurs every time the inner Docker launches an inner container; I don't see how pruning images would make a difference here.

Anyway, if you are not seeing it, feel free to close it and we can re-open if you spot it again. Thanks!

nudgegoonies commented 3 years ago

Thanks for the answer. Once per day, via cron, the gitlab runner stops accepting jobs; when the last job finishes, the gitlab-runner service restarts, and the dind with it. Maybe this restart fixed it? That brings me to the question of how to handle Sysbox package updates. Does the outer Docker daemon, where Sysbox is configured, need to be restarted as well, along with the running dind?

ctalledo commented 3 years ago

Hi @nudgegoonies,

That brings me to the question of how to handle Sysbox package updates. Does the outer Docker daemon, where Sysbox is configured, need to be restarted as well, along with the running dind?

That's a good question, and something we need to add to our docs.

To update Sysbox, you need to:

1) Stop all containers using Sysbox
2) Uninstall Sysbox
3) Install the new version of Sysbox
4) Restart the containers using Sysbox
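The steps above can be sketched as a small planning helper. This is an illustration only: it assumes Sysbox containers are identified by the `sysbox-runc` runtime name in `docker inspect` output (`HostConfig.Runtime`), and it leaves the uninstall and install steps as descriptions because the package commands depend on the host. The function names and sample data are hypothetical.

```python
SYSBOX_RUNTIME = "sysbox-runc"  # assumed runtime name as registered with Docker


def sysbox_container_ids(inspected):
    """Pick the IDs of running containers whose runtime is Sysbox.

    `inspected` is a list of dicts shaped like `docker inspect` output.
    """
    return [
        c["Id"]
        for c in inspected
        if c.get("HostConfig", {}).get("Runtime") == SYSBOX_RUNTIME
        and c.get("State", {}).get("Running")
    ]


def update_plan(inspected):
    """Return the four update steps as human-readable strings."""
    ids = sysbox_container_ids(inspected)
    targets = " ".join(ids) if ids else "(no running Sysbox containers)"
    return [
        f"1) stop: docker stop {targets}",
        "2) uninstall Sysbox (package command depends on the host)",
        "3) install the new version of Sysbox",
        f"4) restart: docker start {targets}",
    ]


# Hypothetical inspect data: one Sysbox container and one regular container.
sample_inspect = [
    {"Id": "aaa111", "State": {"Running": True}, "HostConfig": {"Runtime": "sysbox-runc"}},
    {"Id": "bbb222", "State": {"Running": True}, "HostConfig": {"Runtime": "runc"}},
]
plan = update_plan(sample_inspect)
```

The helper only builds a plan and runs nothing, so the plan can be reviewed before any container is stopped.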

In general, the outer Docker daemon does not need to be restarted (unless the Sysbox installer in step 3 detects that it needs to update the /etc/docker/daemon.json file, in which case it will ask for permission to restart Docker; this normally should not occur because the initial installation of Sysbox would have configured that file already).
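Whether the installer is likely to touch /etc/docker/daemon.json can be estimated beforehand by checking that the file already registers the Sysbox runtime. The sketch below assumes the runtime appears under the `runtimes` key as `sysbox-runc`; the binary path in the sample is an assumption, and the installer may manage other settings that this check does not cover.

```python
import json


def has_sysbox_runtime(daemon_json_text):
    """Return True if the daemon.json text registers a sysbox-runc runtime."""
    config = json.loads(daemon_json_text)
    return "sysbox-runc" in config.get("runtimes", {})


# Hypothetical file contents; the binary path is an assumption.
configured = '{"runtimes": {"sysbox-runc": {"path": "/usr/bin/sysbox-runc"}}}'
unconfigured = '{"log-driver": "json-file"}'

already_configured = has_sysbox_runtime(configured)
needs_configuration = not has_sysbox_runtime(unconfigured)
```

On a real host, the text would be read from /etc/docker/daemon.json; a `True` result suggests that an update will not require a Docker restart for this reason.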

Hope that helps; please let me know if you have any other questions.

nudgegoonies commented 3 years ago

Thank you very much for the answer, @ctalledo. What I don't understand is the uninstallation. Why not just upgrade the package?

ctalledo commented 3 years ago

What I don't understand is the uninstallation. Why not just upgrade the package?

That's certainly something we can look into. Copying @rodnymolina since he is more familiar with the sysbox package life-cycle than I am.

But even if we go with the package upgrade (instead of uninstall and re-install), Sysbox containers would still need to be stopped before the upgrade procedure.

rodnymolina commented 3 years ago

That's right, @nudgegoonies, there's room for improvement here. We will add this to our feature to-do list. In the meantime, if you need a way to avoid disrupting your current servers during the upgrade, please contact us over Slack so that we can discuss alternative approaches.

ctalledo commented 3 years ago

Closing this issue for now since the "mount through procfd" issue is fixed (though it's not clear why @nudgegoonies hit it and why it later went away). Please re-open if you think otherwise.

To avoid polluting this issue, the side discussion on the Sysbox package life-cycle can be handled through Slack or in a separate issue.

nestybox / sysbox

Sporadic "mount through procfd: no such file or directory: unknown" when dind is used #338