myugan closed this issue 3 years ago.
@myugan, at first glance it looks like sysbox-mgr is not able to allocate enough file descriptors (fds) on your machine. That's interesting, because I'm not expecting sysbox-mgr to be a heavy consumer of fds. Next time you reproduce the problem, please collect the following info to help us understand how many fds sysbox-mgr (and sysbox-fs) are consuming:
$ sudo lsof -p $(pgrep sysbox-mgr) | wc -l
14
$ sudo lsof -p $(pgrep sysbox-fs) | wc -l
16
Also, please get your current fd limits by running:
$ sudo cat /proc/self/limits | egrep open
Max open files 1024 1048576 files
$ sudo cat /proc/sys/fs/file-max
9223372036854775807
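A note on the commands above: `cat /proc/self/limits` reports the limits inherited by the `cat` process (i.e. your shell or sudo session), which can differ from the limits systemd applies to the sysbox daemons. Also, `lsof` output includes a header line and entries that are not descriptors (cwd, txt, memory-mapped files). A minimal sketch that reads each daemon's own `/proc/<pid>/fd` and `/proc/<pid>/limits` instead, assuming Linux procfs, `pgrep`, and root privileges; `fd_report` is a hypothetical helper, not part of Sysbox:

```shell
#!/bin/sh
# fd_report (hypothetical helper): print the open-fd count and the
# "Max open files" soft/hard limits for one PID, read from Linux /proc.
fd_report() {
  pid="$1"
  # Each entry under /proc/<pid>/fd is exactly one open file descriptor.
  count=$(ls /proc/"$pid"/fd | wc -l)
  # In /proc/<pid>/limits, fields 4 and 5 of "Max open files" are soft/hard.
  soft=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)
  hard=$(awk '/^Max open files/ {print $5}' /proc/"$pid"/limits)
  echo "pid=$pid fds=$count nofile_soft=$soft nofile_hard=$hard"
}

# Report on every running sysbox-mgr and sysbox-fs process (needs root).
if command -v pgrep >/dev/null 2>&1; then
  for name in sysbox-mgr sysbox-fs; do
    for pid in $(pgrep -x "$name"); do
      printf '%s: ' "$name"
      fd_report "$pid"
    done
  done
fi
```

Run it as root (e.g. `sudo sh fd-report.sh`) once in steady state and again right after the error appears; if `fds` is close to `nofile_soft`, the daemon is running into its own limit rather than the system-wide one.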
Another user (on the Slack channel) recently reported a similar problem. He was able to get around it by increasing the system limit on the number of open files:
$ echo "DefaultLimitNOFILE=524288:524288" | sudo tee -a /etc/systemd/system.conf; sudo systemctl daemon-reload; sudo service sysbox restart
I haven't tried this yet, but you can use it as a temporary workaround while we find the root cause.
As Rodny mentioned, we don't expect sysbox-mgr to open many files, except possibly when starting a Sysbox container that has lots of inner Docker images in it. In that case it needs to copy some data around, which can result in many open files.
But I'll definitely take a look to make sure we don't have a file leak somewhere in there.
How many sysbox containers did you have running at the time this occurred?
Thanks as always for bringing this to our attention.
Can we also add LimitNOFILE=infinity to the sysbox-mgr systemd unit file?
I got the suggestion from https://github.com/systemd/systemd/issues/4997
I haven't tried it, but that should be fine as a temporary workaround. I still need to find the root cause so that I can set the appropriate limit in the sysbox-mgr systemd unit file.
@myugan, when possible please collect the info I previously requested. It would also help if you could share details such as the number of sysbox containers running in parallel, and let us know whether you are launching Docker images that contain inner images.
This should help us understand if we have a file-descriptor leak and, if so, narrow down the area where it may be.
@rodnymolina
$ sudo lsof -p $(pgrep sysbox-mgr) | wc -l
809
$ sudo lsof -p $(pgrep sysbox-fs) | wc -l
30
$ cat /proc/self/limits | egrep open
Max open files 1024 4096 files
$ cat /proc/sys/fs/file-max
500000
@myugan, thanks. If you obtained those numbers during steady state (i.e., you were not launching or stopping a sys container when you got them), it looks like we may have an fd leak in sysbox-mgr. We will look into this.
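One way to tell a steady-state leak from a transient spike is to sample sysbox-mgr's descriptor count at intervals while no containers are being started or stopped. A minimal sketch, assuming Linux procfs and root access; `sample_fds` is a hypothetical helper, not part of Sysbox:

```shell
#!/bin/sh
# sample_fds PID SAMPLES INTERVAL (hypothetical helper):
# print one "<unix-time> <open-fd-count>" line per sample.
# A count that keeps growing with no container activity hints at a leak;
# one that rises during container start/stop and then drops back does not.
sample_fds() {
  pid="$1"; samples="$2"; interval="$3"
  i=0
  while [ "$i" -lt "$samples" ]; do
    # /proc/<pid>/fd holds one entry per open descriptor.
    n=$(ls /proc/"$pid"/fd | wc -l)
    echo "$(date +%s) $n"
    i=$((i + 1))
    # No need to wait after the final sample.
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, `sudo sh -c '. ./sample-fds.sh; sample_fds "$(pgrep -x sysbox-mgr)" 30 10'` takes a sample every 10 seconds for about five minutes.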
I looked at this problem last week and could not reproduce it: with the latest sysbox-mgr (i.e., top-of-tree on the sysbox repo), I don't see any file descriptor leak. In fact, after deploying dozens of system containers on my host, lsof shows fewer than 30 open file descriptors for sysbox-mgr (as expected). I also reviewed the code and do not see any file descriptor leak.
@myugan: do you happen to know which git commit of sysbox-mgr you saw the problem on? Thanks!
Hi @myugan, I was finally able to reproduce the problem you reported. It looks like we need to correct the file limits for both sysbox-mgr and sysbox-fs. They can open a large number of files at certain times, in particular when many containers are running or during container start/stop.
We will fix this in the upcoming release. In the meantime, you can work around it by adding the following to the systemd unit files for both sysbox-mgr (/lib/systemd/system/sysbox-mgr.service) and sysbox-fs (/lib/systemd/system/sysbox-fs.service):
[Service]
...
LimitNOFILE=infinity
LimitNPROC=infinity
This removes the default limit on open files (and, through LimitNPROC, on the number of processes) for sysbox-mgr and sysbox-fs. It's hard to set a strict limit because the number of open files depends on the number of running containers, how many inner container images they have, how many processes within those containers access resources emulated by sysbox-fs, etc.
Note that Docker takes a similar approach in its systemd service unit file.
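Files under /lib/systemd/system belong to the package and can be overwritten on upgrade; systemd also reads drop-in files from /etc/systemd/system/<unit>.d/*.conf, whose settings override the same keys in the packaged unit. A minimal sketch that applies the same two limits that way, assuming the unit names sysbox-mgr.service and sysbox-fs.service from the paths above; `UNIT_DIR`, `write_limit_dropin`, `apply_limits`, and the file name `limits.conf` are illustrative choices, not part of Sysbox:

```shell
#!/bin/sh
# Hypothetical helpers: raise NOFILE/NPROC limits for the sysbox daemons via
# systemd drop-ins instead of editing the packaged unit files.
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"

# Write <UNIT_DIR>/<unit>.d/limits.conf; its [Service] keys override the unit's.
write_limit_dropin() {
  unit="$1"
  dir="$UNIT_DIR/$unit.d"
  mkdir -p "$dir"
  cat > "$dir/limits.conf" <<'EOF'
[Service]
LimitNOFILE=infinity
LimitNPROC=infinity
EOF
}

# Install both drop-ins, reload unit files, and restart so the limits apply.
apply_limits() {
  for unit in sysbox-mgr.service sysbox-fs.service; do
    write_limit_dropin "$unit"
  done
  systemctl daemon-reload
  systemctl restart sysbox-mgr.service sysbox-fs.service
}
```

To apply it: `sudo sh -c '. ./sysbox-limits.sh; apply_limits'`. Afterwards, `systemctl show sysbox-mgr -p LimitNOFILE` should reflect the new value.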
This is fixed in top-of-tree, and will be present in the upcoming Sysbox release.
I updated my sysbox to the latest code by compiling it from source, and I got this error when the Docker API tries to create a container; the container's status never becomes Running. This happens occasionally, and I need to restart sysbox-mgr to resolve the issue. Is there any way to fix this? Thanks