nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Unexpected error during NotifReceive() execution #169

Closed: myugan closed this issue 3 years ago

myugan commented 3 years ago

Hi @ctalledo @rodnymolina

I'm seeing an error log like the one below. I'm not sure whether it is the cause, but our containers sometimes crash, and we can't kill the running containers without restarting the Docker daemon.

WARN[2020-12-23 05:30:53] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 26 pid 22364
WARN[2020-12-23 07:28:28] sysbox-fs caught signal: terminated
WARN[2020-12-23 12:55:42] sysbox-fs caught signal: terminated
WARN[2020-12-23 16:09:01] Error decoding received nsenterMsg response: read nsenterPipe-p: connection reset by peer
WARN[2020-12-23 16:18:02] Sysbox-fs first child process error status: pid = 6723
WARN[2020-12-23 16:18:02] Sysbox-fs first child process error status: pid = 6722
WARN[2020-12-23 21:50:24] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 14 pid 29780
WARN[2020-12-24 10:35:02] Error decoding received nsenterMsg response: read nsenterPipe-p: connection reset by peer
WARN[2020-12-24 18:57:02] Error decoding received nsenterMsg response: read nsenterPipe-p: connection reset by peer
WARN[2020-12-25 14:14:44] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 30 pid 3068
WARN[2020-12-25 15:58:03] Error decoding received nsenterMsg response: read nsenterPipe-p: connection reset by peer
WARN[2020-12-25 22:36:04] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 29 pid 6920
WARN[2020-12-26 01:39:33] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 190 pid 25454
WARN[2020-12-26 09:39:34] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 156 pid 26159
WARN[2020-12-26 13:50:29] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 77 pid 17239
WARN[2020-12-26 19:15:10] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 78 pid 21232
WARN[2020-12-26 21:25:06] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 87 pid 23710
WARN[2020-12-27 00:35:03] Unexpected error during NotifReceive() execution (bad file descriptor) on fd 53 pid 6342
WARN[2020-12-27 01:48:35] Unexpected error during NotifReceive() execution (inappropriate ioctl for device) on fd 168 pid 7878
WARN[2020-12-27 07:14:40] Unexpected error during NotifReceive() execution (bad file descriptor) on fd 43 pid 30142
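
To see which warnings dominate a log like the one above, the lines can be tallied with a short script. This is a minimal sketch, not a Sysbox tool: the regular expressions assume the `WARN[timestamp] message` layout shown above, and `tally_warnings` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed layout, copied from the log above: LEVEL[YYYY-MM-DD HH:MM:SS] message
LINE_RE = re.compile(r"^(?P<level>[A-Z]+)\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")

# NotifReceive() warnings carry the error text in parentheses, then fd and pid.
NOTIF_RE = re.compile(
    r"NotifReceive\(\) execution \((?P<err>[^)]+)\) on fd (?P<fd>\d+) pid (?P<pid>\d+)"
)


def tally_warnings(lines):
    """Count NotifReceive() warnings per error text; count all other log lines as 'other'."""
    counts = Counter()
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if parsed is None:
            continue  # skip lines that do not look like log entries
        notif = NOTIF_RE.search(parsed.group("msg"))
        counts[notif.group("err") if notif else "other"] += 1
    return counts


sample = [
    "WARN[2020-12-23 05:30:53] Unexpected error during NotifReceive() execution "
    "(inappropriate ioctl for device) on fd 26 pid 22364",
    "WARN[2020-12-23 07:28:28] sysbox-fs caught signal: terminated",
    "WARN[2020-12-27 00:35:03] Unexpected error during NotifReceive() execution "
    "(bad file descriptor) on fd 53 pid 6342",
]

for error_text, count in tally_warnings(sample).items():
    print(f"{count:3d}  {error_text}")
```

Feeding the full log through `tally_warnings` would show how often each error text appears, which may help when comparing behavior before and after an upgrade.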

rodnymolina commented 3 years ago

@myugan, we fixed an issue a few weeks ago that would explain what you describe above. Can you please try our latest code by building from source?

myugan commented 3 years ago

Okay, got it. Is it possible to get a deb package for the latest update instead of compiling from source?

rodnymolina commented 3 years ago

At the moment you cannot create deb packages on your own. However, we are about to start our next release cycle, so we expect to have the new packages ready soon (one or two weeks).

rodnymolina commented 3 years ago

As explained above, the problem has already been fixed in top-of-tree. Please let us know if you have any other questions on this matter. Will close this one now.

myugan commented 3 years ago

I'd like to ask: is it possible that a Docker container in the dead state is caused by Sysbox?

ctalledo commented 3 years ago

Hi @myugan:

I'd like to ask: is it possible that a Docker container in the dead state is caused by Sysbox?

I've never come across that. In what context are you seeing this?

myugan commented 3 years ago

I'm not sure about this, but sometimes a container in our system gets stuck and we need to restart the daemon. When I check it with docker ps -a, it shows the dead state.

ctalledo commented 3 years ago

I'm not sure about this, but sometimes a container in our system gets stuck and we need to restart the daemon. When I check it with docker ps -a, it shows the dead state.

Got it. The error you reported above could lead to a container getting stuck or hanging.

As @rodnymolina mentioned, this error should be fixed in Sysbox top-of-tree. Have you seen this error with the Sysbox top-of-tree?

myugan commented 3 years ago

Ah okay, so that explains it: I'm still using the latest released deb package on Ubuntu and am waiting for the new release that @rodnymolina mentioned. Thanks in advance.

myugan commented 3 years ago

[image]

Is this related to Sysbox? As I said before, Docker sometimes gets stuck when I remove a container.

cc @rodnymolina @ctalledo

ctalledo commented 3 years ago

Hi @myugan:

Is this related to Sysbox? As I said before, Docker sometimes gets stuck when I remove a container.

Need more info to help you here.

What process is that strace from?

I suspect it's the docker CLI (since it's trying to open /var/run/docker.sock). I don't see how Sysbox could have any effect on the Docker CLI communicating with the daemon.

myugan commented 3 years ago

Hi @ctalledo, it seems this is not happening again. You're right: this issue doesn't come from Sysbox itself but from another configuration in Docker that causes stdout to be buffered, which makes the container get stuck.

ctalledo commented 3 years ago

Hi @ctalledo, it seems this is not happening again. You're right: this issue doesn't come from Sysbox itself but from another configuration in Docker that causes stdout to be buffered, which makes the container get stuck.

Thanks @myugan for root causing and confirming that it's not a Sysbox related issue, much appreciated.

myugan commented 3 years ago

Hi @ctalledo

Now I have the same issue again, but this time it is not coming from the Docker configuration. It only happens sometimes: the container stays in the Created state for a minute before it starts running (Up state).

Unexpected error during NotifReceive() execution (bad file descriptor) on fd 17 pid 9813

ctalledo commented 3 years ago

Thanks @myugan for the update.

Is this behavior reproducible? If so, what are the steps to reproduce it?

The "Unexpected error during NotifReceive() execution" warning is usually not a problem. It sometimes occurs when a process inside a container performs a system call trapped by Sysbox (e.g., the mount syscall) but then dies unexpectedly, usually because the container associated with that process was stopped.
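
One rough way to apply that explanation when triaging a warning is to check whether the process named in it still exists. The following is a minimal sketch under stated assumptions, not part of Sysbox: it assumes the pid in the log is visible from the PID namespace where the script runs, it ignores pid reuse, and `pid_alive` and `triage` are hypothetical helper names.

```python
import os
import re

# Matches the warning format shown in this thread.
NOTIF_RE = re.compile(
    r"NotifReceive\(\) execution \((?P<err>[^)]+)\) on fd (?P<fd>\d+) pid (?P<pid>\d+)"
)


def pid_alive(pid):
    """Return True if a process with this pid currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; no signal is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def triage(line, alive=pid_alive):
    """Give a rough hint about a NotifReceive() warning based on whether its pid still exists."""
    match = NOTIF_RE.search(line)
    if match is None:
        return "not a NotifReceive() warning"
    if alive(int(match.group("pid"))):
        return "process still running: worth a closer look"
    return "process gone: consistent with the benign case"


warning = (
    "Unexpected error during NotifReceive() execution "
    "(bad file descriptor) on fd 17 pid 9813"
)
print(triage(warning))
```

The check is only meaningful shortly after the warning is logged, because pids are recycled over time.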

Thanks as always for reporting.

myugan commented 3 years ago

Unfortunately, this is not reproducible at the moment, but I will let you know once I find a way to reproduce this issue consistently.