nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Sometimes sysbox stopped #118

Closed myugan closed 3 years ago

myugan commented 3 years ago

I've installed Sysbox from source (compiled), but sometimes the runtime stops. How can I make it start again automatically when that happens? The scr/sysbox script doesn't seem to do that. I'd also suggest adding a date suffix to the log filename, e.g. sysbox-mgr-14-11-2020.log.

Thanks

rodnymolina commented 3 years ago

@myugan, a few points below ...

myugan commented 3 years ago

Sorry, I made a mistake: I installed the packaged version for production, but the service sometimes stops. Maybe this is unexpected behavior; I don't know what error causes Sysbox to stop.

rodnymolina commented 3 years ago

Ok, you installed Sysbox from our package (not from sources). A few questions below:

myugan commented 3 years ago

sysbox-mgr.log

...
WARN[2020-11-15 01:07:02] container id: 4c1ce32e02efbc32d565e4941ae61c6ae81116630da94c3b4edf2d387182f121
WARN[2020-11-15 01:07:02] container id: 6e6b91f6c6664c8e43e8b48eec8ee7765a9463712ce12c02d872e29291ad536c
WARN[2020-11-15 01:07:02] container id: 88aa06cd9815eaab95c21061e6f8f118052b1b2543560b16bd48eeeedac2b484
WARN[2020-11-15 01:07:02] container id: f38cbe3c6e11c6788d84a6476a847bd74b15b30f4b748423ddae386c87806548
WARN[2020-11-15 01:07:02] dockerVolMgr: failed to sync-out volumes for container a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5: volume sync-out failed: failed to sync /var/lib/sysbox/docker/a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5/ to /var/lib/docker/overlay2/f60657e9bb772f7ba94606e89164b2ee1abbfde38da64c6c7d69afd37a117775/merged/var/lib/docker: rsync: change_dir "/var/lib/sysbox/docker/a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1196) [sender=3.1.2]
 exit status 23
WARN[2020-11-15 01:07:02] dockerVolMgr: failed to destroy volumes for container a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5: failed to stat /var/lib/sysbox/docker/a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5: stat /var/lib/sysbox/docker/a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5: no such file or directory
WARN[2020-11-15 01:07:02] dockerVolMgr: failed to sync-out volumes for container bdfd261ea574bac5765a1181c7809f6f5ff8def94d37a4006e0fdffa5489c514: volume sync-out failed: failed to sync /var/lib/sysbox/docker/bdfd261ea574bac5765a1181c7809f6f5ff8def94d37a4006e0fdffa5489c514/ to /var/lib/docker/overlay2/08208f815b2cef36dee454b83774c16ac01af687c254d6f1644984a5e66962ea/merged/var/lib/docker: rsync: change_dir "/var/lib/sysbox/docker/bdfd261ea574bac5765a1181c7809f6f5ff8def94d37a4006e0fdffa5489c514" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1196) [sender=3.1.2]
 exit status 23
WARN[2020-11-15 01:07:02] dockerVolMgr: failed to destroy volumes for container bdfd261ea574bac5765a1181c7809f6f5ff8def94d37a4006e0fdffa5489c514: failed to stat /var/lib/sysbox/docker/bdfd261ea574bac5765a1181c7809f6f5ff8def94d37a4006e0fdffa5489c514: stat /var/lib/sysbox/docker/bdfd261ea574bac5765a1181c7809f6f5ff8def94d37a4006e0fdffa5489c514: no such file or directory
WARN[2020-11-15 01:07:02] dockerVolMgr: failed to destroy volumes for container 298f90cbc2993add50cb8745077b8e5dc3a50d28ecc1110c18b3d38246732209: failed to stat /var/lib/sysbox/docker/298f90cbc2993add50cb8745077b8e5dc3a50d28ecc1110c18b3d38246732209: stat /var/lib/sysbox/docker/298f90cbc2993add50cb8745077b8e5dc3a50d28ecc1110c18b3d38246732209: no such file or directory
WARN[2020-11-15 01:07:02] dockerVolMgr: failed to destroy volumes for container 1d2dbe01564d39045ad2c82d180e4e1d2489204855d3d1a008a846313c2d4dd6: failed to stat /var/lib/sysbox/docker/1d2dbe01564d39045ad2c82d180e4e1d2489204855d3d1a008a846313c2d4dd6: stat /var/lib/sysbox/docker/1d2dbe01564d39045ad2c82d180e4e1d2489204855d3d1a008a846313c2d4dd6: no such file or directory
WARN[2020-11-15 01:07:02] dockerVolMgr: failed to destroy volumes for container 6e6787a7a859128ac0b04ae2209778e9c9b1bfc4a2c7e8b26dcf619f613c7827: failed to stat /var/lib/sysbox/docker/6e6787a7a859128ac0b04ae2209778e9c9b1bfc4a2c7e8b26dcf619f613c7827: stat /var/lib/sysbox/docker/6e6787a7a859128ac0b04ae2209778e9c9b1bfc4a2c7e8b26dcf619f613c7827: no such file or directory
WARN[2020-11-15 01:07:02] dockerVolMgr: failed to destroy volumes for container 5af1f37a7574b233ba3217006a43bbac669a2afed202a888728eddc7beae37be: failed to stat /var/lib/sysbox/docker/5af1f37a7574b233ba3217006a43bbac669a2afed202a888728eddc7beae37be: stat /var/lib/sysbox/docker/5af1f37a7574b233ba3217006a43bbac669a2afed202a888728eddc7beae37be: no such file or directory
WARN[2020-11-15 01:07:02] dockerVolMgr: failed to destroy volumes for container ea93c9be6d6da006a3ae348378df3a5b88fcf392a3368b89f109c8be201534ea: failed to stat /var/lib/sysbox/docker/ea93c9be6d6da006a3ae348378df3a5b88fcf392a3368b89f109c8be201534ea: stat /var/lib/sysbox/docker/ea93c9be6d6da006a3ae348378df3a5b88fcf392a3368b89f109c8be201534ea: no such file or directory
...
...
...

In sysbox-fs.log I didn't find any entries with the same timestamp as the previous log [2020-11-15 01:07:02].

INFO[2020-11-15 01:07:03] Initiating sysbox-fs engine ...
INFO[2020-11-15 01:07:03] Initiating sysbox-fs engine ...
INFO[2020-11-15 01:07:12] Container pre-registration message received for id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
INFO[2020-11-15 01:07:12] Container pre-registration successfully completed for id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
INFO[2020-11-15 01:07:13] Container registration message received for id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
INFO[2020-11-15 01:07:13]
                 id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
                 initPid: 8117
                 ctime: 0001-01-01 00:00:00 +0000 UTC
                 UID: 362144
                 GID: 362144
INFO[2020-11-15 01:07:13] Container registration successfully completed for id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
INFO[2020-11-15 01:07:13] Container update message received for id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
INFO[2020-11-15 01:07:13]
                 id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
                 initPid: 8117
                 ctime: 2020-11-15 01:07:13.716455801 +0000 UTC
                 UID: 362144
                 GID: 362144
INFO[2020-11-15 01:07:13] Container update successfully processed for id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
INFO[2020-11-15 01:07:14] Container pre-registration message received for id: c27ebe65b1968d4d159dd640e0c4e05a770d8c28d126eef9ea5b9d591ea6f2d5
INFO[2020-11-15 01:07:14] Container pre-registration successfully completed for id: c27ebe65b1968d4d159dd640e0c4e05a770d8c28d126eef9ea5b9d591ea6f2d5
INFO[2020-11-15 01:07:15] Container registration message received for id: c27ebe65b1968d4d159dd640e0c4e05a770d8c28d126eef9ea5b9d591ea6f2d5
INFO[2020-11-15 01:07:15]
                 id: c27ebe65b1968d4d159dd640e0c4e05a770d8c28d126eef9ea5b9d591ea6f2d5
                 initPid: 10442
                 ctime: 0001-01-01 00:00:00 +0000 UTC
                 UID: 362144
                 GID: 362144
INFO[2020-11-15 01:07:15] Container registration successfully completed for id: c27ebe65b1968d4d159dd640e0c4e05a770d8c28d126eef9ea5b9d591ea6f2d5
INFO[2020-11-15 01:07:15] Container unregistration message received for id: c27ebe65b1968d4d159dd640e0c4e05a770d8c28d126eef9ea5b9d591ea6f2d5
INFO[2020-11-15 01:07:15]
                 id: c27ebe65b1968d4d159dd640e0c4e05a770d8c28d126eef9ea5b9d591ea6f2d5
                 initPid: 10442
                 ctime: 0001-01-01 00:00:00 +0000 UTC
                 UID: 362144
                 GID: 362144
INFO[2020-11-15 01:07:15] Container unregistration successfully completed for id: c27ebe65b1968d4d159dd640e0c4e05a770d8c28d126eef9ea5b9d591ea6f2d5
INFO[2020-11-15 01:11:33] Container unregistration message received for id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
INFO[2020-11-15 01:11:33]
                 id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
                 initPid: 8117
                 ctime: 2020-11-15 01:07:13.716455801 +0000 UTC
                 UID: 362144
                 GID: 362144
INFO[2020-11-15 01:11:33] Container unregistration successfully completed for id: 1fc775c7188f9a40f58079c648a8acea133134282415b8f319ed4fc447165783
INFO[2020-11-15 01:11:37] Container pre-registration message received for id: 38586ca53906289beac4ec891b5ba625afd5628cbf94aab0fadeb90d28f5d616
INFO[2020-11-15 01:11:37] Container pre-registration successfully completed for id: 38586ca53906289beac4ec891b5ba625afd5628cbf94aab0fadeb90d28f5d616
INFO[2020-11-15 01:11:37] Container registration message received for id: 38586ca53906289beac4ec891b5ba625afd5628cbf94aab0fadeb90d28f5d616
INFO[2020-11-15 01:11:37]
                 id: 38586ca53906289beac4ec891b5ba625afd5628cbf94aab0fadeb90d28f5d616
                 initPid: 13796
                 ctime: 0001-01-01 00:00:00 +0000 UTC
                 UID: 362144
                 GID: 362144
INFO[2020-11-15 01:11:37] Container registration successfully completed for id: 38586ca53906289beac4ec891b5ba625afd5628cbf94aab0fadeb90d28f5d616
INFO[2020-11-15 01:11:38] Container update message received for id: 38586ca53906289beac4ec891b5ba625afd5628cbf94aab0fadeb90d28f5d616
INFO[2020-11-15 01:11:38]
                 id: 38586ca53906289beac4ec891b5ba625afd5628cbf94aab0fadeb90d28f5d616
                 initPid: 13796
                 ctime: 2020-11-15 01:11:38.722810607 +0000 UTC
                 UID: 362144
                 GID: 362144
INFO[2020-11-15 01:11:38] Container update successfully processed for id: 38586ca53906289beac4ec891b5ba625afd5628cbf94aab0fadeb90d28f5d616
INFO[2020-11-15 01:11:39] Container pre-registration message received for id: 715f3b3026b09b5fa7e89c4d4b30a7252527fce6591b7bdcad788e97b7b981a4
INFO[2020-11-15 01:11:40] Container pre-registration successfully completed for id: 715f3b3026b09b5fa7e89c4d4b30a7252527fce6591b7bdcad788e97b7b981a4
INFO[2020-11-15 01:11:40] Container registration message received for id: 715f3b3026b09b5fa7e89c4d4b30a7252527fce6591b7bdcad788e97b7b981a4
INFO[2020-11-15 01:11:40]
                 id: 715f3b3026b09b5fa7e89c4d4b30a7252527fce6591b7bdcad788e97b7b981a4
                 initPid: 15475
                 ctime: 0001-01-01 00:00:00 +0000 UTC
                 UID: 362144
                 GID: 362144
INFO[2020-11-15 01:11:40] Container registration successfully completed for id: 715f3b3026b09b5fa7e89c4d4b30a7252527fce6591b7bdcad788e97b7b981a4
INFO[2020-11-15 01:11:40] Container unregistration message received for id: 715f3b3026b09b5fa7e89c4d4b30a7252527fce6591b7bdcad788e97b7b981a4
INFO[2020-11-15 01:11:40]
                 id: 715f3b3026b09b5fa7e89c4d4b30a7252527fce6591b7bdcad788e97b7b981a4
                 initPid: 15475
                 ctime: 0001-01-01 00:00:00 +0000 UTC
                 UID: 362144
                 GID: 362144
...
...
...

ctalledo commented 3 years ago

Hi @myugan,

but the service sometimes stops. Maybe this is unexpected behavior; I don't know what error causes Sysbox to stop.

Yes, it's definitely unexpected behavior.

From the sysbox logs, it looks like sysbox-mgr is in a bad state. The following log message means that during a container stop, sysbox-mgr failed to move some data from /var/lib/sysbox to the container's rootfs:

WARN[2020-11-15 01:07:02] dockerVolMgr: failed to sync-out volumes for container a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5: volume sync-out failed: failed to sync /var/lib/sysbox/docker/a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5/ to /var/lib/docker/overlay2/f60657e9bb772f7ba94606e89164b2ee1abbfde38da64c6c7d69afd37a117775/merged/var/lib/docker: rsync: change_dir "/var/lib/sysbox/docker/a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1196) [sender=3.1.2]
 exit status 23
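
To see at a glance which containers are affected, warnings like the one above can be grouped by container id. The following is a minimal Python sketch, not part of Sysbox: the regular expression is inferred only from the log excerpts posted in this thread, so the real log format may differ between versions, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the log excerpts in this thread (assumption); it is
# not taken from the Sysbox source and may not match other versions.
WARN_RE = re.compile(
    r"^WARN\[(?P<ts>[^\]]+)\] (?P<mgr>\w+VolMgr): "
    r"failed to (?P<op>sync-out|destroy) volumes for container (?P<cid>[0-9a-f]{64}):"
)


def summarize_volume_warnings(lines):
    """Map container id -> sorted list of (volume manager, failed operation)."""
    failures = defaultdict(set)
    for line in lines:
        match = WARN_RE.match(line)
        if match:  # continuation lines (rsync output) and INFO lines are skipped
            failures[match["cid"]].add((match["mgr"], match["op"]))
    return {cid: sorted(ops) for cid, ops in failures.items()}
```

Passing an open file object for the sysbox-mgr log to summarize_volume_warnings() returns one entry per container, which makes it easier to tell whether a single container or every container failed at the same timestamp.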

I am not sure why the failure occurred; our tests are not catching such a failure. If you have a way to reproduce it consistently, let us know so we can repro and debug it. Also, if you have the full sysbox-mgr log, please attach it, as that may help.

To make sysbox recover from this error, stop all docker containers that use the sysbox runtime (e.g., you can use this script) and then execute systemctl restart sysbox. This should reset sysbox into a clean state (i.e., after restart, the sysbox-mgr and sysbox-fs logs should not show any errors).
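
The recovery steps above could be sketched roughly as follows. This is a Python illustration, not the script linked in the comment: it assumes Sysbox containers are identified by the Docker runtime name sysbox-runc, that the docker and systemctl commands are on the PATH, and that it runs with sufficient privileges. Both function names are hypothetical.

```python
import subprocess

# Runtime name Docker is assumed to report for Sysbox containers.
SYSBOX_RUNTIME = "sysbox-runc"


def sysbox_container_ids(inspect_lines, runtime=SYSBOX_RUNTIME):
    """Pick container ids whose runtime matches, from '<id> <runtime>' lines."""
    ids = []
    for line in inspect_lines:
        parts = line.split()
        if len(parts) == 2 and parts[1] == runtime:
            ids.append(parts[0])
    return ids


def stop_sysbox_containers_and_restart():
    """Stop running Sysbox containers, then restart the sysbox service."""
    running = subprocess.run(
        ["docker", "ps", "-q"], check=True, capture_output=True, text=True
    ).stdout.split()
    if running:
        # One '<id> <runtime>' line is printed per inspected container.
        inspected = subprocess.run(
            ["docker", "inspect", "--format",
             "{{.Id}} {{.HostConfig.Runtime}}", *running],
            check=True, capture_output=True, text=True,
        ).stdout.splitlines()
        ids = sysbox_container_ids(inspected)
        if ids:
            subprocess.run(["docker", "stop", *ids], check=True)
    subprocess.run(["systemctl", "restart", "sysbox"], check=True)
```

Containers that use other runtimes are left running; only the Sysbox ones are stopped before the restart.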

myugan commented 3 years ago

So in this case, I think maybe there could be a script that automatically restarts sysbox-mgr to a clean state when it hits an error like this.

FYI, I'm not attaching the full log because the rest has the same output and doesn't show any error other than the bad state.

Thank you

ctalledo commented 3 years ago

So in this case, I think maybe there could be a script that automatically restarts sysbox-mgr to a clean state when it hits an error like this.

No, that would not be wise, because there may be other containers running that are not affected by the failure and which we don't want to disturb. Thus, restarting sysbox-mgr automatically when we hit such an error could make things worse.

The solution is to avoid sysbox-mgr hitting such errors (i.e., debugging and fixing the problem).

Is there a way you can reproduce this consistently that we can try on our end?

rodnymolina commented 3 years ago

Agree with Cesar here. @myugan, let us know if you are able to reproduce this one again, especially if you find out how to do it consistently. Thanks!

myugan commented 3 years ago

Sure @rodnymolina and @ctalledo, thanks for your response. Let me create some other resources to try to reproduce this issue consistently, and I will let you know.

myugan commented 3 years ago

I think I still get this issue. Would it be a problem if I modified the scr/sysbox script to check every minute whether sysbox has failed like this, and restart it automatically from the script itself?

FYI, I'm currently not running any containers with Kubernetes stuff; I only run a lot of containers, and sometimes this issue appears.

WARN[2020-11-21 06:35:00] dockerVolMgr: failed to sync-out volumes for container c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: volume sync-out failed: failed to sync /var/lib/sysbox/docker/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d/ to /var/lib/docker/overlay2/203d8d30a04c6082ffc9decc1af0da93e74fafb3ffd40e2a9995bc405f3b8220/merged/var/lib/docker: rsync: change_dir "/var/lib/sysbox/docker/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1196) [sender=3.1.2]
 exit status 23
WARN[2020-11-21 06:35:00] dockerVolMgr: failed to destroy volumes for container c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: failed to stat /var/lib/sysbox/docker/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: stat /var/lib/sysbox/docker/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: no such file or directory
WARN[2020-11-21 06:35:00] dockerVolMgr: failed to destroy volumes for container e5be408169fd5ebf6eb607106bae8647f45a4fc28a170c082daec9028a9f565a: failed to stat /var/lib/sysbox/docker/e5be408169fd5ebf6eb607106bae8647f45a4fc28a170c082daec9028a9f565a: stat /var/lib/sysbox/docker/e5be408169fd5ebf6eb607106bae8647f45a4fc28a170c082daec9028a9f565a: no such file or directory
WARN[2020-11-21 06:35:00] dockerVolMgr: failed to destroy volumes for container 3baf8a61d13774bd50561cd8994fa0f4fdded15bd83729538add8879c67aad9e: failed to stat /var/lib/sysbox/docker/3baf8a61d13774bd50561cd8994fa0f4fdded15bd83729538add8879c67aad9e: stat /var/lib/sysbox/docker/3baf8a61d13774bd50561cd8994fa0f4fdded15bd83729538add8879c67aad9e: no such file or directory
WARN[2020-11-21 06:35:00] dockerVolMgr: failed to destroy volumes for container 24efddb91303cb0231580eeea46b5a28dba2737bc9546dfdbe1f1c2db83bfb1d: failed to stat /var/lib/sysbox/docker/24efddb91303cb0231580eeea46b5a28dba2737bc9546dfdbe1f1c2db83bfb1d: stat /var/lib/sysbox/docker/24efddb91303cb0231580eeea46b5a28dba2737bc9546dfdbe1f1c2db83bfb1d: no such file or directory
WARN[2020-11-21 06:35:00] dockerVolMgr: failed to destroy volumes for container 4002099a96d3eb9fbd59a7e7963f3f59202834fbfd7409dd5d5d42000775f898: failed to stat /var/lib/sysbox/docker/4002099a96d3eb9fbd59a7e7963f3f59202834fbfd7409dd5d5d42000775f898: stat /var/lib/sysbox/docker/4002099a96d3eb9fbd59a7e7963f3f59202834fbfd7409dd5d5d42000775f898: no such file or directory
WARN[2020-11-21 06:35:00] dockerVolMgr: failed to destroy volumes for container 295cf335edad3a4b3f956fe6bc99cf2c38df233a68a649bb2e098f690c6248d9: failed to stat /var/lib/sysbox/docker/295cf335edad3a4b3f956fe6bc99cf2c38df233a68a649bb2e098f690c6248d9: stat /var/lib/sysbox/docker/295cf335edad3a4b3f956fe6bc99cf2c38df233a68a649bb2e098f690c6248d9: no such file or directory
WARN[2020-11-21 06:35:00] dockerVolMgr: failed to destroy volumes for container 862b7559c4a05442230358283892e38bccea66620e846db0465a66521f4b35bd: failed to stat /var/lib/sysbox/docker/862b7559c4a05442230358283892e38bccea66620e846db0465a66521f4b35bd: stat /var/lib/sysbox/docker/862b7559c4a05442230358283892e38bccea66620e846db0465a66521f4b35bd: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container 862b7559c4a05442230358283892e38bccea66620e846db0465a66521f4b35bd: failed to stat /var/lib/sysbox/kubelet/862b7559c4a05442230358283892e38bccea66620e846db0465a66521f4b35bd: stat /var/lib/sysbox/kubelet/862b7559c4a05442230358283892e38bccea66620e846db0465a66521f4b35bd: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container 295cf335edad3a4b3f956fe6bc99cf2c38df233a68a649bb2e098f690c6248d9: failed to stat /var/lib/sysbox/kubelet/295cf335edad3a4b3f956fe6bc99cf2c38df233a68a649bb2e098f690c6248d9: stat /var/lib/sysbox/kubelet/295cf335edad3a4b3f956fe6bc99cf2c38df233a68a649bb2e098f690c6248d9: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container e5be408169fd5ebf6eb607106bae8647f45a4fc28a170c082daec9028a9f565a: failed to stat /var/lib/sysbox/kubelet/e5be408169fd5ebf6eb607106bae8647f45a4fc28a170c082daec9028a9f565a: stat /var/lib/sysbox/kubelet/e5be408169fd5ebf6eb607106bae8647f45a4fc28a170c082daec9028a9f565a: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container 24efddb91303cb0231580eeea46b5a28dba2737bc9546dfdbe1f1c2db83bfb1d: failed to stat /var/lib/sysbox/kubelet/24efddb91303cb0231580eeea46b5a28dba2737bc9546dfdbe1f1c2db83bfb1d: stat /var/lib/sysbox/kubelet/24efddb91303cb0231580eeea46b5a28dba2737bc9546dfdbe1f1c2db83bfb1d: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container 2aeecf608e2fe0d09660b194ddd465b520aeb1ed523882b34eb02060b6a7f2b9: failed to stat /var/lib/sysbox/kubelet/2aeecf608e2fe0d09660b194ddd465b520aeb1ed523882b34eb02060b6a7f2b9: stat /var/lib/sysbox/kubelet/2aeecf608e2fe0d09660b194ddd465b520aeb1ed523882b34eb02060b6a7f2b9: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container fa761da78995d0691c4efab9d835f30b1fd443d1449bb5d20085e9909377ac66: failed to stat /var/lib/sysbox/kubelet/fa761da78995d0691c4efab9d835f30b1fd443d1449bb5d20085e9909377ac66: stat /var/lib/sysbox/kubelet/fa761da78995d0691c4efab9d835f30b1fd443d1449bb5d20085e9909377ac66: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to sync-out volumes for container c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: volume sync-out failed: failed to sync /var/lib/sysbox/kubelet/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d/ to /var/lib/docker/overlay2/203d8d30a04c6082ffc9decc1af0da93e74fafb3ffd40e2a9995bc405f3b8220/merged/var/lib/kubelet: rsync: change_dir "/var/lib/sysbox/kubelet/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1196) [sender=3.1.2]
 exit status 23
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: failed to stat /var/lib/sysbox/kubelet/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: stat /var/lib/sysbox/kubelet/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container 4002099a96d3eb9fbd59a7e7963f3f59202834fbfd7409dd5d5d42000775f898: failed to stat /var/lib/sysbox/kubelet/4002099a96d3eb9fbd59a7e7963f3f59202834fbfd7409dd5d5d42000775f898: stat /var/lib/sysbox/kubelet/4002099a96d3eb9fbd59a7e7963f3f59202834fbfd7409dd5d5d42000775f898: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container 3baf8a61d13774bd50561cd8994fa0f4fdded15bd83729538add8879c67aad9e: failed to stat /var/lib/sysbox/kubelet/3baf8a61d13774bd50561cd8994fa0f4fdded15bd83729538add8879c67aad9e: stat /var/lib/sysbox/kubelet/3baf8a61d13774bd50561cd8994fa0f4fdded15bd83729538add8879c67aad9e: no such file or directory
WARN[2020-11-21 06:35:00] kubeletVolMgr: failed to destroy volumes for container 3db976e3ca53761a000b5180c595b7a42ffe333b647b3d3fde803ebe4c562b32: failed to stat /var/lib/sysbox/kubelet/3db976e3ca53761a000b5180c595b7a42ffe333b647b3d3fde803ebe4c562b32: stat /var/lib/sysbox/kubelet/3db976e3ca53761a000b5180c595b7a42ffe333b647b3d3fde803ebe4c562b32: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container 3baf8a61d13774bd50561cd8994fa0f4fdded15bd83729538add8879c67aad9e: failed to stat /var/lib/sysbox/containerd/3baf8a61d13774bd50561cd8994fa0f4fdded15bd83729538add8879c67aad9e: stat /var/lib/sysbox/containerd/3baf8a61d13774bd50561cd8994fa0f4fdded15bd83729538add8879c67aad9e: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container 295cf335edad3a4b3f956fe6bc99cf2c38df233a68a649bb2e098f690c6248d9: failed to stat /var/lib/sysbox/containerd/295cf335edad3a4b3f956fe6bc99cf2c38df233a68a649bb2e098f690c6248d9: stat /var/lib/sysbox/containerd/295cf335edad3a4b3f956fe6bc99cf2c38df233a68a649bb2e098f690c6248d9: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container 24efddb91303cb0231580eeea46b5a28dba2737bc9546dfdbe1f1c2db83bfb1d: failed to stat /var/lib/sysbox/containerd/24efddb91303cb0231580eeea46b5a28dba2737bc9546dfdbe1f1c2db83bfb1d: stat /var/lib/sysbox/containerd/24efddb91303cb0231580eeea46b5a28dba2737bc9546dfdbe1f1c2db83bfb1d: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container 3db976e3ca53761a000b5180c595b7a42ffe333b647b3d3fde803ebe4c562b32: failed to stat /var/lib/sysbox/containerd/3db976e3ca53761a000b5180c595b7a42ffe333b647b3d3fde803ebe4c562b32: stat /var/lib/sysbox/containerd/3db976e3ca53761a000b5180c595b7a42ffe333b647b3d3fde803ebe4c562b32: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container fa761da78995d0691c4efab9d835f30b1fd443d1449bb5d20085e9909377ac66: failed to stat /var/lib/sysbox/containerd/fa761da78995d0691c4efab9d835f30b1fd443d1449bb5d20085e9909377ac66: stat /var/lib/sysbox/containerd/fa761da78995d0691c4efab9d835f30b1fd443d1449bb5d20085e9909377ac66: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container 4002099a96d3eb9fbd59a7e7963f3f59202834fbfd7409dd5d5d42000775f898: failed to stat /var/lib/sysbox/containerd/4002099a96d3eb9fbd59a7e7963f3f59202834fbfd7409dd5d5d42000775f898: stat /var/lib/sysbox/containerd/4002099a96d3eb9fbd59a7e7963f3f59202834fbfd7409dd5d5d42000775f898: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container 2aeecf608e2fe0d09660b194ddd465b520aeb1ed523882b34eb02060b6a7f2b9: failed to stat /var/lib/sysbox/containerd/2aeecf608e2fe0d09660b194ddd465b520aeb1ed523882b34eb02060b6a7f2b9: stat /var/lib/sysbox/containerd/2aeecf608e2fe0d09660b194ddd465b520aeb1ed523882b34eb02060b6a7f2b9: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container 862b7559c4a05442230358283892e38bccea66620e846db0465a66521f4b35bd: failed to stat /var/lib/sysbox/containerd/862b7559c4a05442230358283892e38bccea66620e846db0465a66521f4b35bd: stat /var/lib/sysbox/containerd/862b7559c4a05442230358283892e38bccea66620e846db0465a66521f4b35bd: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to sync-out volumes for container c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: volume sync-out failed: failed to sync /var/lib/sysbox/containerd/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d/ to /var/lib/docker/overlay2/203d8d30a04c6082ffc9decc1af0da93e74fafb3ffd40e2a9995bc405f3b8220/merged/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs: rsync: change_dir "/var/lib/sysbox/containerd/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1196) [sender=3.1.2]
 exit status 23
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: failed to stat /var/lib/sysbox/containerd/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: stat /var/lib/sysbox/containerd/c95d6852ca58084a815471bc24e865329653a70ac211fc484030590d0ed9cb9d: no such file or directory
WARN[2020-11-21 06:35:00] containerdVolMgr: failed to destroy volumes for container e5be408169fd5ebf6eb607106bae8647f45a4fc28a170c082daec9028a9f565a: failed to stat /var/lib/sysbox/containerd/e5be408169fd5ebf6eb607106bae8647f45a4fc28a170c082daec9028a9f565a: stat /var/lib/sysbox/containerd/e5be408169fd5ebf6eb607106bae8647f45a4fc28a170c082daec9028a9f565a: no such file or directory
INFO[2020-11-21 06:35:01] Stopped.
INFO[2020-11-21 06:35:01] Exiting.
INFO[2020-11-21 06:35:01] Sys container DNS aliasing enabled.
INFO[2020-11-21 06:35:01] Listening on /run/sysbox/sysmgr.sock
INFO[2020-11-21 06:35:01] Ready ...
INFO[2020-11-21 06:35:02] Starting ...
INFO[2020-11-21 06:35:02] Sys container DNS aliasing enabled.
INFO[2020-11-21 06:35:02] Listening on /run/sysbox/sysmgr.sock
INFO[2020-11-21 06:35:02] Ready ...
INFO[2020-11-21 06:35:06] registered new container 63e1e98a1a9b5bb0d00a3e5923b4fcb5fed835be95421ec14621999836c94450
INFO[2020-11-21 06:35:09] registered new container 10744e090180a093399cfcc68c4bb4ca16237db667a5db053d884ab34efa6435

ctalledo commented 3 years ago

Hi @myugan, thanks again for reporting. I'll take a close look this week to see what's going on here.

A couple of questions to help me repro:

1) How big is your machine (CPUs & Mem)?

2) How many containers are you deploying at any given time?

3) Are you running Docker inside those containers? (e.g., the inner Docker is pulling images, deploying inner containers, etc.)

Thanks again for reporting, much appreciated!

myugan commented 3 years ago

  1. 128 GB and 32 CPUs.
  2. More than 20 at the moment.
  3. Yes, some containers run Docker inside Docker.

ctalledo commented 3 years ago

  1. 128 GB and 32 CPUs.
  2. More than 20 at the moment.
  3. Yes, some containers run Docker inside Docker.

Thanks @myugan. Will take a look at this ASAP.

ctalledo commented 3 years ago

Hi @myugan,

Taking a closer look at the problem you are reporting, I see it's related to Sysbox Enterprise (rather than Sysbox). FYI, Sysbox Enterprise is offered with a 30-day free trial, so do contact us if you wish to purchase a license for it.

Now, looking at the sysbox-mgr logs, the following warnings occur when Sysbox is stopped (via a SIGHUP, SIGINT, SIGTERM, or SIGQUIT signal) and tries to clean up its state but fails to do so:

WARN[2020-11-15 01:07:02] dockerVolMgr: failed to sync-out volumes for container a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5: volume sync-out failed: failed to sync /var/lib/sysbox/docker/a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5/ to /var/lib/docker/overlay2/f60657e9bb772f7ba94606e89164b2ee1abbfde38da64c6c7d69afd37a117775/merged/var/lib/docker: rsync: change_dir "/var/lib/sysbox/docker/a29bd1b86368089f7901d03acab8297307da18c6cc825473db4f67d2e3f66bc5" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1196) [sender=3.1.2]
 exit status 23

Just to double-check, you must be seeing something like this in the sysbox-mgr logs too (correct?):

INFO[2020-11-27 05:30:51] Stopping (gracefully) ...    

I wonder why Sysbox was stopped though. Do the systemd logs in the host show anything related to this?
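
To illustrate why cleanup warnings show up at stop time, here is a generic sketch of the stop-on-signal pattern described above. It is written in Python for illustration only and is not Sysbox's code; the class and function names are hypothetical. The handler merely records the request, and the main loop then performs cleanup once, which is the point where missing per-container directories would be reported.

```python
import signal

# The four stop signals named in the comment above.
STOP_SIGNAL_NAMES = ("SIGHUP", "SIGINT", "SIGTERM", "SIGQUIT")


class GracefulStopper:
    """Records a stop request so the main loop can clean up before exiting."""

    def __init__(self):
        self.stop_requested = False
        self.received = None

    def install(self):
        """Register the handler for each stop signal (call from the main thread)."""
        for name in STOP_SIGNAL_NAMES:
            signal.signal(getattr(signal, name), self.handle)

    def handle(self, signum, frame):
        # Only record the request; the main loop does the actual cleanup.
        self.stop_requested = True
        self.received = signal.Signals(signum).name


def run(stopper, do_work, cleanup):
    """Work until a stop signal arrives, then clean up once and return."""
    while not stopper.stop_requested:
        do_work()
    cleanup()
```

In this pattern the process never decides to stop on its own: cleanup runs only after some other process has delivered one of the signals.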

myugan commented 3 years ago

Thanks for the information @ctalledo. I didn't see Stopping (gracefully) ...; I only got Stopping in my logs.

ctalledo commented 3 years ago

Hi @myugan,

Thanks for the information @ctalledo. I didn't see Stopping (gracefully) ...; I only got Stopping in my logs.

If you have that portion of the log, please post it.

In any case, the "Stopping" log message tells me something sent a signal to Sysbox to stop (i.e., it was not a Sysbox error that caused it to stop, but rather an external agent that sent a signal to it).

Do the systemd logs show anything related to this? That may give us a hint as to what agent in the system sent a signal to Sysbox to stop.
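
One way to pull the relevant systemd logs is to query the journal for a short window around the stop time seen in sysbox-mgr.log. The helper below is a Python sketch that only builds the journalctl command line. The unit names (sysbox, sysbox-mgr, sysbox-fs) are an assumption about how the package names its services, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed systemd unit names for the packaged Sysbox services.
SYSBOX_UNITS = ("sysbox", "sysbox-mgr", "sysbox-fs")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def journalctl_command(stop_time, window_minutes=2, units=SYSBOX_UNITS):
    """Build a journalctl argv covering a window around the logged stop time."""
    moment = datetime.strptime(stop_time, TIME_FORMAT)
    delta = timedelta(minutes=window_minutes)
    cmd = ["journalctl", "--no-pager"]
    for unit in units:
        cmd += ["-u", unit]  # repeated -u options match any of the units
    cmd += [
        "--since", (moment - delta).strftime(TIME_FORMAT),
        "--until", (moment + delta).strftime(TIME_FORMAT),
    ]
    return cmd
```

For example, journalctl_command("2020-11-21 06:35:01") covers 06:33:01 to 06:37:01. The sysbox-mgr timestamps may be in a different time zone than the journal's local time, so the window may need adjusting; dropping the -u options shows what else on the host was active at that moment.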

myugan commented 3 years ago

Hi @ctalledo @rodnymolina, I created another machine with Sysbox installed from the deb package, and it looks fine; I don't see any errors like this. I will close this issue for now and reopen it if the issue comes back. Thanks for your help.

ctalledo commented 3 years ago

Hi @ctalledo @rodnymolina, I created another machine with Sysbox installed from the deb package, and it looks fine; I don't see any errors like this. I will close this issue for now and reopen it if the issue comes back. Thanks for your help.

Thanks @myugan, we appreciate you following up. As I mentioned, we suspect that some entity is killing the sysbox process on the machine where you reported the failure. It does not appear to be a problem in sysbox per se.

Given that you are not seeing this on the new machine, I agree that we can close and reopen if the issue shows up again.

Thanks!