willfarrell / docker-autoheal

Monitor and restart unhealthy docker containers.
MIT License
1.31k stars 225 forks source link

segfault in ld-musl-x86_64.so.1 #127

Open KomtGoed opened 7 months ago

KomtGoed commented 7 months ago

Hi,

I am using Autoheal together with Watchtower.

Watchtower is scheduled to run at 05:00 in the morning. Since the day I added Autoheal to the herd, I get the following error messages in the logs:

Feb 24 05:00:19 server kernel: [981482.676478] sh[388687]: segfault at 7ffcb2888eb8 ip 00007f56ada70318 sp 00007ffcb2888eb0 error 6 in ld-musl-x86_64.so.1[7f56ada35000+4c000] likely on CPU 0 (core 0, socket 0)
Feb 24 05:00:19 server kernel: [981482.677822] Code: 41 5d 41 5e 41 5f c3 41 57 31 c0 b9 0a 00 00 00 41 56 41 55 49 89 fd 41 54 55 83 cd ff 53 48 81 ec 58 01 00 00 48 8d 7c 24 38 <48> 89 74 24 08 4c 8d 7c 24 38 f3 ab 48 8d 44 24 20 4d 89 f8 31 ff

This is what's happening around that time in the watchtower logs: time="2024-02-24T05:00:19+01:00" level=info msg="Stopping /autoheal (574bcbae8d52) with SIGTERM" time="2024-02-24T05:00:20+01:00" level=debug msg="Removing container 574bcbae8d52" time="2024-02-24T05:00:20+01:00" level=info msg="Creating /autoheal" time="2024-02-24T05:00:20+01:00" level=debug msg="Starting container /autoheal (b7a538093001)" time="2024-02-24T05:00:21+01:00" level=info msg="Removing image 408724c998f0"

Is it correct to assume that it's actually Autoheal causing the segfault? If so, what to do about it?

Thanks!

KomtGoed commented 7 months ago

Can reliably reproduce the segfault by stopping the autoheal container through the portainer UI.

Did some searching and came across the links included below. https://github.com/lovell/sharp/issues/2570 Is Alpine up to date in the latest Autoheal container?

https://serverfault.com/questions/1097705/segmentation-fault-of-bacula-on-apline-linux Is MUSL up to date?

https://users.rust-lang.org/t/segfault-when-linking-against-shared-libraries-on-alpine/98629 Is Rust compiled/linked as mentioned?

Or can it be any other setting of my docker environment?

Thanks for any help!

rahmanrafi commented 6 months ago

Ran into this issue as well (Ubuntu 22.04.4 LTS/Linux 5.15.0-101-generic x86_64).

It seems to be caused line 128 in docker-entrypoint. My guess is it's a race condition caused by kill $$; term_handler which kills the process then tries to call term_handler(); if you modify this to kill the background process instead (kill $!; term handler), stopping the container shouldn't produce any segfault errors in syslog, and I can confirm this is the case on my machine.

KomtGoed commented 6 months ago

Nicely spotted, thanks!