cvmfs: When one mount is OOM killed, all the other mounts are killed by supervisor

sciencebox / charts

Helm Charts for ScienceBox services

GNU Affero General Public License v3.0


Issue #115

Status: Closed (krishnan-r closed this issue 1 year ago)

krishnan-r commented 1 year ago

We observed that when the process serving one mountpoint is killed (possibly by the OOM killer), supervisor kills all the other mountpoints and then exits.

Expected behaviour: the process that died should be restarted without restarting the whole container, leaving the other mountpoints intact.

Screenshot of logs: (image not preserved)

cc @etejedor

ebocchi commented 1 year ago

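Yes, this happens because the mountpoint becomes stale once its process is killed: cvmfs2 is unable to re-mount on top of it, supervisord keeps trying to revive the process (and keeps failing because of the stale mount), and eventually supervisord gives up and exits, taking the other mounts down with it.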
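To illustrate the stale-mount problem described above, here is a minimal Python sketch of a cleanup step that could run before a re-mount attempt. It is not the fix from the commits referenced in this thread; it only shows the general idea under some assumptions: that a dead FUSE mountpoint fails with `ENOTCONN` ("Transport endpoint is not connected") on access, and that `fusermount -u -z` (lazy unmount) is available in the container. The helper names `is_stale_mount`, `cleanup_command`, and `recover_mountpoint` are hypothetical.

```python
import errno
import os
import subprocess


def is_stale_mount(path, stat=os.stat):
    """Return True if `path` looks like a dead FUSE mountpoint.

    Assumption: once the FUSE daemon is gone, accessing the mountpoint
    fails with ENOTCONN ("Transport endpoint is not connected").
    """
    try:
        stat(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False


def cleanup_command(path):
    """Build the lazy-unmount command for a stale FUSE mountpoint."""
    return ["fusermount", "-u", "-z", path]


def recover_mountpoint(path, run=subprocess.run, stat=os.stat):
    """Lazily unmount `path` if it is stale.

    Returns True if a cleanup was performed, False if the path was healthy
    (or failed for a reason other than ENOTCONN). `run` and `stat` are
    injectable so the logic can be exercised without a real mount.
    """
    if not is_stale_mount(path, stat=stat):
        return False
    run(cleanup_command(path), check=True)
    return True
```

A wrapper around the mount command could call `recover_mountpoint` on the affected mountpoint (for a hypothetical repository, something like `/cvmfs/example.cern.ch`) before each restart attempt, so that only the failed mount is cleaned up and the other mountpoints are left alone.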

ebocchi commented 1 year ago

Let me see if https://github.com/sciencebox/charts/issues/35 can help here without restarting the whole container.

ebocchi commented 1 year ago

Fixed by:

https://github.com/sciencebox/charts/commit/69030a27644cdef8c4e816217e8cba5a0eaa138f
https://github.com/sciencebox/charts/commit/f4ee80f879ff2c1a09199a68a7473ffd3ed54beb
https://github.com/sciencebox/charts/commit/0879fd905d9d95955234cbfd95c3462cd8b3158b