sciencebox / charts

Helm Charts for ScienceBox services
GNU Affero General Public License v3.0

cvmfs: When one mount is OOM killed, all the other mounts are killed by supervisor #115

Closed by krishnan-r 1 year ago

krishnan-r commented 1 year ago

We observed that when the process behind one mountpoint is killed (possibly by the OOM killer), supervisor kills all the other mountpoints and then exits.

Expected behaviour: the process that died should be restarted without restarting the whole container, leaving the other mountpoints intact.

[Image: screenshot of logs]

cc @etejedor

ebocchi commented 1 year ago

Yes, this happens because the mountpoint becomes stale, so cvmfs2 is unable to re-mount. supervisord keeps trying to revive it (and keeps failing because of the stale mount), and eventually gives up and exits...
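
To illustrate the failure mode described above, here is a minimal sketch (not the fix applied in this repo) of how a stale FUSE mountpoint could be detected and lazily unmounted before the mount process is restarted. It assumes that a dead FUSE daemon leaves the mountpoint returning `ENOTCONN` ("Transport endpoint is not connected") on `stat`, and that `fusermount` is available in the container. The function names and the `/cvmfs/example.repo` path are hypothetical.

```python
import errno
import os
import subprocess


def is_stale_mount(path, stat_fn=os.stat):
    """Return True if stat() on `path` fails with ENOTCONN.

    Assumption: a FUSE mountpoint whose daemon was killed reports
    ENOTCONN. Any other outcome is treated as "not stale".
    """
    try:
        stat_fn(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False


def lazy_unmount_command(path):
    """Build the command that lazily unmounts a FUSE mountpoint."""
    return ["fusermount", "-u", "-z", path]


def cleanup_before_restart(path, run=subprocess.run, stat_fn=os.stat):
    """Lazily unmount `path` if it is stale; return True if cleanup ran."""
    if not is_stale_mount(path, stat_fn):
        return False
    # check=False: a failed unmount is left for the caller to handle.
    run(lazy_unmount_command(path), check=False)
    return True


if __name__ == "__main__":
    # Hypothetical mountpoint, for illustration only.
    print(cleanup_before_restart("/cvmfs/example.repo"))
```

Running something like this before each restart attempt would, under the assumptions above, let cvmfs2 mount again on a clean directory instead of failing repeatedly on the stale one.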

ebocchi commented 1 year ago

Let me see if https://github.com/sciencebox/charts/issues/35 can help here, without restarting the whole container

ebocchi commented 1 year ago

Fixed by