stfc / ral-htcondor-tools

Scripts and stuff used with HTCondor at RAL

Add check to detect locked filesystem #44

Closed jnc74743 closed 1 year ago

jnc74743 commented 1 year ago

This check detects whether the /pool mountpoint is queryable within 5 seconds. If it is not, the script exits with a fatal error and sets the node to unhealthy. If a locked filesystem is left unattended, workers can become unable to delete Docker containers, causing a "Zombie" effect. At the time of this patch going in, the culprit is an issue with a CVMFS repo.
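
For illustration only, here is a minimal Python sketch of this kind of timeout check. It is not the code from this patch: the function name `mount_is_queryable`, the `probe` parameter, and the use of `stat` as the probe command are all assumptions. The probe runs in a child process so that a hung filesystem blocks the child rather than the healthcheck itself.

```python
import subprocess


def mount_is_queryable(path, timeout=5.0, probe=("stat",)):
    """Return True if `probe path` exits successfully within `timeout` seconds.

    Hypothetical helper: the name, the `probe` parameter and the choice of
    `stat` are assumptions, not taken from the actual patch.
    """
    proc = subprocess.Popen(
        [*probe, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # A zero exit status means the probe could query the path in time.
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait again: a process stuck in
        # uninterruptible I/O may not exit until the filesystem recovers.
        proc.kill()
        return False
```

A caller could treat a `False` result for `/pool` as fatal and mark the node unhealthy. How that state is reported to HTCondor is left out here, since the patch's mechanism is not shown in this description.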