This adds an attempt to clean up a DefaultIONode during an idle update by:
- looking for .placeholder files and deleting them
- attempting to remove acq directories
Because this routine runs when the node is idle (i.e. only when there's no other I/O occurring), no placeholders should be on the node. Any that are found are spurious, most likely left behind by prior crashes.
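For illustration, the placeholder sweep could look roughly like the sketch below. This is not the code in this PR: the function name `remove_stale_placeholders` is hypothetical, and it assumes a placeholder is any regular file whose name ends in `.placeholder`.

```python
import os


def remove_stale_placeholders(root):
    """Delete files ending in ".placeholder" under root.

    Hypothetical helper for illustration only. Returns the sorted list
    of paths that were actually removed.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".placeholder"):
                continue
            path = os.path.join(dirpath, name)
            try:
                os.remove(path)
            except OSError:
                # Best effort: skip anything we can't delete
                continue
            removed.append(path)
    return sorted(removed)
```

Since this would only run while the node is idle, there is no attempt here to check whether a placeholder belongs to an in-progress transfer.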
I've also implemented a check to reduce how often it runs. It will always run at start-up (when I suspect most uncleanliness would be found), and then once every 100 times the node transitions from not-idle to idle. I'm not really sure how often is appropriate; it might even be sufficient to run it only at start-up.
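A minimal sketch of that kind of throttle, assuming a small counter object that gets asked once at start-up and then once per not-idle to idle transition. The class name `IdleCleanupThrottle` and its `period` parameter are made up for this example and don't correspond to names in the PR.

```python
class IdleCleanupThrottle:
    """Decide whether the idle cleanup should run (illustrative only)."""

    def __init__(self, period=100):
        self.period = period
        self._transitions = 0
        self._ran_at_startup = False

    def should_run(self):
        """Call once at start-up, then on every not-idle -> idle transition."""
        if not self._ran_at_startup:
            # The first call is the start-up run
            self._ran_at_startup = True
            return True

        self._transitions += 1
        if self._transitions >= self.period:
            # Reset so the next run happens another `period` transitions later
            self._transitions = 0
            return True
        return False
```

Setting `period` very high would approximate the "only at start-up" behaviour.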
While implementing this, I discovered that the code that was deleting acq dirs wasn't stopping at the StorageNode.root, meaning there was a potential to delete the node directory itself (plus anything above that)!
In practice, on DefaultIO nodes, this couldn't happen because all such nodes have an ALPENHORN_NODE file at the top level, but that's not necessarily true for other I/O classes which still use DefaultIO's delete function (for example, the LustreHSM I/O class).
I've fixed this bug while moving the directory deletion code out of delete_async into its own function in ioutil, because the cleanup task now uses it too.
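To make the intended behaviour concrete, here is a rough sketch of a bounded upward directory removal. It is a stand-alone illustration, not the function added to ioutil: the name `remove_empty_parents` and its signature are hypothetical.

```python
import pathlib


def remove_empty_parents(dirname, root):
    """Remove dirname and its empty ancestors, never touching root.

    Hypothetical helper for illustration only. Walks upwards from
    dirname and stops at the first non-empty directory, at root itself,
    or immediately if dirname is not inside root. Returns the removed
    paths, deepest first.
    """
    root = pathlib.Path(root).resolve()
    current = pathlib.Path(dirname).resolve()
    removed = []

    # Only continue while we are strictly below root
    while current != root and root in current.parents:
        try:
            current.rmdir()  # fails if the directory is not empty
        except OSError:
            break
        removed.append(current)
        current = current.parent
    return removed
```

The loop condition is the important part: the walk stops at root whether or not any file (such as ALPENHORN_NODE) happens to be keeping that directory non-empty.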
I also removed the submission of an unnecessary job which was deleting zero file copies.