radiocosmology / alpenhorn

Alpenhorn is a service for managing an archive of scientific data.
MIT License

Rewrite 8/14: DefaultIO file deletion #151

Closed: ketiltrout closed 1 year ago

ketiltrout commented 1 year ago

This is a relatively straightforward PR. It mostly moves the guts of update.update_node_delete into alpenhorn.io._default_asyncs.delete_async, which node.io.delete then wraps in a task.

I can't say for sure, but it seems weird to create an asynchronous task just to delete a single file, so the code remaining in update.update_node_delete now groups a bunch of deletions together and passes them to the I/O layer to be deleted in a single task.

The bunch size I've chosen is 10. I don't really know if that's a good number, but it's in the ballpark of the number of files that alpenhorn-1 tends to delete at a time (which, in turn, is related to the number of files it can transfer per hour).

I'm willing to entertain different values for the bunch size, including 1 (i.e. getting rid of the bunching altogether), but 1, 10, or 100 seem like the kind of numbers we should be considering here.
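
For anyone weighing the bunch size, here is a minimal sketch of the bunching idea, not the code in this PR. The names bunches and submit_deletions and the delete callback are hypothetical stand-ins. In the PR itself the hand-off goes through node.io.delete, whose exact signature is not shown here.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bunches(items: Iterable[T], size: int = 10) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("bunch size must be at least 1")
    it = iter(items)
    while True:
        bunch = list(islice(it, size))
        if not bunch:
            return
        yield bunch


def submit_deletions(
    copies: Iterable[T],
    delete: Callable[[List[T]], None],
    size: int = 10,
) -> int:
    """Hand each bunch to `delete` once; return the number of calls made.

    `delete` stands in for whatever queues one I/O-layer task per bunch.
    """
    calls = 0
    for bunch in bunches(copies, size):
        delete(bunch)  # one task per bunch instead of one per file
        calls += 1
    return calls
```

With size=1 this degenerates to one task per file, so dropping the bunching later would only mean changing the default.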