niaid / image_portal_workflows

Workflows related to project previously referred to as "Hedwig"
BSD 3-Clause "New" or "Revised" License

Hpc fixes #233

Closed philipmac closed 1 year ago

philipmac commented 1 year ago

Always have to use prefect.context (and not context). Allow the file system to self-throttle the speed of removal; no mapped cleanup.

mbopfNIH commented 1 year ago

Interesting. So was the cleanup_workdirs.map version actually taking longer or hanging? Seems backwards in that using the map should allow it to be distributed, and therefore be more efficient (or at least faster).

What am I missing?

philipmac commented 1 year ago

I'm not exactly sure. The file system is a limited resource; there's only one. map will distribute the work, like you say. But for operations like move / delete we're not compute-bound, we're IO-bound. Adding more compute doesn't speed up the file system. Having a single core submit the sequential move / rm operations and wait until each one completes isn't costing us anything, I believe, and it should remove some variables wrt scheduling.
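A minimal sketch of the sequential approach described here, using only the Python standard library rather than the project's actual Prefect task. The function name cleanup_workdirs_sequential and its argument are hypothetical, chosen for illustration; the real cleanup code in this repository may differ.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def cleanup_workdirs_sequential(workdirs: Iterable[Path]) -> List[Path]:
    """Remove each working directory in turn (hypothetical helper).

    Each removal blocks until it finishes, so a single worker issues one
    delete at a time and the file system sets the pace.
    Returns the directories that were actually removed.
    """
    removed: List[Path] = []
    for workdir in workdirs:
        path = Path(workdir)
        if path.is_dir():
            # rmtree does not return until the whole tree is deleted,
            # so the next removal only starts after this one completes.
            shutil.rmtree(path)
            removed.append(path)
        # Paths that are missing (or not directories) are skipped.
    return removed
```

Because the work is IO-bound, fanning these calls out across workers would mostly add scheduling overhead while the same file system services every request.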

mbopfNIH commented 1 year ago

Makes sense. But isn't brt.cleanup_files doing a similar thing? I guess it uses map with an unmapped pattern, but it seems like we could simply let the file system handle this too, since the calls have upstream_tasks to avoid doing it too early.