Closed by philipmac 1 year ago
Interesting. So was the cleanup_workdirs.map version actually taking longer or hanging? That seems backwards, in that using the map should allow it to be distributed, and therefore be more efficient (or at least faster).
What am I missing?
I'm not exactly sure. The file system is a limited resource; there's only one. map will distribute, like you say. But for operations like move / delete we're not compute bound, we're IO bound. Adding more compute doesn't speed up the file system. Having a single core submit the move / rm operations sequentially and wait until each one completes isn't costing us anything, I believe. And it should remove some variables with respect to scheduling.
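To make the sequential idea concrete, here is a minimal sketch using only the Python standard library. It is not the code from this change: the signature given to cleanup_workdirs and the job_N directory names are assumptions for illustration. A single caller removes each directory in turn and waits for each removal to finish before starting the next.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def cleanup_workdirs(workdirs: Iterable[Path]) -> List[Path]:
    """Remove each working directory in turn, from a single caller.

    Hypothetical signature: the real task in this repo may differ.
    """
    removed = []
    for workdir in workdirs:
        # rmtree blocks until the directory is gone, so only one removal
        # is in flight at a time and the file system sets the pace.
        shutil.rmtree(workdir, ignore_errors=True)
        removed.append(workdir)
    return removed


if __name__ == "__main__":
    # Build a few throwaway directories to clean up.
    base = Path(tempfile.mkdtemp())
    dirs = [base / f"job_{i}" for i in range(3)]
    for d in dirs:
        d.mkdir()
        (d / "scratch.txt").write_text("temporary data")

    cleanup_workdirs(dirs)
    print(all(not d.exists() for d in dirs))
    shutil.rmtree(base)
```

With this shape there is nothing for the scheduler to distribute, so any remaining slowness comes from the file system itself rather than from task scheduling.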
Makes sense. But isn't brt.cleanup_files doing a similar thing? I guess it uses map with an unmapped pattern, but it seems like we could simply let the file system handle this too, since the calls have upstream_tasks to avoid doing it too early.
Always have to use prefect.context (& not context)
Allow the filesystem to self-throttle the speed of removal; no mapping in cleanup