Open psychelzh opened 7 months ago
I wonder if adding an overwrite option to DerivativesDataSink would be useful?
Generally it seems like a bad idea to aim two parallel jobs at the same output directory. Especially when working on networked filesystems, synchronization is uncertain at best.
Happy to consider a patch, but my experience is that these are massive time sinks that are better handled by using separate output directories and merging in a single process, post-run.
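For illustration, the "separate output directories, then merge in a single process" approach could look roughly like the sketch below. It uses only the Python standard library; `merge_outputs` and the directory layout are hypothetical names for this example, not part of niworkflows or XCP-D. It assumes each parallel job wrote to its own directory and that files shared across jobs (such as a warped atlas) are expected to be identical.

```python
import filecmp
import shutil
from pathlib import Path


def merge_outputs(job_dirs, merged_dir):
    """Merge per-job output trees into merged_dir from a single process.

    Hypothetical helper: returns the destination paths where two jobs
    produced files with different contents, so they can be inspected.
    """
    merged_dir = Path(merged_dir)
    conflicts = []
    for job_dir in map(Path, job_dirs):
        for src in sorted(job_dir.rglob("*")):
            if not src.is_file():
                continue
            dst = merged_dir / src.relative_to(job_dir)
            if dst.exists():
                # A file shared across jobs: keep the first copy and
                # record a conflict only if the contents differ.
                if not filecmp.cmp(src, dst, shallow=False):
                    conflicts.append(dst)
                continue
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
    return conflicts
```

Because only one process ever writes to the merged directory, no two writers can touch the same file at once; the trade-off is the extra disk space and the post-run copy step.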
I might not be thinking about this the right way, but in my case (XCP-D), we warp atlases to the standard space and resolution used across runs (and potentially subjects), so we only want one copy of the warped+resampled atlas in the derivatives. I do this in the single-subject workflow, since that's where collect_data is called, and I need the BOLD runs selected by collect_data to identify the space and resolution to warp the atlas to. We expect the space and resolution to be consistent within and across subjects, so the files are written out to the same location.
Does that approach make sense to you?
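As a rough sketch of one way to make "several jobs write the same shared file" less fragile, a job can write to a temporary file in the destination directory and then rename it into place, so readers never see a half-written file. This is plain standard-library Python, not how DerivativesDataSink or XCP-D currently behave, and `write_once` is a hypothetical name. It assumes all jobs would write identical contents, and that the rename is atomic on the filesystem in use, which may not hold on every networked filesystem.

```python
import os
import tempfile
from pathlib import Path


def write_once(dest, data: bytes) -> bool:
    """Write data to dest unless it already exists (hypothetical helper).

    Returns True if this call wrote the file, False if it was skipped.
    """
    dest = Path(dest)
    if dest.exists():
        # Another job already produced the shared file; do not overwrite.
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # If two jobs race past the exists() check, the last rename wins,
        # but the destination is always a complete file.
        os.replace(tmp_name, dest)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

This does not remove the race entirely; it only ensures the shared file is either absent or complete, which is enough when every writer produces the same content.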
What happened?
Originally posted in https://github.com/PennLINC/xcp_d/issues/1064#issuecomment-1966634300
When different nodes try to access the same file simultaneously (especially when pipelines are run in parallel), the file can become inaccessible (in the XCP-D use case).
What command did you use?
What version of the software are you running?
XCP-D 0.6.1
How are you running this software?
Singularity
Is your data BIDS valid?
Yes
Are you reusing any previously computed results?
No
Please copy and paste any relevant log output.
Additional information / screenshots
No response