jsheunis / multi-echo-super


Where should corrections to the datasets be done? #4

Open tsalo opened 1 year ago

tsalo commented 1 year ago

For example, the T1w images in ds000210 are 4D and need to be reduced to 3D before fMRIPrep can be run on them. Should the modified dataset be published to G-Node GIN and then added to the super-dataset from there instead of using the one directly from OpenNeuro?
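The 4D-to-3D fix described here amounts to dropping a singleton volume axis. A minimal sketch of the array-level operation, using numpy (in practice the image would be loaded and written back with nibabel, preserving the affine and header; the function name and shapes below are illustrative only):

```python
import numpy as np

def reduce_4d_to_3d(data):
    """Drop a trailing singleton volume axis from a 4D array.

    Assumes the T1w image was stored with shape (X, Y, Z, 1);
    refuses to act if the fourth axis holds more than one volume,
    since simply squeezing would then be the wrong fix.
    """
    if data.ndim != 4:
        raise ValueError(f"expected a 4D array, got {data.ndim}D")
    if data.shape[3] != 1:
        raise ValueError("fourth axis is not singleton; cannot squeeze safely")
    return data[..., 0]

# Illustrative 4D "image" with a singleton fourth dimension
vol4d = np.zeros((182, 218, 182, 1))
vol3d = reduce_4d_to_3d(vol4d)
print(vol3d.shape)  # → (182, 218, 182)
```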

jsheunis commented 1 year ago

I think these should be seen as different states of the same dataset (i.e. the same git repo), which has multiple siblings (i.e. git remotes).

The original dataset is cloned as a subdataset into the multi-echo superdataset (here, into the raw subdirectory). Then the modifications are made via code (in this case the 4D-to-3D reduction), which results in a new commit (after datalad save) in both the subdataset and the superdataset (the latter being the subdataset version bump). At this stage a GIN sibling can be added to the dataset (which was originally cloned from its GitHub sibling) and the updated state can be pushed there. There is also the possibility to set clone-candidate priorities for subdatasets (see: https://handbook.datalad.org/en/latest/beyond_basics/101-148-clonepriority.html), e.g. if we always want people to clone from GIN automatically when obtaining this dataset through the multi-echo superdataset.
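The workflow could look roughly like the following (the dataset paths, GIN URL, and sibling name are placeholders, not the project's actual configuration):

```shell
# Clone the OpenNeuro dataset as a subdataset of the superdataset
datalad clone -d . https://github.com/OpenNeuroDatasets/ds000210.git raw/ds000210

# Apply the fix inside the subdataset and save it; saving the
# superdataset afterwards records the subdataset version bump
cd raw/ds000210
datalad save -m "Reduce 4D T1w images to 3D"
cd ../..
datalad save -m "Update ds000210 subdataset" raw/ds000210

# Add a GIN sibling to the subdataset and push the updated state there
cd raw/ds000210
datalad siblings add --name gin --url git@gin.g-node.org:/some-org/ds000210.git
datalad push --to gin
```

Clone-candidate priorities for the subdataset can then be configured in the superdataset as described in the handbook chapter linked above, so that GIN is tried first when people obtain the subdataset.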

It's always good to keep provenance, so I think running the 4D-to-3D conversion via the datalad run command is a good idea. I didn't do this when I created the derivatives subdatasets from their file manifests, but it should be easy to redo.
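For provenance capture, datalad run records the command and its declared inputs/outputs in the resulting commit, so the conversion can later be inspected or re-executed with datalad rerun. A sketch (the script path and file patterns are hypothetical):

```shell
# Run the conversion with provenance: the command, inputs, and outputs
# are recorded in the commit produced by datalad run
datalad run \
  -m "Reduce 4D T1w images to 3D" \
  --input "sub-*/anat/*_T1w.nii.gz" \
  --output "sub-*/anat/*_T1w.nii.gz" \
  "python code/reduce_t1w.py"
```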