pvavra opened 4 years ago
I've written a simple procedure that tries to achieve the above. It's not really robust, but it seems to work for our specific use case.
@bpoldrack I also have a conceptual question: Running the imports in parallel results in "interleaved" commits (each import seems to generate 3 separate commits: one for the dicoms, one for the specs, and one for the updated metadata).
Do you foresee any issues we could run into doing this? Maybe during the metadata aggregation step?
So, running several imports in parallel doesn't seem to work well. I noticed two main issues:

- it failed to create all `studyspec.json` files for all acquisitions - this is a major issue
- commit messages are "mixed", as the `ds.save()` calls do not use `path=..`
Using a structure like the `datalad run --explicit` call should make this work in parallel, assuming that no two imports are targeting the same dicoms folder. To assert the latter part, it would be good to have the `hirni-import-dcm` call handle the submission of jobs to condor itself, instead of relying on the `--pbs-runner condor` argument. This way, some basic sanity checks could be run over the whole set of imports. Then, the `ds.save` calls could use the aforementioned `path=..` argument to make sure only the relevant files are added.
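The "sanity checks over the whole set of imports" could be as simple as verifying up front that no two jobs target the same dicoms folder before anything is submitted. A minimal sketch of that check (the `import_jobs` structure and function name are illustrative, not part of the hirni API):

```python
from collections import Counter
from pathlib import Path


def check_no_overlapping_targets(import_jobs):
    """Refuse to proceed if two parallel imports target the same dicoms folder.

    `import_jobs` is a list of (archive, target_folder) pairs -- a
    hypothetical structure, just to illustrate the pre-submission check.
    """
    targets = [Path(target).resolve() for _, target in import_jobs]
    duplicates = [t for t, count in Counter(targets).items() if count > 1]
    if duplicates:
        raise ValueError(
            "multiple imports target the same dicoms folder(s): "
            + ", ".join(str(d) for d in duplicates)
        )
    return True
```

A wrapper that submits the condor jobs itself could run this over all planned imports first and bail out early, rather than letting two jobs race on the same acquisition.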
> commit messages are "mixed", as `ds.save()` calls do not use `path=..`

Agree. `save` calls - particularly in the superdataset - should do that. Otherwise they could commit intermediate states of other imports running in parallel.
> Do you foresee any issues we could run into doing this? Maybe during the metadata aggregation step?

Metadata aggregation could run into issues very similar to those `save` calls. It should "fix itself" with the last run, but I guess it's safer to properly account for that in `hirni-import-dcm`.
> failed to create all `studyspec.json` files for all acquisitions - this is a major issue

That's interesting, as I don't immediately see where this issue is emerging from.
Generally, importing should be easier to parallelize - I agree. While we're at it, addressing this should also include allowing several archives to be imported into the same acquisition, and supporting updates of an already imported archive (which currently would be doable only with more low-level tools). I'm not quite sure about the condor-related part yet. There might be a better way making use of https://github.com/datalad/datalad-htcondor. I need to think that through. Ideally we can come up with something that generalizes beyond condor.
If importing multiple tarballs (of multiple subjects), it would be convenient to have a "batch mode" for calling `hirni-import-dcm`. I guess how precisely to specify this might vary substantially between circumstances, but then a simple helper-script template might be convenient.
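As a starting point for such a helper-script template, the batch mode could simply assemble one `hirni-import-dcm` call per tarball and return the command lines for inspection instead of executing them. This is only a sketch: deriving the acquisition name from the tarball's filename stem is an assumption, and in practice the archive-to-acquisition mapping will vary between circumstances, as noted above.

```python
from pathlib import Path


def build_import_commands(tarballs):
    """Assemble one `datalad hirni-import-dcm` command per tarball.

    Returns the command lines as lists (suitable for subprocess or a
    condor submit file) rather than running them, so the whole batch
    can be inspected or sanity-checked before submission. Deriving the
    acquisition name from the filename stem is an illustrative
    assumption, not hirni behavior.
    """
    commands = []
    for tarball in tarballs:
        # e.g. "sub-01.tar.gz" -> acquisition "sub-01" (assumed naming scheme)
        acquisition = Path(tarball).stem.split(".")[0]
        commands.append(["datalad", "hirni-import-dcm", str(tarball), acquisition])
    return commands
```

A wrapper could then loop over the returned commands, run the overlap check from above, and only afterwards hand the jobs to condor.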