psychoinformatics-de / datalad-hirni

DataLad extension for (semi-)automated, reproducible processing of (medical/neuro)imaging data
http://datalad.org

`hirni-import-dcm` performs double metadata extraction #117

Status: open (opened by mih 5 years ago)

mih commented 5 years ago

I didn't investigate why or how, but this has a considerable impact on any sizeable dataset.

Example command:

```
datalad hirni-import-dcm -d raw https://github.com/datalad/example-dicom-functional/archive/master.tar.gz acq2 --anon-subject 001
```

bpoldrack commented 5 years ago

Actually, this is not a double aggregation. The first time, meta-aggregate is called on the new DICOM subdataset itself. The second time, it is called on the superdataset with the path to that subdataset and into=top. In theory, the second call should be relatively cheap, but in practice it takes surprisingly long. Not sure what to do about it ATM. The separation is needed in case the acquisition directory isn't given and has to be derived from the extracted metadata: at that point, the new dataset is not yet a subdataset, and its location will still change. That's why the second part comes later. We could limit this to two calls only when the acquisition wasn't given, but otherwise I don't see a way around it. Does anything come to mind that I might have missed wrt a cheap upwards propagation, @mih?
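A minimal sketch of the control flow described above, including the proposed restriction to two calls only when no acquisition is given. Everything here is hypothetical: `AggregationLog`, `plan_import`, the staging location and `derive_acquisition` are illustrative names that only record which aggregation calls would be issued. They are not hirni's or metalad's API. The `<acquisition>/dicoms` layout and the assumption that the single remaining call is the superdataset one are also assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class AggregationLog:
    """Hypothetical stand-in that records meta-aggregate calls instead of running them."""
    calls: List[Tuple[str, Optional[str], Optional[str]]] = field(default_factory=list)

    def aggregate(self, dataset: str, path: Optional[str] = None,
                  into: Optional[str] = None) -> None:
        self.calls.append((dataset, path, into))


def plan_import(log: AggregationLog, superds: str, staging: str,
                acquisition: Optional[str],
                derive_acquisition: Callable[[str], str]) -> str:
    """Return the final subdataset path, logging the aggregation calls issued."""
    if acquisition is None:
        # The acquisition name has to come from the DICOM metadata, so the new
        # dataset is aggregated on its own first, while it is not yet a
        # subdataset and still sits at a staging location.
        log.aggregate(staging)
        acquisition = derive_acquisition(staging)
    # Assumed layout: DICOMs end up in an "<acquisition>/dicoms" subdataset.
    sub_path = f"{acquisition}/dicoms"
    # Once registered at its final path, propagate its metadata into the
    # superdataset (the "into=top" call).
    log.aggregate(superds, path=sub_path, into="top")
    return sub_path
```

With `acquisition="acq2"` only the superdataset call is logged. With `acquisition=None` the standalone call comes first, because the final path is unknown until the metadata has been extracted.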

bpoldrack commented 5 years ago

Changed hirni-import-dcm to do it that way only when actually needed. However, the more important issue has to be solved within metalad (aggregation into top triggers forced extraction).
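One way a "cheap upwards propagation" could avoid forced extraction is sketched below. This is an illustration under stated assumptions, not how metalad actually stores or aggregates metadata. It assumes each dataset keeps its own aggregated metadata together with the dataset state (e.g. commit) it was extracted from. Aggregating into top then copies that record upwards when the state still matches and extracts only when it is stale or missing. `Record`, `Dataset`, `propagate_up` and `extract` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Record:
    commit: str               # dataset state the metadata was extracted from
    metadata: Dict[str, Any]  # extracted metadata (opaque here)


@dataclass
class Dataset:
    commit: str                       # current state, e.g. HEAD (hypothetical field)
    own: Optional[Record] = None      # metadata aggregated in the dataset itself
    aggregates: Dict[str, Record] = field(default_factory=dict)  # keyed by subdataset path


def propagate_up(top: Dataset, sub: Dataset, sub_path: str,
                 extract: Callable[[Dataset], Dict[str, Any]]) -> bool:
    """Aggregate `sub` into `top` under `sub_path`; return True if extraction ran."""
    if sub.own is not None and sub.own.commit == sub.commit:
        # Cheap path: the subdataset already holds up-to-date metadata for its
        # current state, so copy it upwards without re-extracting.
        top.aggregates[sub_path] = sub.own
        return False
    # Stale or missing: extract once, store it in the subdataset, then propagate.
    sub.own = Record(sub.commit, extract(sub))
    top.aggregates[sub_path] = sub.own
    return True
```

Keying the check on the subdataset's state rather than its path matters here: in the hirni case the dataset moves after its first aggregation, but its content, and therefore its state, does not change.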