psychoinformatics-de / datalad-hirni

DataLad extension for (semi-)automated, reproducible processing of (medical/neuro)imaging data
http://datalad.org

metadata extraction step fails on reimport of acquisition after manual delete #181

Open pvavra opened 3 years ago

pvavra commented 3 years ago

The following fails on the second import:

datalad hirni-import-dcm ... some.tar acq_name # works
chmod -R +w acq_name
rm -r acq_name
datalad save -m "remove acq_name"
datalad hirni-import-dcm ... some.tar acq_name # fails

The failure happens during the metadata aggregation step, specifically when git ls-tree some_checksum -z -r --full-tree -l exits with code 128: fatal: not a tree object. The metadata aggregation progress bar is still at 0% at that point.
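To see what the recorded object still resolves to, something like the following should work (a hedged diagnostic sketch; some_checksum is a placeholder for whatever commit the aggregation step recorded):

# Hedged diagnostic sketch; some_checksum stands for the commit
# recorded by the earlier metadata aggregation.
git cat-file -t some_checksum                    # prints the object type, or errors if the object is gone
git ls-tree -r --full-tree -l some_checksum      # the call that fails here with code 128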

The above workflow happened when I imported the data with the default rules and then wanted to re-import it with my newly created custom rules. However, this could also happen if I incorrectly specified acq_name (e.g. a typo) and wanted to re-import the data into the correct path (e.g. importing sub1 incorrectly as sub2 and later having to import the true sub2).

My hunch is that this is related to the datalad install part of the import leaving a checksum behind that no longer matches the newly imported one (see the inspection sketch after the questions below). Two questions:

  1. what would be the "correct" way of doing this?
  2. shouldn't the above work anyway?
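For reference, inspecting what the superdataset's aggregated metadata still points to could look like this (a hedged sketch; the .datalad/metadata/aggregate_v1.json location and the refcommit key are assumptions based on the datalad 0.12.x layout, so double-check against your version):

# Hedged sketch: list which commits the aggregated metadata records.
# File location and the refcommit key are assumptions (datalad 0.12.x layout).
python -m json.tool .datalad/metadata/aggregate_v1.json | grep refcommit
# Then test whether a recorded commit still exists in the respective repo:
git cat-file -e "<refcommit>^{commit}" || echo "stale metadata reference"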

Note that when I use git rebase -i HEAD~10 to manually drop all commits that mention acq_name (in addition to deleting the files with the lines above), the second import does work.
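For the record, that workaround looked roughly like this (a hedged sketch; how far the rebase needs to reach back depends on the history):

# Drop the import commits interactively; HEAD~10 reached back far enough
# in my case, adjust as needed.
git rebase -i HEAD~10
# In the editor, change "pick" to "drop" for every commit whose message
# mentions acq_name, then save and quit.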

However, tweaking the git history should not be required, I think. Maybe this boils down to the datalad save command having to datalad uninstall the subdataset in this case. Alternatively, the metadata extraction step could check whether the path still matches the recorded checksum (or whatever it is that changed on the re-import). But I'm not sure where the above workflow introduces the error.
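Something like this guard before aggregation is what I have in mind (a hedged sketch; refcommit and acq are hypothetical placeholders, not actual hirni variables):

# Hedged sketch of a pre-aggregation sanity check; refcommit and acq
# are hypothetical placeholders, not hirni internals.
if ! git rev-parse --verify --quiet "${refcommit}^{tree}" >/dev/null; then
    echo "stale metadata reference for ${acq}; re-extraction needed" >&2
fi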

bpoldrack commented 3 years ago
  1. what would be the "correct" way of doing this?

Don't re-import, but use hirni-dicom2spec to derive a new spec with your new rules. There's no need to do everything else again (including extracting and aggregating metadata). All you want in that case is a new spec.
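Roughly like this (a hedged sketch; code/custom_rules.py and the acq_name layout are assumptions, and the option names should be verified against datalad hirni-dicom2spec --help):

# Point hirni at the custom rules file (hypothetical path), then
# re-derive the spec for the acquisition in place.
git config --local datalad.hirni.dicom2spec.rules code/custom_rules.py
datalad hirni-dicom2spec -s acq_name/studyspec.json acq_name/dicoms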

  2. shouldn't the above work anyway?

Yes. Even though the use case is somewhat invalid, I agree: there may be other reasons to do this, and we should be able to deal with it. The problem may well be entirely within the metadata layer. I'll look into it.

bpoldrack commented 3 years ago

However, this could also happen if I incorrectly specified acq_name (e.g. a typo) and wanted to re-import the data into the correct path (e.g. importing sub1 incorrectly as sub2 and later having to import the true sub2).

This also shouldn't require a (potentially expensive) re-import. Moving files/directories/subdatasets should be much cheaper. However, it would require a new metadata aggregation for the superdataset's metadata to remain correct and consistent. But everything else, including the specfile, doesn't need to be redone in that case.
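A hedged sketch of that route, assuming the sub1/sub2 mix-up from above (recent git versions update .gitmodules when a submodule moves, but verify with git status; the aggregation command name reflects datalad 0.12.x):

# Rename the misnamed acquisition instead of re-importing it.
git mv sub2 sub1
git status    # verify the .gitmodules path for the contained dicoms subdataset moved too
datalad save -m "rename acquisition sub2 -> sub1"
# Re-aggregate so the superdataset's metadata matches the new path
# (aggregate-metadata as in datalad 0.12.x; check your version):
datalad aggregate-metadata -d . sub1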