psychoinformatics-de / datalad-hirni

DataLad extension for (semi-)automated, reproducible processing of (medical/neuro)imaging data
http://datalad.org

Converting two subjects/acquisitions fails on the second one #180

Closed manuelakuhn closed 3 years ago

manuelakuhn commented 3 years ago

When converting data for two subjects, the second run of spec2bids always fails for the data I try to convert. The error is:

INFO: PROCESSING DONE: {'subject': '002', 'outdir': '/tmp/hirni-tests/bids/.git/hirni-tmp-v1pmrhm_/', 'session': None}
[INFO   ] == Command exit (modification check follows) =====
Traceback (most recent call last):
  File "/home/nela/.virtualenvs/data-pipeline/lib64/python3.8/site-packages/datalad/support/annexrepo.py", line 1703, in get_file_key
    entries = list(self.call_annex_items_(
  File "/home/nela/.virtualenvs/data-pipeline/lib64/python3.8/site-packages/datalad/support/annexrepo.py", line 1255, in call_annex_items_
    out = self._call_annex(
  File "/home/nela/.virtualenvs/data-pipeline/lib64/python3.8/site-packages/datalad/support/annexrepo.py", line 975, in _call_annex
    return runner.run_on_filelist_chunks(
  File "/home/nela/.virtualenvs/data-pipeline/lib64/python3.8/site-packages/datalad/cmd.py", line 659, in run_on_filelist_chunks
    res = self.run(
  File "/home/nela/.virtualenvs/data-pipeline/lib64/python3.8/site-packages/datalad/cmd.py", line 409, in run
    raise CommandError(
datalad.support.exceptions.CommandError: CommandError: 'git annex lookupkey -c annex.dotfiles=true -c annex.retry=3 -- sub-001/func/sub-001_task-negativesearch_run-01_events.tsv' failed with exitcode 1 under /tmp/hirni-tests/bids

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/hirni-tests/bids/sourcedata/code/hirni-toolbox/procedures/hirni-dicom-converter.py", line 158, in <module>
    remove_paths = [p for p in glob.glob('*/*/*_events.tsv')
  File "/tmp/hirni-tests/bids/sourcedata/code/hirni-toolbox/procedures/hirni-dicom-converter.py", line 159, in <listcomp>
    if dataset.repo.get_file_key(p) ==
  File "/home/nela/.virtualenvs/data-pipeline/lib64/python3.8/site-packages/datalad/support/gitrepo.py", line 324, in _wrap_normalize_paths
    result = func(self, files_new, *args, **kwargs)
  File "/home/nela/.virtualenvs/data-pipeline/lib64/python3.8/site-packages/datalad/support/annexrepo.py", line 1720, in get_file_key
    raise FileNotInAnnexError(cmd=cmd_str,
datalad.support.exceptions.FileNotInAnnexError: [Errno None] git annex lookupkey sub-001/func/sub-001_task-negativesearch_run-01_events.tsv: File not in annex: sub-001/func/sub-001_task-negativesearch_run-01_events.tsv: 'sub-001/func/sub-001_task-negativesearch_run-01_events.tsv'
[INFO   ] == Command exit (modification check follows) =====
CommandError: 'python /tmp/hirni-tests/bids/sourcedata/code/hirni-toolbox/procedures/hirni-dicom-converter.py /tmp/hirni-tests/bids sourcedata/acq2/studyspec.json True 002 sourcedata/acq2/dicoms' failed with exitcode 1 under /tmp/hirni-tests/bids

What I also see when running git status is that a file sub-001/func/sub-001_task-negativesearch_run-01_events.tsv was created during the sub-002 run but not committed to the dataset.

Important note: this only triggers if a rule is registered. For simplicity, I confirmed this by using the rule template file.

What I do in principle

When I switch the execution order and convert acq2 first, the same error occurs when converting acq1, except that the problem is now with the file sub-002/func/sub-002_task-negativesearch_run-01_events.tsv.

I am using datalad version 0.14.0 and datalad-hirni version 0.0.8.

bpoldrack commented 3 years ago

Thanks for reporting, @manuelakuhn!

While it's not clear that the error is a consequence of this, the first thing I see is that your rule setting doesn't apply. This is because you didn't commit the change to .datalad/config and the additional file code/custom_rules/custom_rules.py. So when the subdataset is installed via datalad install --dataset . --source ../sourcedata sourcedata --recursive, this change is not included. After your modification of .datalad/config you'll need a datalad save.

Note that you probably want to track that rule file in git, not git-annex. You can do that explicitly with the save using --to-git: datalad save .datalad/config code/custom_rules/custom_rules.py --to-git -m "Configure custom rules". Or (probably more suitable) have .gitattributes specify that you want everything underneath code/custom_rules directly in git, by adding the line code/custom_rules/** annex.largefiles=nothing. A subsequent datalad save will then respect that.

As for the actual error: Looking into it.

manuelakuhn commented 3 years ago

Yes, sorry, you are right: I forgot to include that in the minimal example above. In my actual conversion I do commit it. And thanks for the hint with the .gitattributes!

bpoldrack commented 3 years ago

> And thanks for the hint with the .gitattributes!

While at it, you may want to have a look here: http://handbook.datalad.org/en/latest/basics/101-123-config2.html You can also base such settings on filesize and mimetype.
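For illustration, a .gitattributes along these lines could combine path-, size-, and type-based rules. This is a sketch following the handbook chapter linked above; the matching-expression syntax is git-annex's, and the 100kb threshold is just an example value:

```
# keep custom rule code (and other small text files) directly in git
code/custom_rules/** annex.largefiles=nothing
*.json annex.largefiles=nothing

# annex only binary files above 100 kB; everything else goes into git
* annex.largefiles=((mimeencoding=binary)and(largerthan=100kb))
```

A datalad save issued after editing this file will apply the rules to newly saved content.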

bpoldrack commented 3 years ago

Note to myself: wondering where the call to get_file_key comes from, I think this is a problem with the heudiconv converter in the toolbox. The call is ultimately meant to remove empty template files generated by heudiconv, and it should be more robust. However, that doesn't yet explain why only the second run triggers the problem. So there's more to it.

manuelakuhn commented 3 years ago

I dug a little deeper and found the following:

It seems that heudiconv creates "empty" events.tsv files for all subjects (that is at least how I understand this line: https://github.com/nipy/heudiconv/blob/6b80704e297fff859504a5df455a7f7cac88aa81/heudiconv/bids.py#L118).
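As I read it, the behavior boils down to something like the following. This is a hedged paraphrase for illustration, not heudiconv's actual code; the function name and the exact trigger conditions are my assumptions:

```python
import os

def populate_events_template(bold_path):
    """Hypothetical stand-in for the heudiconv helper linked above: write an
    empty *_events.tsv placeholder next to a task-based _bold output."""
    name = os.path.basename(bold_path)
    if "_task-" in name and bold_path.endswith("_bold.nii.gz"):
        events = bold_path[: -len("_bold.nii.gz")] + "_events.tsv"
        if not os.path.lexists(events):
            open(events, "w").close()  # zero-byte template, to be filled by the user
        return events
    return None
```

If templates are created for every matching output found in the dataset, not just the subject currently being converted, that would produce exactly the stray sub-001 file seen here.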

Thus, in the run for sub-002 (as the second run), an events.tsv file for sub-001 is created. The outputs parameter for the container call in heudiconv-procedures.py only covers the current subject's files. In my case:

outputs = ['/tmp/hirni-test/bids/sub-002', '/tmp/hirni-test/bids/participants.tsv', '/tmp/hirni-test/bids/participants.json']

That means all files heudiconv creates concerning other subjects are not committed to the dataset, which I can verify by running git status. I am not sure if this is really a problem of hirni, but it should at least be able to handle these issues. The error is only triggered in the second run because in the first run an events.tsv is created for that one run only -> covered by outputs -> the cleanup works.

I came up with a workaround by separating tracked and untracked files and acting accordingly. This works for me so far. Should I open a pull request?
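A minimal sketch of that separation (hypothetical helper, not the actual PR): the real procedure would call dataset.repo.get_file_key and catch FileNotInAnnexError; here a plain KeyError stands in for it so the example is self-contained.

```python
def split_template_removals(paths, lookup_key, empty_key):
    """Split heudiconv's *_events.tsv templates into annexed ones whose key
    matches the empty template and untracked ones that were never saved.

    lookup_key(path) returns the annex key of a tracked file and raises
    KeyError for untracked files (standing in for FileNotInAnnexError).
    """
    annexed_empty, untracked = [], []
    for path in paths:
        try:
            if lookup_key(path) == empty_key:
                annexed_empty.append(path)
        except KeyError:
            # created by heudiconv but never committed: can be unlinked directly
            untracked.append(path)
    return annexed_empty, untracked
```

The annexed empty templates can then be removed through the dataset API as before, while the untracked ones are simply deleted from the worktree instead of crashing the cleanup.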

bpoldrack commented 3 years ago

Generally, yes - you're on the right track there.

> I am not sure if this is really a problem of hirni, but it should at least be able to handle these issues.

Not hirni itself, it seems, but the toolbox and its dicom-converter procedure, which needs to properly account for what heudiconv is doing.

I'm not quite sure, though, how this can be true. While you are right about how containers-run is called, heudiconv should (by means of that call) only see the dicoms of the one acquisition it is called with. So why would it generate a file for a subject that isn't in those dicoms? So far I wasn't able to reproduce this, which suggests to me that your spec somehow ends up with several subjects in one acquisition.

Can I see your acq1/studyspec.json and acq2/studyspec.json?

manuelakuhn commented 3 years ago

Here you go:

acq1/studyspec.json:

{"anon-subject":{"approved":false,"value":"001"},"bids-session":{"approved":false,"value":null},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"52feb173-e512-47fb-9a88-8b966c9dc7b4","dataset-refcommit":"78caf6a230830cf5f714f0e3527f284075be8153","location":"dicoms","procedures":[{"on-anonymize":{"approved":false,"value":false},"procedure-call":{"approved":false,"value":null},"procedure-name":{"approved":false,"value":"hirni-dicom-converter"}}],"subject":{"approved":false,"value":"<subject_id>"},"type":"dicomseries:all"}
{"anon-subject":{"approved":false,"value":"001"},"bids-modality":{"approved":false,"value":"magnitude"},"bids-run":{"approved":false,"value":null},"bids-session":{"approved":false,"value":null},"bids-task":{"approved":false,"value":null},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"52feb173-e512-47fb-9a88-8b966c9dc7b4","dataset-refcommit":"78caf6a230830cf5f714f0e3527f284075be8153","description":{"approved":false,"value":"gre_field_mapping"},"location":"dicoms","subject":{"approved":false,"value":"<subject_id>"},"tags":[],"type":"dicomseries","uid":"1.3.12.2.1107.5.2.43.66017.2020020411055878131841133.0.0.0"}
{"anon-subject":{"approved":false,"value":"001"},"bids-modality":{"approved":false,"value":"phasediff"},"bids-run":{"approved":false,"value":null},"bids-session":{"approved":false,"value":null},"bids-task":{"approved":false,"value":null},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"52feb173-e512-47fb-9a88-8b966c9dc7b4","dataset-refcommit":"78caf6a230830cf5f714f0e3527f284075be8153","description":{"approved":false,"value":"gre_field_mapping"},"location":"dicoms","subject":{"approved":false,"value":"<subject_id>"},"tags":[],"type":"dicomseries","uid":"1.3.12.2.1107.5.2.43.66017.2020020411055878133741134.0.0.0"}
{"anon-subject":{"approved":false,"value":"001"},"bids-modality":{"approved":false,"value":"T1w"},"bids-run":{"approved":false,"value":null},"bids-session":{"approved":false,"value":null},"bids-task":{"approved":false,"value":null},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"52feb173-e512-47fb-9a88-8b966c9dc7b4","dataset-refcommit":"78caf6a230830cf5f714f0e3527f284075be8153","description":{"approved":false,"value":"t1_mpr_sag_1iso_p2"},"location":"dicoms","subject":{"approved":false,"value":"<subject_id>"},"tags":[],"type":"dicomseries","uid":"1.3.12.2.1107.5.2.43.66017.2020020410115790062818531.0.0.0"}
{"anon-subject":{"approved":false,"value":"001"},"bids-modality":{"approved":false,"value":"bold"},"bids-run":{"approved":false,"value":"01"},"bids-session":{"approved":false,"value":null},"bids-task":{"approved":false,"value":"negativesearch"},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"52feb173-e512-47fb-9a88-8b966c9dc7b4","dataset-refcommit":"78caf6a230830cf5f714f0e3527f284075be8153","description":{"approved":false,"value":"fMRI_SMS2_2iso_64sl_TR2_run1"},"location":"dicoms","subject":{"approved":false,"value":"<subject_id>"},"tags":[],"type":"dicomseries","uid":"1.3.12.2.1107.5.2.43.66017.2020020410183466177719895.0.0.0"}

acq2/studyspec.json:

{"anon-subject":{"approved":false,"value":"002"},"bids-session":{"approved":false,"value":null},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"a3c9cd2d-5e6d-4c5c-81cc-b14228922de2","dataset-refcommit":"de47d49c1e194b581e0b8ace95704e1dc520cd4c","location":"dicoms","procedures":[{"on-anonymize":{"approved":false,"value":false},"procedure-call":{"approved":false,"value":null},"procedure-name":{"approved":false,"value":"hirni-dicom-converter"}}],"subject":{"approved":false,"value":"<subject_id>"},"type":"dicomseries:all"}
{"anon-subject":{"approved":false,"value":"002"},"bids-modality":{"approved":false,"value":"magnitude"},"bids-run":{"approved":false,"value":null},"bids-session":{"approved":false,"value":null},"bids-task":{"approved":false,"value":null},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"a3c9cd2d-5e6d-4c5c-81cc-b14228922de2","dataset-refcommit":"de47d49c1e194b581e0b8ace95704e1dc520cd4c","description":{"approved":false,"value":"gre_field_mapping"},"location":"dicoms","subject":{"approved":false,"value":"<subject_id>"},"tags":[],"type":"dicomseries","uid":"1.3.12.2.1107.5.2.43.66017.2020020411055878131841133.0.0.0"}
{"anon-subject":{"approved":false,"value":"002"},"bids-modality":{"approved":false,"value":"phasediff"},"bids-run":{"approved":false,"value":null},"bids-session":{"approved":false,"value":null},"bids-task":{"approved":false,"value":null},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"a3c9cd2d-5e6d-4c5c-81cc-b14228922de2","dataset-refcommit":"de47d49c1e194b581e0b8ace95704e1dc520cd4c","description":{"approved":false,"value":"gre_field_mapping"},"location":"dicoms","subject":{"approved":false,"value":"<subject_id>"},"tags":[],"type":"dicomseries","uid":"1.3.12.2.1107.5.2.43.66017.2020020411055878133741134.0.0.0"}
{"anon-subject":{"approved":false,"value":"002"},"bids-modality":{"approved":false,"value":"T1w"},"bids-run":{"approved":false,"value":null},"bids-session":{"approved":false,"value":null},"bids-task":{"approved":false,"value":null},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"a3c9cd2d-5e6d-4c5c-81cc-b14228922de2","dataset-refcommit":"de47d49c1e194b581e0b8ace95704e1dc520cd4c","description":{"approved":false,"value":"t1_mpr_sag_1iso_p2"},"location":"dicoms","subject":{"approved":false,"value":"<subject_id>"},"tags":[],"type":"dicomseries","uid":"1.3.12.2.1107.5.2.43.66017.2020020410115790062818531.0.0.0"}
{"anon-subject":{"approved":false,"value":"002"},"bids-modality":{"approved":false,"value":"bold"},"bids-run":{"approved":false,"value":"01"},"bids-session":{"approved":false,"value":null},"bids-task":{"approved":false,"value":"negativesearch"},"comment":{"approved":false,"value":"I actually have no clue"},"dataset-id":"a3c9cd2d-5e6d-4c5c-81cc-b14228922de2","dataset-refcommit":"de47d49c1e194b581e0b8ace95704e1dc520cd4c","description":{"approved":false,"value":"fMRI_SMS2_2iso_64sl_TR2_run1"},"location":"dicoms","subject":{"approved":false,"value":"<subject_id>"},"tags":[],"type":"dicomseries","uid":"1.3.12.2.1107.5.2.43.66017.2020020410183466177719895.0.0.0"}

Since it is not my data, I have removed the subject IDs.

bpoldrack commented 3 years ago

Strange indeed. The call to heudiconv from datalad hirni-spec2bids --anonymize sourcedata/acq2/studyspec.json should get to see only sourcedata/acq2/dicoms and the correct spec with a single subject identifier. So it boils down to the question: how does heudiconv come to the conclusion that it should create a sub-001/func/sub-001_task-negativesearch_run-01_events.tsv when it shouldn't have any indication that a sub-001 exists?

While crashing on a non-annexed file is something to be fixed within the conversion procedure, the real issue here is the creation of that file. I'll keep digging.

bpoldrack commented 3 years ago

Noticed only now, @manuelakuhn:

Do those archives contain the same DICOMs? In the specs, the uids of the image series are identical in both files. If that's the reason, it's still a bug, but it may be a lead ...
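A quick way to check for this, sketched from the spec snippets above: a studyspec is a JSON-lines file where each dicomseries entry carries a uid field, so overlapping uid sets between acquisitions hint at duplicated DICOMs. The file paths in the comment are examples:

```python
import json

def series_uids(spec_path):
    """Collect dicomseries uids from a hirni studyspec (one JSON object per line)."""
    uids = set()
    with open(spec_path) as f:
        for line in f:
            if line.strip():
                uids.add(json.loads(line).get("uid"))
    uids.discard(None)  # the dicomseries:all entry carries no uid
    return uids

# Identical uids in both acquisitions suggest the archives hold the same DICOMs:
# series_uids("sourcedata/acq1/studyspec.json") & series_uids("sourcedata/acq2/studyspec.json")
```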

manuelakuhn commented 3 years ago

Yes, in principle they are; just some directories were renamed.

I don't think this is related at all, but to give you the full picture: I also have to apply this patch to be able to convert this data: psychoinformatics-de/hirni-toolbox/pull/11

Maybe as an explanation: this is the test and development dataset a colleague gave me. I am not concerned about the dataset's content for now, only about some workflows, so I am just reusing it and pretending it was recorded for different subjects/acquisitions/sessions.

bpoldrack commented 3 years ago

Okay, I got it. It is indeed just heudiconv doing a little more than the (quite old) procedure anticipated.

I'll merge your PR as it is a valid workaround. A "real" solution on the toolbox end should probably involve a little more eventually (there may be uncommitted task_<task-name>.json files at the top level, for example).

One last remark:

> Important note: this only triggers if a rule is registered.

This appears to be just a side effect of having a spec that ultimately leads to events files being created or not. If the resulting spec didn't specify a task, heudiconv wouldn't create a corresponding events template. So if the default rules can't derive a task from your DICOMs, it looks as though setting a custom rule causes the effect. There may be more conditions influencing heudiconv's behavior with respect to template creation, of course. But for later reference: whether the rule leading to a spec is custom or not doesn't matter.

bpoldrack commented 3 years ago

Oh, and @manuelakuhn :

> I also have to apply this patch to be able to convert this data: psychoinformatics-de/hirni-toolbox/pull/11

Merged this and two similar PRs as well.