psychoinformatics-de / datalad-hirni

DataLad extension for (semi-)automated, reproducible processing of (medical/neuro)imaging data
http://datalad.org
Other
5 stars 8 forks source link

dicom2spec error due to checking for spec key "id" which does not exist #173

Open pvavra opened 3 years ago

pvavra commented 3 years ago

when importing a tarball (can share on medusa if desired), I am getting the following error:

[ERROR ] 'id' [spec_helpers.py:get_specval:33] (KeyError)

it seems to stem from the implicit call on line: https://github.com/psychoinformatics-de/datalad-hirni/blob/c87946dc6cf7f51266c80390ced0114527bd6b3d/datalad_hirni/commands/dicom2spec.py#L451

Indeed, for all the other tarballs this error did not get triggered. I assume that some of the preceding conditions of the if clause were False for those other tarballs (tested only with one other atm) and hence that line did not get executed.

However, I do not understand what that conditional is trying to achieve. Why should any of those conditions lead to the series being ignored for the conversion in the first place? The offending dicom series in this particular case is a SBRef image related to a functional run and should be converted.

Or is it related the sorting needing RF a few lines above?

As a quick workaround for this particular dataset, I can simpy comment out the whole for loop starting on line 436: https://github.com/psychoinformatics-de/datalad-hirni/blob/c87946dc6cf7f51266c80390ced0114527bd6b3d/datalad_hirni/commands/dicom2spec.py#L436-L456

But not sure I am not breaking anything further down the line by it..

bpoldrack commented 3 years ago

Hmm. Two things:

First of all, the immediate issue seems to be that the spec that is being worked over here has no id field for at least some entries. id would default to the SeriesNumber from the DICOM header fields and is used to specify order. Now, I think we've seen similar things before where there is no such entry. So, the code needs to be able to deal with that, I guess. In that sense, I consider it a bug.

Secondly, what this is trying to achieve is a guess as to what run to use when there appear to be several. If we have several series with the same description and the same previously determined run number, then this tries to check those SeriesNumber entries and mark all the apparent duplications but the last one (highest SeriesNumber) to be ignored by default. Idea is that this seems likely to be caused by aborted runs that were then redone (which is why we go for the last one). That guess might be wrong of course. In any case, you seem to have several series with the exact same Protocol field in the DICOM headers. Ultimately they need to get a specification that points to a different target in BIDS (or a choice is to be made which one to ignore for conversion). Can you tell, what's the reason for the duplication in your case, @pvavra? I wonder whether it would be helpful to be able to disable such guessing and rather go with an initially derived specification, that we already know can't be converted without further editing. Or may be there's a better way to determine what to do with such duplications.

pvavra commented 3 years ago

Re no id: I just re-ran the spec generation for all 30 acquisitions in my dataset and added a simple print(spec_series_list) on line 436, i.e. just before the whole for loop and save the output into a file. A ctrl-f later, I can confirm that not a single spec_series_list has the field id. Should it maybe simply be uid instead?

In my case tarballs, the different runs are explicitly noted in the SeriesDescription (fMRI_some_more_stuff___run1 or fMRI_..._run1_...) and my custom rules are able to pick up on that and set the run field correctly. However, they didn't get triggered, I guess, because the above error halted the conversion before that. So I guess the default heuristic was not really able to pick up on this. But I have no clue why that get_specval(..) got only triggered in that one tarball out of 30..

pvavra commented 3 years ago

Note that since there is not id, this sorting should also not work: https://github.com/psychoinformatics-de/datalad-hirni/blob/c87946dc6cf7f51266c80390ced0114527bd6b3d/datalad_hirni/commands/dicom2spec.py#L433-L435

bpoldrack commented 3 years ago

Should it maybe simply be uid instead?

Well, as far as identifying an entry in a spec goes, that's an option. But implies no order in time, so not useful for what that piece of code is trying to achieve.

my custom rules are able to pick up on that and set the run field correctly.

Can I see them?

However, they didn't get triggered, I guess, because the above error halted the conversion before that. So I guess the default heuristic was not really able to pick up on this. But I have no clue why that get_specval(..) got only triggered in that one tarball out of 30..

Okay, need to have a closer look how that can be. Having your custom rules might help, though.

pvavra commented 3 years ago

cannot attach a .py file.. so here it goes:

"""Custom rules for dicom2spec for RLAMF study."""
import re  # for extracting numbers from strings
# from pprint import pprint

class MyDICOM2SpecRules(object):  # pylint: disable-msg=r0205
    """Class holding rules for dicom2spec for RLAMF study."""

    def __init__(self, dicommetadata):
        """Store dicom metadata in internal var."""
        self._dicom_series = dicommetadata

    def __call__(self, subject=None, anon_subject=None, session=None):
        """Apply rules to set of all dicom series of an acquisition."""
        spec_dicts = []

        # Heuristic for guessing session based on present fMRI series
        if any(s['SeriesDescription'].endswith("run10") for s
               in self._dicom_series):
            session = "01"
        else:
            session = "02"
        # print("==========  SESSION  ================")
        # print(session)

        for dicom_dict in self._dicom_series:
            spec_dicts.append((self._rules(dicom_dict,
                                           subject=subject,
                                           anon_subject=anon_subject,
                                           session=session),
                               self.series_is_valid(dicom_dict)
                               ))
        return spec_dicts

    def _rules(self,  # pylint: disable-msg=r0201,r0912
               series_dict,
               subject=None,
               anon_subject=None,
               session=None):
        """Apply rules to a single series of dicoms."""
        task = None
        modality = None
        comment = ''
        run = None
        acquisition = None
        # T1 ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("t1"):
            modality = 'T1w'
            acquisition = 'mprage'

        # Fieldmap ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("gre_field_mapping"):
            if "M" in series_dict['ImageType']:
                # will infer magnitude1/magnitude2 automatically
                modality = 'magnitude'
            else:
                modality = 'phasediff'

        # IR EPI ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("IR-EPI"):
            # c.f. https://www.ncbi.nlm.nih.gov/pubmed/14635150
            modality = 'T1w'
            acquisition = 'irepi'

        # DTI ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("DTI") and \
           series_dict['SeriesDescription'].endswith("PA"):
            modality = 'dwi'
        if series_dict['SeriesDescription'].startswith("DTI") and \
           series_dict['SeriesDescription'].endswith("AP"):
            # the A >> P series does have any diffusion-weighting, and is
            # essentially only for the correction of distortions for the SE-EPI
            modality = 'epi'

        # fMRI ---------------------------------------------------
        # Session 1:
        label_task = 'fMRI_SMS2_2.2iso_66sl_TR2_run'
        if series_dict['SeriesDescription'].startswith(label_task):
            modality = 'bold'
            task = 'learning'
            # number afterwards indicates run number (range: 1 to 10)
            # Note: the remainder of the description does _not_ contain numbers
            tail = series_dict['SeriesDescription'][len(label_task):]
            run = re.findall(r'\d+', tail)[0]

        # Session 2:
        label_task = 'fMRI_run'
        if series_dict['SeriesDescription'].startswith(label_task):
            modality = 'bold'
            task = 'recognition'
            # only 1 to 4
            run = series_dict['SeriesDescription'][len(label_task)]

        # NB: the following should inherit the rest from the above task,
        # but overwrite the modality
        if series_dict['SeriesDescription'].endswith("SBRef"):
            modality = 'sbref'

        # T2w ---------------------------------------------------
        if series_dict['SeriesDescription'].endswith(
                                            "t2_tse_orth_hippo_448_p2"):
            modality = 'T2w'
        # Proton Density --------------------------------------------------
        if series_dict['SeriesDescription'].endswith(
                                            "PD_ax_acpc_hippo_448_p2"):
            modality = 'pd'

        # AAHead Scouts ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("AAHead_Scout"):
            comment = "Localizer - will be ignored for BIDS"

        return {'description': series_dict['SeriesDescription']
                if "SeriesDescription" in series_dict else '',
                'comment': comment,
                'subject': series_dict['PatientID']
                if not subject else subject,
                'anon-subject': anon_subject if anon_subject else None,
                'bids-session': session if session else None,
                'bids-modality': modality,
                'bids-task': task,
                'bids-run': run,
                'bids-acquisition': acquisition
                }

    def series_is_valid(self, series_dict):  # pylint: disable-msg=r0201
        """
        Set whether a series should be converted to bids at all.

        For all series which this returns `false`, the hirni-spec2bids will
        skip the conversion to niftis.
        """
        # Skip all localizers & report
        if series_dict['SeriesDescription'].startswith("AAHead_Scout") or \
           series_dict['SeriesDescription'] == "PhoenixZIPReport":
            valid = False
        else:
            valid = True

        return valid

__datalad_hirni_rules = MyDICOM2SpecRules  # pylint: disable-msg=C0103
bpoldrack commented 3 years ago

Thanks! Will look into that more closely.

bpoldrack commented 3 years ago

my custom rules are able to pick up on that and set the run field correctly. However, they didn't get triggered, I guess, because the above error halted the conversion before that

No, this piece of code is executed after the rules (which it probably shouldn't, but that's another issue). If your rules really weren't triggered, they are prob. not correctly configured. The dataset should have a config (or several in correct order) datalad.hirni.dicom2spec.rules with a value that points to the file you pasted above. So, if you do that outside of a datalad python context, that would be: git config -f .datalad/config --add datalad.hirni.dicom2spec.rules <path to your file>. Can you verify such a config exists in your dataset, @pvavra? Generally, it could also be in the dataset's .git/config or your ~/-gitconfig, of course. I would just suggest that the committed .datalad/config is best for provenance tracking. That way, the spec building is done on top of a commit, that actually contains the relevant config.

Note that since there is not id, this sorting should also not work:

This sorting "works" because it assigns a sorting key of 0 if there's no id, so doesn't change the order of snippets w/o an id and puts them before all snippets with an id.

bpoldrack commented 3 years ago

Hah, hold on. Looking for another reason why your rules might not have been executed, I noticed hirni didn't catch up on a change in datalad core. With a current datalad installed, it would indeed fail to get your rule, if there are multiple rules configured. Is that the case, @pvavra?

Note to myself: ConfigManager.get requires get_all=True!

bpoldrack commented 3 years ago

But I have no clue why that get_specval(..) got only triggered in that one tarball out of 30..

Assuming that your rules weren't picked up, in my estimation it's simply the only one where default rules would derive the same run from the protocol field for several series. Could that be true?

bpoldrack commented 3 years ago

As a quick workaround for this particular dataset, I can simpy comment out the whole for loop starting on line 436: ... But not sure I am not breaking anything further down the line by it..

No, nothing really relies on it so you don't break anything. If you end up with a specification containing duplicates, however, conversion based on it might go wrong.

In any case, if you could retry with the branch of PR #174 to confirm this issue doesn't get triggered, that would be great. Apart from the already pushed changes, I'm looking into a command switch to turn that part off.

bpoldrack commented 3 years ago

@pvavra, can you confirm PR #174 works for you?

Regarding this:

Apart from the already pushed changes, I'm looking into a command switch to turn that part off.

I'll take that back. I think there should be a serious refactoring instead, that would turn that part into a rule that is used by default (and can be used in a custom setup, too, if so desired).

pvavra commented 3 years ago

I'll give the PR a try, but am not sure what the simplest way for me to test it is... Create a dedicated venv? if yes, how do I install that pull-request? I'm asking since I hope to be testing more PRs in the near future ;) So having a good workflow would be handy..

bpoldrack commented 3 years ago

@pvavra Sorry for answering a little late - just back from vacation.

I'd go for a dedicated venv, yes. You can then install the PR (or any particular branch/tag) like this: pip install git+https://github.com/bpoldrack/datalad-hirni@fix-173

bpoldrack commented 3 years ago

If you want to work on PRs yourself anyway, there's another approach that I usually use, @pvavra. I have a local clone (with the relevant forks of other contributors as remotes) and a dev venv, where my local clone is editable installed: pip install -e /local/path. That way the venv is using whatever is currently checked out at that path and I adjust the checkout I work on or want to test rather than the venv.