nipy / heudiconv

Flexible DICOM conversion into structured directory layouts
https://heudiconv.readthedocs.io

Add type annotations #656

Closed jwodder closed 1 year ago

jwodder commented 1 year ago

Closes #653.


Problems encountered so far with applying type annotations:

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 84.89% and project coverage change: +0.39 :tada:

Comparison is base (502bf49) 81.48% compared to head (c03a5ae) 81.87%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #656      +/-   ##
==========================================
+ Coverage   81.48%   81.87%   +0.39%
==========================================
  Files          41       41
  Lines        3899     4116     +217
==========================================
+ Hits         3177     3370     +193
- Misses        722      746      +24
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| heudiconv/info.py | `100.00% <ø> (ø)` | |
| heudiconv/heuristics/example.py | `7.69% <9.25%> (+3.46%)` | :arrow_up: |
| heudiconv/tests/anonymize_script.py | `42.85% <28.57%> (-11.69%)` | :arrow_down: |
| heudiconv/heuristics/uc_bids.py | `15.62% <33.33%> (+8.72%)` | :arrow_up: |
| heudiconv/heuristics/studyforrest_phase2.py | `23.07% <45.45%> (+10.03%)` | :arrow_up: |
| heudiconv/cli/monitor.py | `34.40% <50.00%> (+2.54%)` | :arrow_up: |
| heudiconv/heuristics/multires_7Tbold.py | `21.73% <53.33%> (+7.45%)` | :arrow_up: |
| heudiconv/tests/test_monitor.py | `46.87% <56.09%> (+4.19%)` | :arrow_up: |
| heudiconv/heuristics/bids_with_ses.py | `11.90% <71.42%> (+6.77%)` | :arrow_up: |
| heudiconv/convert.py | `85.53% <78.57%> (-1.76%)` | :arrow_down: |
| ... and 29 more | | |


yarikoptic commented 1 year ago

I just merged a small PR which introduced conflicts, please resolve.

jwodder commented 1 year ago

@nipy/team-heudiconv I am currently facing the following blockers to applying type annotations:


Explanation for why ses appears to be of type str | int | None: prep_conversion() is only called at one point, here, where the ses argument is set to session, which is last assigned to here and here.

Therefore, the session variable passed as the ses argument to prep_conversion() will be either str | int | None or str (i.e., just str | int | None), and so ses in prep_conversion() has the same types.

jwodder commented 1 year ago

@nipy/team-heudiconv Also, side question: In this code:

https://github.com/nipy/heudiconv/blob/2c6d228516dda41cdc9f91e7bea51b3acdb0f5b7/heudiconv/tests/test_dicoms.py#L30-L33

is dcm.pub supposed to be getattr(dcm, pub)? That would seem to make more sense.

jwodder commented 1 year ago

@nipy/team-heudiconv Yet another issue: In the following code:

https://github.com/nipy/heudiconv/blob/2c6d228516dda41cdc9f91e7bea51b3acdb0f5b7/heudiconv/bids.py#L711-L717

the header.get_best_affine() and header.get_data_shape() methods are not present on all possible return types of nibabel.load(); as far as I can tell, they are only present when nibabel.load() returns an instance of one of the following types:

Exactly what type is nb_load(nifti_file) expected to return here?

jwodder commented 1 year ago

@nipy/team-heudiconv This should hopefully be the last typing issue I post: The prov_file argument to heudiconv.dicoms.embed_metadata_from_dicoms() can apparently only be str, yet in this call to the function, the provided prov_file argument can be either a str or None. What should be done about this?

jwodder commented 1 year ago

@nipy/team-heudiconv Ping.

yarikoptic commented 1 year ago

First of all thank you @jwodder for going on this types quest!

  • I believe I may have found a bug, and I'm not sure how to address it. I've determined that the ses argument to heudiconv.convert.prep_conversion() is of type str | int | None (See explanation below), yet here ses is passed to sanitize_label(), which only accepts strs and will error if given an int. What should be done about this?

ses is in general a "label", not an "index" (see entity table for more info on those) so should be a str. So I guess the point at which you figured it could come out as an int should format it to str one way or another.

So as you deduced, we should just tune up that single place where it is used as `int`, so maybe this is the initial diff/the point where it is used as `int`:

```diff
diff --git a/heudiconv/parser.py b/heudiconv/parser.py
index 354b160..7dd73f1 100644
--- a/heudiconv/parser.py
+++ b/heudiconv/parser.py
@@ -105,7 +105,7 @@ def get_extracted_dicoms(fl: Iterable[str]) -> ItemsView[Optional[int], list[str
         the unarchived file. If there are multiple archived files, they
         are grouped into separate sessions.
     """
-    sessions: dict[Optional[int], list[str]] = defaultdict(list)
+    sessions: dict[Optional[str], list[str]] = defaultdict(list)
 
     # keep track of session manually to ensure that the variable is bound
     # when it is used after the loop (e.g., consider situation with
@@ -136,14 +136,14 @@ def get_extracted_dicoms(fl: Iterable[str]) -> ItemsView[Optional[int], list[str
             os.chmod(f, mode=0o700)
         # store full paths to each file, so we don't need to drag along
         # tmpdir as some basedir
-        sessions[session] = archive_content
+        sessions[str(session)] = archive_content
         session += 1
 
     if session == 1:
         # we had only 1 session (and at least 1), so not really multiple
         # sessions according to classical 'heudiconv' assumptions, thus
         # just move them all into None
-        sessions[None] += sessions.pop(0)
+        sessions[None] += sessions.pop("0")
 
     return sessions.items()
 
diff --git a/heudiconv/utils.py b/heudiconv/utils.py
index 7284479..37a1237 100644
--- a/heudiconv/utils.py
+++ b/heudiconv/utils.py
@@ -78,7 +78,7 @@ class StudySessionInfo(NamedTuple):
     # StudyInstanceUID into some place within hierarchy
     locator: Optional[str]
-    session: Optional[str | int]
+    session: Optional[str]
     # should be some ID defined either in cmdline or deduced
     subject: Optional[str]
```
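To make the effect of keying sessions by string concrete, here is a minimal sketch of the bookkeeping only. `group_sessions` is a hypothetical stand-in written for illustration, not heudiconv's actual `get_extracted_dicoms()`, and it skips archive extraction entirely.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_sessions(archives: List[List[str]]) -> Dict[Optional[str], List[str]]:
    """Hypothetical stand-in for the session bookkeeping discussed above."""
    sessions: Dict[Optional[str], List[str]] = defaultdict(list)
    session = 0
    for archive_content in archives:
        # Key by the string form so callers only ever see str or None.
        sessions[str(session)] = archive_content
        session += 1
    if session == 1:
        # A single archive is not treated as multiple sessions: move it to None.
        sessions[None] += sessions.pop("0")
    return dict(sessions)


single = group_sessions([["a.dcm"]])
multiple = group_sessions([["a.dcm"], ["b.dcm"]])
```

With this shape, a value such as `"0"` can be handed to a function that only accepts `str` labels without any further conversion.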

Some of the code in heudiconv/convert.py extracts fields from BIDS JSON (sidecar?) files,

correct ! called "sidecar" metadata files in BIDS

yet I cannot determine what types these fields are supposed to be. Specifically, the fields in question are:

Sometimes we might be "ahead of the standard" here and have better luck searching directly in https://github.com/bids-standard/bids-specification . This time it leads to the unfortunately never finished (still open) https://github.com/bids-standard/bids-specification/pull/425 . From https://github.com/bids-standard/bids-specification/pull/425/files#diff-37df75f7d2d341bed5815aad788fe7bff09374c335eb0c25fab0b7569e8c0a46R2 - just a str

hm, nothing among PRs. Not standardized within BIDS, but extracted by dcm2niix

❯ git grep EchoNumber
BIDS/README.md:| EchoNumber                         |      | Only multi-echo series                                                                  | D          |
GE/README.md:Current GE software (DV26.0_R03_1831.b) running research multi-echo sequences create invalid DICOM images. The required public [EchoTime (0018,0081)](https://dicom.innolitics.com/ciods/mr-image/mr-image/00180081) attribute lists the shortest echo time for the series, rather than the actual echo time for the given DICOM image. The public tag [EchoNumber (0018,0086)](https://dicom.innolitics.com/ciods/mr-image/mr-image/00180086) reports `1` for all echoes. These limitations in GE's DICOM images disrupt dcm2niix's image conversion. Hopefully future product sequences will generate valid DICOM data. In the meantime, [issue 359](https://github.com/rordenlab/dcm2niix/issues/359) provides a kludge for image conversion.
console/nii_dicom_batch.cpp:                fprintf(fp, "\t\"EchoNumber\": %d,\n", d.echoNum);

should be an int. @pvelasco @tsalo - should we submit a PR for BIDS to formalize it?

to tell the truth I am not yet sure about the plurality of definitions there, but it should be a float. I don't think we should worry about a possible list[float] here

yet another one dcm2niix extracts from DICOMs, but not standardized in BIDS (not sure if this one should, but also might): https://github.com/rordenlab/dcm2niix//blob/HEAD/BIDS/README.md#global-series-information

Make it list[str].

@nipy/team-heudiconv Also, side question: In this code:

https://github.com/nipy/heudiconv/blob/2c6d228516dda41cdc9f91e7bea51b3acdb0f5b7/heudiconv/tests/test_dicoms.py#L30-L33

is dcm.pub supposed to be getattr(dcm, pub)? That would seem to make more sense.

well spotted!!!
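For anyone following along, the difference between the two spellings can be sketched like this. The object below is a hypothetical stand-in built with `SimpleNamespace`, not a real DICOM dataset, and the attribute names are chosen only for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-in object; it deliberately has an attribute literally named "pub".
dcm = SimpleNamespace(PatientID="anon", pub="attribute literally named pub")


def read_fields(obj, names):
    # getattr() looks up the attribute whose name is stored in the variable.
    return {name: getattr(obj, name) for name in names}


pub = "PatientID"
by_variable = getattr(dcm, pub)  # resolves to dcm.PatientID
by_literal = dcm.pub             # always the attribute named "pub"
fields = read_fields(dcm, ["PatientID"])
```

So `dcm.pub` ignores the loop variable entirely, which is why `getattr(dcm, pub)` looks like the intended form.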

yarikoptic commented 1 year ago
  • Nifti1Pair
  • Nifti1Image
  • AnalyzeImage
  • MGHImage

Exactly what type is nb_load(nifti_file) expected to return here?

given our invocation of dcm2niix I believe it is exclusively Nifti1Image
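If the loaded object is expected to be a single image type, one way to tell the type checker so is an explicit `isinstance` check. The sketch below uses hypothetical stand-in classes rather than nibabel itself, and it puts `get_data_shape()` directly on the image for brevity, whereas the code under discussion calls it on the header.

```python
from typing import Tuple, Union


class FakeNifti1Image:
    """Hypothetical stand-in for a NIfTI-1 image; not the nibabel class."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self._shape = shape

    def get_data_shape(self) -> Tuple[int, ...]:
        return self._shape


class FakeOtherImage:
    """Hypothetical stand-in for an image type lacking get_data_shape()."""


def data_shape(img: Union[FakeNifti1Image, FakeOtherImage]) -> Tuple[int, ...]:
    # Narrow the union explicitly so the checker knows the method exists.
    if not isinstance(img, FakeNifti1Image):
        raise TypeError(f"expected a NIfTI-1 image, got {type(img).__name__}")
    return img.get_data_shape()


shape = data_shape(FakeNifti1Image((64, 64, 32)))
```

The check also fails loudly at runtime if the assumption about the converter output ever stops holding.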

The prov_file argument to heudiconv.dicoms.embed_metadata_from_dicoms() can apparently only be str

why? I think it could still be None if e.g. with_prov was False and not provided to convert. So probably should just be str | None ?
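A minimal sketch of what accepting `str | None` could look like, written with `Optional[str]` so it also runs on older Pythons. `embed_metadata` is a hypothetical stand-in and does not reproduce what `embed_metadata_from_dicoms()` actually does with its arguments.

```python
from typing import List, Optional


def embed_metadata(dicoms: List[str], prov_file: Optional[str] = None) -> List[str]:
    """Hypothetical stand-in; returns the list of inputs it would consult."""
    sources = list(dicoms)
    if prov_file is not None:
        # Only include the provenance file when one was actually produced.
        sources.append(prov_file)
    return sources


without_prov = embed_metadata(["a.dcm"])
with_prov = embed_metadata(["a.dcm"], "prov.ttl")
```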

jwodder commented 1 year ago

@yarikoptic I still need a resolution for the issues mentioned in my top comment.

yarikoptic commented 1 year ago
  • heudiconv.dicoms.group_dicoms_into_seqinfos(): If grouping == "custom" and custom_grouping is a callable, the return value is the result of applying custom_grouping() to some arguments; otherwise, the return value is something else. Normally, this could be annotated by using overloads and Literal["custom"], but whenever group_dicoms_into_seqinfos() is called in the heudiconv code, grouping (if present) is always passed as a variable rather than a string literal, and so Literal is unusable here.

ATM it seems that none of the shipped heuristics defines a custom grouping:

❯ git grep 'def grouping'
docs/heuristics.rst:    def grouping(files, dcmfilter, seqinfo):

and it is somewhat documented at https://heudiconv.readthedocs.io/en/latest/heuristics.html?highlight=grouping#grouping-string-or-grouping-files-dcmfilter-seqinfo and was introduced in https://github.com/nipy/heudiconv/pull/359 . The expectation is that if that is the callable - we get the same dict[SeqInfo, list[str]] (mapping from SeqInfo to the list of DICOM files) so we should cast result into that I guess. And flatten seems to be ignored. FWIW, I am not aware of any heuristic which actually used that feature.

  • The docstring for heudiconv.dicoms.create_seqinfo() states that the first argument is of type nibabel.nicom.dicomwrappers.MosaicWrapper, yet here the supplied argument (obtained from validate_dicom()) is just a nibabel.nicom.dicomwrappers.Wrapper.

I think the docstring overspecified and it should be just a Wrapper.

  • SeqInfo.series_id is clearly meant to be a str, yet there are several places in heudiconv/heuristics/example.py where this field is compared against or assigned to an int variable.

example.py also has an explicit str() of its value... I think it is just generally inconsistent there, and since it is just an example (i.e. not really used) it can be, and likely is, buggy. I am trying to wrap my head around it...

jwodder commented 1 year ago

@yarikoptic Specifying that the custom_grouping argument to group_dicoms_into_seqinfos() must return dict[SeqInfo, list[str]] doesn't solve the problem, as group_dicoms_into_seqinfos() can return either that or dict[Optional[str], dict[SeqInfo, list[str]]] depending on the value of flatten — yet the function is called with a custom_grouping and flatten=True here, and it's called with a custom_grouping and flatten=False here. The only way for mypy to be sure what type it's getting would be if the values of flatten, grouping, and custom_grouping were all statically known.
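To illustrate the static-knowledge problem, here is a simplified sketch with only the `flatten` flag. `group`, `Flat`, and `Nested` are hypothetical names with made-up return shapes, not the real signature of `group_dicoms_into_seqinfos()`; `Literal` needs Python 3.8 or newer.

```python
from typing import Dict, List, Literal, Optional, Union, overload

Flat = Dict[str, List[str]]
Nested = Dict[Optional[str], Flat]


@overload
def group(files: List[str], flatten: Literal[True]) -> Flat: ...
@overload
def group(files: List[str], flatten: Literal[False]) -> Nested: ...
@overload
def group(files: List[str], flatten: bool) -> Union[Flat, Nested]: ...


def group(files: List[str], flatten: bool) -> Union[Flat, Nested]:
    flat: Flat = {"series": list(files)}
    return flat if flatten else {None: flat}


precise = group(["a.dcm"], True)      # checker picks the Literal[True] overload
flag = len(["a.dcm"]) > 0             # a plain bool, not a literal
imprecise = group(["a.dcm"], flag)    # checker falls back to the Union overload
```

Only the calls that pass a literal get a precise return type; any caller that passes a variable receives the union and has to narrow it itself.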

yarikoptic commented 1 year ago

but it seems that we can't know them statically and moreover we "violate" it in case of flatten=False whenever a Callable custom_grouping is provided. So, maybe for now just comment out the @overloads you defined for group_dicoms_into_seqinfos and add a comment that such clear/strict typing is not needed since flatten is ignored in case of a Callable custom_grouping, which is expected to return the other kind?

yarikoptic commented 1 year ago

FWIW -- just checked that it would not be a solution since then code which calls group_dicoms_into_seqinfos would not be able to tell one or another...

yarikoptic commented 1 year ago

I've tuned up that example.py, fixed another minor bug (can go outside of this PR in principle) and also added that casting

-            return custom_grouping(files, dcmfilter, SeqInfo)
+            return cast(Dict[SeqInfo, List[str]],
+                        custom_grouping(files, dcmfilter, SeqInfo))

and mypy seems to be happy for me locally, so maybe casting is just enough here?
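One caveat worth keeping in mind: `typing.cast` only informs the type checker and does nothing at runtime. A minimal sketch, with a hypothetical `call_custom` helper and plain `str` keys standing in for `SeqInfo`:

```python
from typing import Any, Callable, Dict, List, cast


def call_custom(custom_grouping: Callable[..., Any], files: List[str]) -> Dict[str, List[str]]:
    # cast() returns its argument unchanged; no validation happens here.
    return cast(Dict[str, List[str]], custom_grouping(files))


well_behaved = call_custom(lambda files: {"all": files}, ["a.dcm"])
misbehaving = call_custom(lambda files: 42, [])  # passes through despite the annotation
```

So the cast silences mypy, but a heuristic that returns something else would only be caught later, wherever the result is used.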

yarikoptic commented 1 year ago

hm, on CI (where we have py 3.7) it still fails with following

heudiconv/convert.py:55: error: Cannot find implementation or library stub for
module named "type_extensions"  [import]
            from type_extensions import TypedDict
    ^
heudiconv/convert.py:55: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
heudiconv/convert.py:57: error: Unexpected keyword argument "total" for
"__init_subclass__" of "object"  [call-arg]
        class PopulateIntendedForOpts(TypedDict, total=False):
        ^
.tox/typing/lib/python3.7/site-packages/mypy/typeshed/stdlib/builtins.pyi:113: note: "__init_subclass__" of "object" defined here
Found 2 errors in 1 file (checked 46 source files)
typing: exit 1 (14.48 seconds) /home/runner/work/heudiconv/heudiconv> mypy heudiconv pid=1840
.pkg: _exit> python /opt/hostedtoolcache/Python/3.7.16/x64/lib/python3.7/site-packages/pyproject_api/_backend.py True setuptools.build_meta
  typing: FAIL code 1 (58.26=setup[43.78]+cmd[14.48] seconds)
  evaluation failed :( (58.40 seconds)

maybe we should just go to a newer python for type checking?
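For reference, a common pattern for supporting both old and new interpreters is a version-guarded import. Note that the backport package is named `typing_extensions`, whereas the log above shows `type_extensions`. The class and keys below are hypothetical and are not heudiconv's actual `PopulateIntendedForOpts`.

```python
import sys

if sys.version_info >= (3, 8):
    from typing import TypedDict
else:
    # Assumes the typing_extensions backport is installed on Python 3.7.
    from typing_extensions import TypedDict


class ExampleOpts(TypedDict, total=False):
    # Hypothetical keys for illustration only.
    matching_parameters: str
    criterion: str


# total=False makes every key optional, so a partial dict type-checks.
opts: ExampleOpts = {"criterion": "Closest"}
```

mypy understands the `sys.version_info` check, so it only analyses the branch that applies to the Python version it is targeting.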

yarikoptic commented 1 year ago

cool, thanks for fixing! So we are all green -- take out of draft and let's invite others for possible review/feedback/training? @pvelasco @tsalo - interested in reviewing some type annotations ;) ?

yarikoptic commented 1 year ago

@jwodder please check the last two small commits where I have (I think) constrained typing a little more - let me know if you have reservations against that.

yarikoptic commented 1 year ago

ok -- Let's go. Thank you again @jwodder for all this monumental work!

yarikoptic commented 1 year ago

although it is indeed internal, I want the next release to be at least a minor version boost due to the scale of this change, so I am labeling it as such

github-actions[bot] commented 1 year ago

:rocket: PR was released in v0.13.0 :rocket: