nipy / heudiconv

Flexible DICOM conversion into structured directory layouts
https://heudiconv.readthedocs.io
Other
234 stars 125 forks source link

Seriesgrouping #684

Open neurorepro opened 1 year ago

neurorepro commented 1 year ago

This is to address #624

neurorepro commented 1 year ago

You may want to consider #682 first (changes are included in this PR)

codecov[bot] commented 1 year ago

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (e3c53b8) 81.94% compared to head (e9c9289) 81.98%. Report is 43 commits behind head on master.

Files Patch % Lines
heudiconv/utils.py 73.68% 5 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #684 +/- ## ========================================== + Coverage 81.94% 81.98% +0.04% ========================================== Files 41 41 Lines 4132 4153 +21 ========================================== + Hits 3386 3405 +19 - Misses 746 748 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

neurorepro commented 1 year ago

@yarikoptic As discussed in #624 I implemented a hash to shorten the name, but the full Series UID could be used too

yarikoptic commented 1 year ago

In general I think the approach makes sense in general. Re why not before: I guess (@satra and @mih might correct if have any recoll of the motivations) is that series id and name (instead of UID) were chosen for human accessibility since they make it easier to understand which particular series in question while annotating .edit.txt and just debugging the operation. So far I do not see immediate problem with adding series UID since AFAIK indeed it should be the same for all files within a single acquisition. Might be worth checking though on more of combined ones like T1PD etc.

TL;DR: we need to approach it with caution and thus need more checking/work. Some reservations/concerns and thinking out loud:

neurorepro commented 1 year ago

@yarikoptic thank you very much for the feedback

Regarding scope of use case: i think it is actually common to have artificially split a single BIDS session into two when the scanning protocol is too long (e.g. AM: anat, resting state, ... PM: diffusion MRI, ...) or the subject needs to exit. It is actually mentioned explicitely in the BIDS definition of session (def number 5):

if a subject has to leave the scanner room and then be re-positioned on the scanner bed, the set of MRI acquisitions will still be considered as a session and match sessions acquired in other subjects. Similarly, in situations where different data types are obtained over several visits (for example fMRI on one day followed by DWI the day after) those can be grouped in one session.

The fact that not many people reported it i think is because users did not know what went wrong and would not think this may be due to heudiconv (who would think of blaming such a magnificent tool ?)

Regarding change of behavior i totally agree that this should be done carefully. I think what seems to be your preferred option would be best:

may be we would just add a check while loading an existing mapping that all the series ids do include that UID hash and error out if they don't with informative message. I think the latter is better.

I will update the PR with that implemented.

If the error is too much for users (for which the use case of needing to keep previous data without uid is justified) then maybe we could add later a temporary option --with-deprecated-series-id ?

neurorepro commented 1 year ago

@yarikoptic I modified the code to check that the conversion table does not use deprecated series IDs. Otherwise it recomputes the conversation table with the non-deprecated series IDs.

I think in the end this may be less frustrating than an error since it identifies the deprecated IDs and recomputes the conversion table accordingly.

I also set the hash length to 8 as you suggested.

neurorepro commented 1 year ago

Hello @yarikoptic does it look like the code is ok now ? Cheers !

yarikoptic commented 11 months ago

@neurorepro ping on above comments.

neurorepro commented 11 months ago

Yes totally for the tests. I actually tested it on two datasets (already having old data, and newly converted) and it went fine. But indeed tests need to be included for continuous testing.

I'll get back to you!

yarikoptic commented 6 months ago

a gentle ping