Seriesgrouping - Githubissues

neurorepro commented 1 year ago

This is to address #624

neurorepro commented 1 year ago

You may want to consider #682 first (changes are included in this PR)

codecov[bot] commented 1 year ago

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (e3c53b8) 81.94% compared to head (e9c9289) 81.98%. Report is 43 commits behind head on master.

Files	Patch %	Lines
heudiconv/utils.py	73.68%	5 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #684 +/- ## ========================================== + Coverage 81.94% 81.98% +0.04% ========================================== Files 41 41 Lines 4132 4153 +21 ========================================== + Hits 3386 3405 +19 - Misses 746 748 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

neurorepro commented 1 year ago

@yarikoptic As discussed in #624 I implemented a hash to shorten the name, but the full Series UID could be used too

yarikoptic commented 1 year ago

In general I think the approach makes sense in general. Re why not before: I guess (@satra and @mih might correct if have any recoll of the motivations) is that series id and name (instead of UID) were chosen for human accessibility since they make it easier to understand which particular series in question while annotating .edit.txt and just debugging the operation. So far I do not see immediate problem with adding series UID since AFAIK indeed it should be the same for all files within a single acquisition. Might be worth checking though on more of combined ones like T1PD etc.

TL;DR: we need to approach it with caution and thus need more checking/work. Some reservations/concerns and thinking out loud:

the fact that we ran into it just now suggests that situation is not typical thus questions the change of the default behavior:
- It seems it would apply ONLY in the case of -g all or -g accession i.e. where we force multiple scanning sessions into a single conversion study. That would partially explain why we have not ran into it I guess. Such grouppings were added only few years back IIRC and aren't default mode of operation
- as you note in https://github.com/nipy/heudiconv/issues/624#issuecomment-1406097247 dcm2niix would still do "the right thing" and convert those incorrectly "groupped" files into different files since it would rely on series UUID. So it is just that neither a heuristic nor heudiconv would know how to tease them apart.
it would introduce the core change of behavior
- possibly making it then impossible for heudiconv to operate on prior produced annotations in -c none mode for the subsequent conversion with -c dcm2niix since all groups names would change. Needs a check:
- may be heudiconv needs to become smart and detect in the mode of work while reloading already existing mapping file and then not bother adding UID? might help in transition period but generally not needed/too cumbersome... may be we would just add a check while loading an existing mapping that all the series ids do include that UID hash and error out if they don't with informative message. I think the latter is better.
- but it also might break some heuristics, which might have relied on those series ids (did any which we ship? needs to be checked)
- it could also be made optional - either via command line option (more preferable) or automagicly enabled only in -g all or -g accession (I don't like that). But then it adds differences in behavior based on CLI options, not good

neurorepro commented 1 year ago

@yarikoptic thank you very much for the feedback

Regarding scope of use case: i think it is actually common to have artificially split a single BIDS session into two when the scanning protocol is too long (e.g. AM: anat, resting state, ... PM: diffusion MRI, ...) or the subject needs to exit. It is actually mentioned explicitely in the BIDS definition of session (def number 5):

if a subject has to leave the scanner room and then be re-positioned on the scanner bed, the set of MRI acquisitions will still be considered as a session and match sessions acquired in other subjects. Similarly, in situations where different data types are obtained over several visits (for example fMRI on one day followed by DWI the day after) those can be grouped in one session.

The fact that not many people reported it i think is because users did not know what went wrong and would not think this may be due to heudiconv (who would think of blaming such a magnificent tool ?)

Regarding change of behavior i totally agree that this should be done carefully. I think what seems to be your preferred option would be best:

may be we would just add a check while loading an existing mapping that all the series ids do include that UID hash and error out if they don't with informative message. I think the latter is better.

I will update the PR with that implemented.

If the error is too much for users (for which the use case of needing to keep previous data without uid is justified) then maybe we could add later a temporary option --with-deprecated-series-id ?

neurorepro commented 1 year ago

@yarikoptic I modified the code to check that the conversion table does not use deprecated series IDs. Otherwise it recomputes the conversation table with the non-deprecated series IDs.

I think in the end this may be less frustrating than an error since it identifies the deprecated IDs and recomputes the conversion table accordingly.

I also set the hash length to 8 as you suggested.

neurorepro commented 1 year ago

Hello @yarikoptic does it look like the code is ok now ? Cheers !

yarikoptic commented 11 months ago

@neurorepro ping on above comments.

neurorepro commented 11 months ago

Yes totally for the tests. I actually tested it on two datasets (already having old data, and newly converted) and it went fine. But indeed tests need to be included for continuous testing.

I'll get back to you!

yarikoptic commented 6 months ago

a gentle ping

nipy / heudiconv

Seriesgrouping #684

Codecov Report