nipy / heudiconv

Flexible DICOM conversion into structured directory layouts
https://heudiconv.readthedocs.io

having an option to rerun heuristic instead of reload #749

Open dcdace opened 5 months ago

dcdace commented 5 months ago

Summary

I am reusing the same heuristic for multiple sets of data for the same subject. The problem is that, instead of re-running the heuristic for the new set, heudiconv reloads the previous result, so the new sets of data are not converted. Why is the reload needed at all? Why not allow a re-run? Could there at least be an option to disable the reload?

Platform details:

dcdace commented 5 months ago

In addition, I used the --overwrite option, which I expected to overwrite everything from the previous run rather than reuse it. The name is rather misleading.

yarikoptic commented 5 months ago

Are you specifying a new subject or session ID? I am not sure exactly what "reload" would mean here. Please provide more complete details of your heudiconv invocations.

dcdace commented 5 months ago

For certain reasons, we need to process the data by modality: we first convert the MPRAGE scans for all subjects, then the EPI scans for all subjects, and so on. The subject/session IDs are created in the first iteration (when converting MPRAGE). By the time we convert EPI, the .heudiconv folder already exists in the dataset, along with the sub_ses-sessID.edit.txt holding the previously filled info dictionary. In the second iteration, heudiconv does not go through our heuristic again to find the key for the EPI scans. It skips that step completely, loads the existing sub_ses-sessID.edit.txt, and processes MPRAGE again.

See an example output of what happens when we run heudiconv the second time, for the EPI modality:

Processing subject CC210088
DICOM path: /mridata/../Series_004_CBU_EPI_restingstate
INFO: Running heudiconv version 1.0.1 latest 1.1.0
INFO: Analyzing 261 dicoms
INFO: Generated sequence info for 1 studies with 1 entries total
WARNING: Heuristic is missing an `infotoids` method, assigning empty method and using provided subject id CC210088. Provide `session` and `locator` fields for best results.
INFO: Study session for StudySessionInfo(locator=None, session='P2', subject='CC210088')
INFO: Need to process 1 study sessions
INFO: PROCESSING STARTS: {'subject': 'CC210088', 'outdir': '/imaging/../data/', 'session': 'P2'}
INFO: Processing 1 pre-sorted seqinfo entries
INFO: Reloading existing filegroup.json because /imaging/../data/.heudiconv/CC210088/ses-P2/info/CC210088_ses-P2.edit.txt exists
INFO: Doing conversion using dcm2niix
INFO: Converting /imaging/../data/sub-CC210088/ses-P2/anat/sub-CC210088_ses-P2_T1w (192 DICOMs) -> /imaging/../data/sub-CC210088/ses-P2/anat . Converter: dcm2niix . Output types: ('nii.gz',)

It starts correctly, analysing the 261 DICOMs of 'Series_004_CBU_EPI_restingstate'. But it does not use the heuristic at all. Instead, it reloads the existing CC210088_ses-P2.edit.txt, which has this content:

{('sub-{subject}/{session}/anat/sub-{subject}_{session}_T1w', ('nii.gz',), None): ['2-CBU_MPRAGE_32chn'],
 ('sub-{subject}/{session}/func/sub-{subject}_{session}_task-Rest_bold', ('nii.gz',), None): []}

and consequently it converts the 2-CBU_MPRAGE_32chn again, not Series_004_CBU_EPI_restingstate.
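
The effect of that stored mapping can be checked directly. The snippet below is a minimal sketch, not heudiconv code: it assumes the .edit.txt content is a plain Python literal (as the excerpt above suggests) and parses it with `ast.literal_eval`. The variable names are illustrative only.

```python
import ast

# Content of CC210088_ses-P2.edit.txt, copied from the excerpt above.
edit_txt = """{('sub-{subject}/{session}/anat/sub-{subject}_{session}_T1w', ('nii.gz',), None): ['2-CBU_MPRAGE_32chn'],
 ('sub-{subject}/{session}/func/sub-{subject}_{session}_task-Rest_bold', ('nii.gz',), None): []}"""

# Assumption: the file holds a Python literal dict, so literal_eval can read it.
info = ast.literal_eval(edit_txt)

# Keep only the output templates that actually have series assigned to them.
to_convert = {key[0]: series for key, series in info.items() if series}

for template, series in to_convert.items():
    print(template, "<-", series)
```

Only the T1w template has a series assigned; the bold template maps to an empty list, which matches the log: the MPRAGE series is converted again and the EPI series is never picked up.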

dcdace commented 4 months ago

Any news regarding this? I still think it would be very useful to have the option to rerun the heuristic instead of reloading.
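
To make the request concrete, here is a rough sketch of the behaviour being asked for. It is not heudiconv's implementation and does not use its API. `cached_info_path`, `get_info`, `run_heuristic`, and the `rerun` flag are all hypothetical names, and the cache layout is inferred solely from the single path shown in the log above.

```python
import ast
from pathlib import Path


def cached_info_path(outdir, subject, session):
    """Build the .edit.txt path, following the layout seen in the log (assumed)."""
    return (
        Path(outdir) / ".heudiconv" / subject / f"ses-{session}" / "info"
        / f"{subject}_ses-{session}.edit.txt"
    )


def get_info(edit_file, run_heuristic, rerun=False):
    """Return the info dict, reloading the stored one unless `rerun` is set.

    `run_heuristic` is a hypothetical zero-argument callable standing in for
    the heuristic step; `rerun` is the hypothetical option requested here.
    """
    if edit_file.exists() and not rerun:
        # Behaviour reported in this issue: reuse the previously stored mapping.
        return ast.literal_eval(edit_file.read_text())
    # Requested behaviour: run the heuristic again and refresh the stored file.
    info = run_heuristic()
    edit_file.parent.mkdir(parents=True, exist_ok=True)
    edit_file.write_text(repr(info))
    return info
```

With `rerun=False` the second call returns whatever the first run stored, which is the situation described above. With `rerun=True` the heuristic result for the new DICOMs replaces it.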