nipy / heudiconv

Flexible DICOM conversion into structured directory layouts
https://heudiconv.readthedocs.io
Other
236 stars 125 forks source link

Non BIDS-compliant suffixes when using --bids flag #520

Open Terf opened 3 years ago

Terf commented 3 years ago

Summary

After converting a large clinical dataset with heudiconv I'm seeing many files with odd suffixes.

Some files have a heudiconvXXX suffix, which appears to be logic added in v0.9.0, as you can see the difference between 0.8.0 and 0.9.0:

heudiconv0.8.0
└── sub-xxx
    └── ses-xxx
        ├── anat
        │ ├── sub-xxx_ses-xxx_acq-sagspacet2flairbrain_run-01_FLAIR.json
        │ ├── sub-xxx_ses-xxx_acq-sagspacet2flairbrain_run-01_FLAIR.nii.gz
        │ ├── sub-xxx_ses-xxx_acq-sagt1mpragebrain_run-01_T1w.json
        │ └── sub-xxx_ses-xxx_acq-sagt1mpragebrain_run-01_T1w.nii.gz
        ├── dwi
        ├── sub-xxx_ses-xxx_scans.json
        └── sub-xxx_ses-xxx_scans.tsv
heudiconv0.9.0
└── sub-xxx
    └── ses-xxx
        ├── anat
        │ ├── sub-xxx_ses-xxx_acq-sagspacet2flairbrain_run-01_FLAIR.json
        │ ├── sub-xxx_ses-xxx_acq-sagspacet2flairbrain_run-01_FLAIR.nii.gz
        │ ├── sub-xxx_ses-xxx_acq-sagt1mpragebrain_run-01_T1w.json
        │ └── sub-xxx_ses-xxx_acq-sagt1mpragebrain_run-01_T1w.nii.gz
        ├── dwi
        │ ├── sub-xxx_ses-xxx_acq-axialdtibrain_run-01_dwi_heudiconv057a.bval
        │ ├── sub-xxx_ses-xxx_acq-axialdtibrain_run-01_dwi_heudiconv057a.bvec
        │ ├── sub-xxx_ses-xxx_acq-axialdtibrain_run-01_dwi_heudiconv057a.json
        │ ├── sub-xxx_ses-xxx_acq-axialdtibrain_run-01_dwi_heudiconv057a.nii.gz
        │ ├── sub-xxx_ses-xxx_acq-axialdtibrain_run-01_dwi_heudiconv057.bval
        │ ├── sub-xxx_ses-xxx_acq-axialdtibrain_run-01_dwi_heudiconv057.bvec
        │ ├── sub-xxx_ses-xxx_acq-axialdtibrain_run-01_dwi_heudiconv057.json
        │ └── sub-xxx_ses-xxx_acq-axialdtibrain_run-01_dwi_heudiconv057.nii.gz
        ├── sub-xxx_ses-xxx_scans.json
        └── sub-xxx_ses-xxx_scans.tsv

I ended up renaming these files to remove the suffix (incrementing the run number if that caused a collision with another file). But I’m wondering if there’s a reason heudiconv shouldn’t do this itself, as when using the --bids flag all output should be BIDS-compliant, but @yarikoptic said this was “a feature which would ease troubleshooting”. What do the heudiconvXXX suffixes indicate, and how should I rename them to be BIDS-compliant?

I also have images with numerical suffixes e.g. FLAIR1 and FLAIR2 as well as this _EQ suffix, but can’t track down where in the heudiconv source code those are added, so I’m wondering what they indicate and how I should deal with them as well.

Platform details:

Choose one:

yarikoptic commented 3 years ago

Sorry for the delay! I was about to release 0.10.0 and spotted that you referenced https://github.com/nipy/heudiconv/pull/485 as a possible reason. (edit: well done on diagnosis and cross-referencing, thank you!)

@yarikoptic said this was “a feature which would ease troubleshooting”. What do the heudiconvXXX suffixes indicate, and how should I rename them to be BIDS-compliant?

they aren't because we do not know how those files differ -- what does the heuristic you use (which one?) do/tell them to be named? what is the difference between them -- just a run? then heuristic should instruct to add _run or if you used reproin, you could say _run+ which would automagically do that for you (but ideally you better have some explicit indicator of run somewhere within protocol name so you could tell for sure e.g. canceled run from a complete one etc).

Terf commented 3 years ago

@yarikoptic thanks for the response! I'm a bit confused though, are you saying the heudiconvXXX suffixes should only appear when the heuristic doesn't distinguish between two images and doesn't include a run parameter in the heuristic, or is the latter point irrelevant? My heuristic defines the template strings fed to create_key like 'sub-{subject}/{session}/dwi/sub-{subject}_{session}_acq-%s_run-{item:02d}_dwi' % protocol_name.lower(), so every image with a different protocol_name is guaranteed to be distinguished, and the run-{item:02d} is specified so if two images do have the same protocol_name I would think the run would get incremented -- but that does not appear to be the case.

yarikoptic commented 3 years ago

Did you have two sequences with the same protocol name?

Edit: don't remember the meaning of item variable in that format expression. You can check out how we deal with duplicates in reproin heuristic. May be if you shared your heuristic I could get a better idea

yarikoptic commented 3 years ago

If it was a single sequence, such had eg original + filtered or otherwise processed one, then your heuristic needs to pay attention to such processed/derivative files

Terf commented 3 years ago

@yarikoptic I uploaded the full heuristic here: https://gist.github.com/Terf/89975b5aaaac65217c6204b886fae975

Thanks for taking a look! The two sequences do have the same protocol name, so I'm looking for a way to get the output filenames to be identical except for an incremented run number.

yarikoptic commented 3 years ago

will have a look.

The two sequences do have the same protocol name, so I'm looking for a way to get the output filenames to be identical except for an incremented run number.

that is what you are to do in your heuristic. once again -- can have a look at reproin heuristic on that.

edit: you could also use reproin heuristic, just remap your protocol names into reproin (with _run+ for where you want to have it autoincremented). See e.g. https://github.com/ReproNim/reproin/issues/18#issuecomment-834598084 for such "heuristic" file which remaps into reproin.