psychoinformatics-de / datalad-hirni

DataLad extension for (semi-)automated, reproducible processing of (medical/neuro)imaging data
http://datalad.org

FR: dicom-import flattens folder structure to minimum when getting dcms from tarball #147

Open pvavra opened 4 years ago

pvavra commented 4 years ago

We obtain tarballs of DICOMs that contain several levels of subfolders.

Folder structure inside the tarball, with DICOMs only at the lowest level (subject-identifying info removed for this post):

[subject_id]
└── study_1_[date_of_acq]
    ├── series_10_IR-EPI_2.2iso
    ├── series_11_DTIcmrr_1.8iso_3shell_b2400_TE66_SMS2_AP
    ├── series_12_DTIcmrr_1.8iso_3shell_b2400_TE66_SMS2_PA
    ├── series_13_fMRI_withouttask1_SMS2_2.2iso_66sl_TR2_SBRef
    ├── series_14_fMRI_withouttask1_SMS2_2.2iso_66sl_TR2
    ├── series_15_fMRI_withouttask2_SMS2_2.2iso_66sl_TR2_SBRef
    ├── series_16_fMRI_withouttask2_SMS2_2.2iso_66sl_TR2
    ├── series_17_fMRI_withouttask3_SMS2_2.2iso_66sl_TR2_SBRef
    ├── series_18_fMRI_withouttask3_SMS2_2.2iso_66sl_TR2
    ├── series_19_fMRI_withouttask4_SMS2_2.2iso_66sl_TR2_SBRef
    ├── series_1_AAHead_Scout
    ├── series_20_fMRI_withouttask4_SMS2_2.2iso_66sl_TR2
    ├── series_2_AAHead_Scout_MPR_sag
    ├── series_3_AAHead_Scout_MPR_cor
    ├── series_4_AAHead_Scout_MPR_tra
    ├── series_5_t1_mpr_sag_1iso_p2
    ├── series_6_fMRI_resting_SMS2_2.2iso_66sl_TR2_SBRef
    ├── series_7_fMRI_resting_SMS2_2.2iso_66sl_TR2
    ├── series_8_gre_field_mapping_2.2iso_66sl_acpc
    ├── series_99_PhoenixZIPReport
    └── series_9_gre_field_mapping_2.2iso_66sl_acpc

The import-dicom step works fine, but unnecessarily maintains the folder structure.

It would be simpler to flatten everything and keep only dicoms/series_* as the folder structure, since the dropped folders contain no additional files.

Currently, the structure is:

[acq-label]
└── dicoms
    └── [subject id]
        └── study_1_[date_of_acq]
            ├── series_10_IR-EPI_2.2iso
            ├── ...

bpoldrack commented 4 years ago

I see how this can be unwanted, but I don't agree. There's no technical reason to enforce a structure independent of the one in the tarball. Hirni itself operates solely on the extracted metadata; it doesn't care about the directory structure within that subdataset. Whether or not that structure is wanted is pretty much up to the user. So, from my POV, the thing to think about would be making this optional, defaulting to the current behavior, which I see as "take what the user gave you - they know what they want".

pvavra commented 4 years ago

I agree that the default should be to keep the structure as is inside the tarball, but maybe adding a simple flag --flatten could produce the structure I proposed.

I agree that there is no technical reason for this simpler structure, but I think there is a practical one. E.g. when someone is navigating the folder using a GUI, having this structure is ... let's say ... annoying. This is purely for the convenience of the user, and hence a feature request :-)

Note that "manual" navigation of these files/folders is useful when checking whether a custom rule-set is actually producing proper studyspecs files.
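
To make the request concrete, here is a rough sketch of what the flattening could look like as a standalone step on a plain, extracted directory tree (e.g. the tarball contents before import). It is not part of datalad-hirni; the function name flatten_leaf_dirs is made up for this illustration, and it assumes that the lowest-level folder names (series_*) are unique:

```python
import shutil
from pathlib import Path


def flatten_leaf_dirs(root):
    """Move every lowest-level directory directly under `root`.

    Hypothetical helper for illustration only; not a datalad-hirni API.
    A "leaf" is a non-empty directory without any subdirectories.
    """
    root = Path(root)

    # Collect leaves up front so the tree is not modified while walking it.
    leaves = []
    for d in sorted(p for p in root.rglob("*") if p.is_dir()):
        entries = list(d.iterdir())
        if entries and not any(e.is_dir() for e in entries):
            leaves.append(d)

    # Assumption: leaf names are unique; otherwise flattening would clash.
    names = [d.name for d in leaves]
    if len(set(names)) != len(names):
        raise ValueError("leaf directory names are not unique; refusing to flatten")

    moved = []
    for leaf in leaves:
        target = root / leaf.name
        if leaf.parent != root:
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            shutil.move(str(leaf), str(target))
        moved.append(target)

    # Remove intermediate directories that are now empty, deepest first.
    # Directories that still hold files are left untouched.
    remaining = sorted(
        (p for p in root.rglob("*") if p.is_dir()),
        key=lambda p: len(p.parts),
        reverse=True,
    )
    for d in remaining:
        if not any(d.iterdir()):
            d.rmdir()

    return moved
```

Calling flatten_leaf_dirs("dicoms") on the structure above would leave the series_* folders directly under dicoms and drop the emptied [subject id] and study_1_[date_of_acq] levels. Intermediate folders that contain files of their own are kept as they are, so nothing is deleted silently.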