Renaming Subject IDs - Githubissues

tjhendrickson commented 6 years ago

Hello,

I was able to successfully convert a data set to BIDS format. I would like to explore some more of the advanced features of heudiconv as my data is longitudinal and organized in an atypical way. All of the data that I am working with is organized by the accession number. This is primarily done so it is guaranteed that one data set wouldn't overwrite another, as may be the case with a longitudinal study. Once I convert the data to BIDS format, however, I would like to sort my data by the subject ids (which is saved within the dicom files) and generate the longitudinal directory structure as shown within the bids specification file: http://bids.neuroimaging.io/bids_spec.pdf.

Let me give an example: I have one participant with two time points labeled with accession numbers 11111 and 55555, and my heudiconv command would be: heudiconv -d /path/to/data/{subject}//.dcm -s 11111, 55555 --minmeta -b -o . -f heuristics.py. Of course, my output would be sub-11111 and sub-55555; however, I would prefer it to be organized by the subject id present in the dicom file.

I've looked briefly at the --anon-cmd flag, but from what I can tell from the heudiconv source code, there is no template or set of functions that --anon-cmd expects, unlike the heuristics flag. I'm wondering if there are any example scripts amenable to --anon-cmd that I might look at in order to get a sense of how this command is expected to work.

Thanks!

mgxd commented 6 years ago

@tjhendrickson are you trying to anonymize your subject ids or simply get them in a session format? If the latter, you can use the --ses flag to specify the different time points.

Slightly altering your example call should do the trick:

heudiconv -d /path/to/data/{subject}/{session}/.dcm -s $subject_id --minmeta -b -o . -f heuristics.py --ses 11111

If the former, this option is a little limited at the moment. It only allows a one-word executable (no spaces or arguments) to be applied to your subject_id (ideal if you have some randomizer/encryptor available). We could expand this to allow a call to any command line (even wrapping your own scripts, for example --anon-cmd python randomizer.py), which would receive the subject_id as an argument and allow for easier customization.
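As a rough illustration of what such an executable could look like, here is a minimal sketch. It assumes that heudiconv calls the executable with the original subject ID as its only argument and takes whatever is printed to standard output as the new ID; that contract is an assumption and should be checked against the installed version. The lookup table, the file name `remap_id.py`, and the function `remap_id` are hypothetical.

```python
#!/usr/bin/env python
"""Hypothetical --anon-cmd helper (remap_id.py): print a replacement ID."""
import sys

# Hypothetical lookup table: accession number -> subject ID.
ACCESSION_TO_SUBJECT = {
    "11111": "1111",
    "55555": "1111",
}


def remap_id(original_id, table=ACCESSION_TO_SUBJECT):
    """Return the mapped ID; raises KeyError if the ID is not in the table."""
    return table[original_id]


# Only act when exactly one ID was passed on the command line.
# Assumption: the caller reads the new ID from standard output.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(remap_id(sys.argv[1]))
```

Made executable and placed on the PATH, a script like this would satisfy the "one word, no arguments" restriction described above.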

tjhendrickson commented 6 years ago

@mgxd

I am trying to do the former. My ultimate goal with all of this is to transfer multiple time points labeled as different accession numbers in the following format:

sub-1111
    baseline (corresponding to 11111)
        data
    followup (corresponding to 55555)
        data

As this will require some complexity (i.e. determining which scan is baseline and which is followup), it seems that designing a script callable by --anon-cmd is the way to go. What is unclear to me, however, is how I would design a script that receives the specific subject ID provided as the input. This is my primary impetus for asking for an example of the source code used.
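The "which scan is baseline and which is followup" step can be sketched independently of heudiconv. The snippet below assumes the subject ID and study date have already been read from each accession's DICOM headers by some other means; the function name `assign_sessions`, the tuple layout, and the example dates are all hypothetical.

```python
from collections import defaultdict


def assign_sessions(scans, labels=("baseline", "followup")):
    """Map each accession to (subject_id, session_label), ordered by date.

    `scans` is an iterable of (accession, subject_id, study_date) tuples,
    where study_date is a sortable string such as 'YYYYMMDD'.
    """
    by_subject = defaultdict(list)
    for accession, subject_id, study_date in scans:
        by_subject[subject_id].append((study_date, accession))

    assignment = {}
    for subject_id, visits in by_subject.items():
        # Earliest study date first.
        for index, (_, accession) in enumerate(sorted(visits)):
            # Fall back to a numbered label if visits outnumber the names.
            if index < len(labels):
                label = labels[index]
            else:
                label = "visit%02d" % (index + 1)
            assignment[accession] = (subject_id, label)
    return assignment


# Hypothetical input: two accessions belonging to the same participant.
scans = [("55555", "1111", "20180115"), ("11111", "1111", "20170110")]
print(assign_sessions(scans))
```

The resulting mapping gives, per accession, the subject and session labels needed for one conversion run.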

mgxd commented 6 years ago

> I am trying to do the former. My ultimate goal with all of this is to transfer multiple time points labeled as different accession numbers in the following format.

This sounds more like the case of using the session flag - you will essentially call heudiconv twice, once for each accession number

baseline

heudiconv -d /path/to/data/{subject}/11111/*.dcm -s $subject_id --minmeta -b -o . -f heuristics.py --ses baseline

followup

heudiconv -d /path/to/data/{subject}/55555/*.dcm -s $subject_id --minmeta -b -o . -f heuristics.py --ses followup

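This will save your data in the following format (based on your heuristic file, which should look similar to this one: https://github.com/mgxd/heudiconv/blob/2ff3cd5d038d6b6582a92dc3e3382e641720aa97/heuristics/bids_with_ses.py):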

├── subject_id
│   ├── ses-baseline
│   │   └── data
│   └── ses-followup
│       └── data
├── participants.tsv
└── README.md
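A session-aware heuristic could look roughly like the sketch below. It follows the usual `create_key` / `infotodict` layout of heudiconv heuristic files and assumes that `{session}` is filled in with the `ses-` prefix already attached. The protocol-name matching ("mprage", "rest") and the `Seq` stand-in used for the demo are hypothetical and would need adapting to the actual scan names.

```python
from collections import namedtuple


def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Build the (template, outtype, annotation_classes) key for one output."""
    if not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Sort series IDs into per-session output templates."""
    t1w = create_key(
        "sub-{subject}/{session}/anat/sub-{subject}_{session}_T1w")
    rest = create_key(
        "sub-{subject}/{session}/func/sub-{subject}_{session}_task-rest_bold")
    info = {t1w: [], rest: []}
    for s in seqinfo:
        # Hypothetical matching rules; adjust to your protocol names.
        name = s.protocol_name.lower()
        if "mprage" in name:
            info[t1w].append(s.series_id)
        elif "rest" in name:
            info[rest].append(s.series_id)
    return info


# Stand-in for heudiconv's series records, for demonstration only.
Seq = namedtuple("Seq", ["series_id", "protocol_name"])
demo = [Seq("2-MPRAGE", "MPRAGE"), Seq("5-rest", "bold_rest")]
for key, series in infotodict(demo).items():
    print(key[0], series)
```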

> As this will require some complexity (i.e. determining which scan is baseline and followup) it seems that designing a script callable by --anon-cmd is the way to go. What is unclear to me however is how I would design a script which receives the specific subject ID provided as the input. This is my primary impetus for asking for an example of source code used.

The anon-cmd argument was added before I joined the project, so I'm not too sure of its original intent - but to use it in conjunction with a custom script would first require a modification of heudiconv's source code.

tjhendrickson commented 6 years ago

Hmm, unfortunately neither /path/to/data/{subject}/55555/.dcm nor /path/to/data/{subject}/11111/.dcm exists; rather, the paths for 55555 and 11111 are /path/to/data/55555//.dcm and /path/to/data/11111//.dcm.

-Tim



mgxd commented 6 years ago

in that case, you may be better off using the --files arg instead of -d (@yarikoptic could suggest the best way to use this)

perhaps

# first command
heudiconv --files /path/to/data/11111 --minmeta -b -o . -f heuristics.py --ses baseline

# second command
heudiconv --files /path/to/data/55555 --minmeta -b -o . -f heuristics.py --ses followup

Let me know if that helps!
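With more than a couple of accessions, the per-session commands could be generated rather than typed by hand. The sketch below only builds and prints the command lines, using the flags already shown in this thread; the helper name `build_commands` and its defaults are hypothetical, and nothing is executed.

```python
import shlex


def build_commands(sessions, data_root="/path/to/data",
                   heuristic="heuristics.py", outdir="."):
    """Return one heudiconv argument list per (accession, session) pair."""
    commands = []
    for accession, label in sessions:
        commands.append([
            "heudiconv",
            "--files", "%s/%s" % (data_root, accession),
            "--minmeta", "-b",
            "-o", outdir,
            "-f", heuristic,
            "--ses", label,
        ])
    return commands


# Print shell-safe command lines for review before running anything.
for argv in build_commands([("11111", "baseline"), ("55555", "followup")]):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each argument list can be handed to `subprocess.run` once the printed commands look right.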

yarikoptic commented 6 years ago

FWIW, with https://github.com/nipy/heudiconv/blob/master/heuristics/dbic_bids.py we are fully achieving automated subject/session layout'ing using information solely from DICOMs so at the end it looks like this: http://datasets.datalad.org/?dir=/dbic/QA (look at sub-qa for multiple sessions) . Session is specified within one of the sequences in the accession, typically within a scout, e.g. here is one of the accessions:

bids@rolando:/inbox/DICOM/2017/11/06$ ls qa/
001-anat-scout_ses-{date}          004-anat-scout_ses-{date}_MPR_tra        008-dwi_acq-DTI-30-p2
002-anat-scout_ses-{date}_MPR_sag  005-func-bold_task-rest_acq-p2           009-dwi_acq-DTI-30-p2-s4
003-anat-scout_ses-{date}_MPR_cor  007-func-bold_task-rest_acq-p2-s4-3.5mm  010-anat-T1w_acq-MPRAGE

which also shows the magic {date} placeholder, which gets replaced with the date of the accession
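To make the idea concrete, here is a minimal sketch of pulling a session label out of a sequence name and filling in `{date}`. This is not the code in dbic_bids.py, only an illustration; the function name `session_from_protocol` and the `YYYYMMDD` date format are assumptions.

```python
import re


def session_from_protocol(protocol_name, study_date):
    """Return the session label embedded in a sequence name, or None.

    Looks for a `_ses-<label>` entity and replaces a literal `{date}`
    placeholder in the label with `study_date`.
    """
    match = re.search(r"_ses-([^_]+)", protocol_name)
    if match is None:
        return None
    return match.group(1).replace("{date}", study_date)


# Sequence names taken from the listing above; the date is illustrative.
for name in ("anat-scout_ses-{date}_MPR_sag", "func-bold_task-rest_acq-p2"):
    print(name, "->", session_from_protocol(name, "20171106"))
```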



And yes, our typical invocation is in `--files` mode, pointing to a specific accession directory (or multiple directories; accessions get separated based on UID and processed separately). Well, to be precise, our version in deployment (the released one) is still old and doesn't have `--files`, so here is a full sample invocation:

```
heudiconv  -f ~/heudiconv/heuristics/dbic_bids.py \
  -c dcm2niix -o /inbox/BIDS --bids --datalad /path/to/accession [...morepaths...]
```
hope this helps

yarikoptic commented 6 years ago

Hi @tjhendrickson, did you get it working as you desired, or is there something we could do to improve/fix?

thomshaw92 commented 6 years ago

Hi all, I'm having a similar problem. My subject IDs have a 'timestamp' (representing a time point) at the end of the import subject folder and embedded in the DICOM header, e.g.:

1001AB01/MPRAGE/dcms.IMA
1001AB06/MPRAGE/dcms.IMA
etc...

I'd like to arrange the data in BIDS format with the timestamp removed, but I don't like the idea of renaming my subject folders. Any ideas? I've tried the --files option and using ?? as wildcards, unsuccessfully. Thanks!

thomshaw92 commented 6 years ago

Fixed my own issue for the moment. For anyone interested, I made a csv of my folder names and looped through them.

for x in `cat subjnames.csv` ; do heudiconv -d ./{subject}01/*/*IMA -s ${x:0:6} --ses 01 -f ./heudiconv_file_bids.py -c dcm2niix -b --minmeta -o ./out/ ; done
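The `${x:0:6}` slice in that loop keeps the first six characters of each folder name as the subject ID. The same split, with the trailing time point kept as a session label, can be sketched as follows; the helper name `split_folder_name` is hypothetical, and the fixed subject length of six is an assumption taken from the example folder names.

```python
def split_folder_name(folder, subject_len=6):
    """Split e.g. '1001AB01' into subject '1001AB' and session '01'."""
    subject, session = folder[:subject_len], folder[subject_len:]
    if len(subject) < subject_len or not session:
        raise ValueError("unexpected folder name: %r" % folder)
    return subject, session


for folder in ("1001AB01", "1001AB06"):
    subject, session = split_folder_name(folder)
    print("sub-%s ses-%s" % (subject, session))
```

This makes it possible to loop over every time point rather than only the `01` folders.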

nipy / heudiconv

Renaming Subject IDs #139