khanlab / tar2bids

GNU General Public License v3.0

tar2bids gets subject name incorrect #13

Closed AlanKuurstra closed 3 years ago

AlanKuurstra commented 4 years ago

tar2bids determines that the date of the scan is my subject name.

the error occurs on: Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar

where
study description: Menon^Mouse_APPNL-G-F
date: 20200210
subject: NL_31_1F9
study_id: 20200210_01
hash of study uid: A3AD08CB

and the bids output is: bids/sub-20200210/anat/sub-20200210_part-mag_echo_run-01_GRE.nii.gz

It might be worth getting info from the dicom header instead of an assumed folder or filename convention.
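
To illustrate why the filename convention is ambiguous for this scan, here is a minimal sketch. It is not the parsing code tar2bids uses; the field values are the ones from this report, the variable names are made up for illustration, and it assumes the `^` in the study description is written as `_` in the tar name.

```python
# Illustration only: not the parsing code used by tar2bids.
# Field values are taken from the example in this report.
fields = {
    "study_description": "Menon_Mouse_APPNL-G-F",  # assumes '^' becomes '_' in the tar name
    "date": "20200210",
    "subject": "NL_31_1F9",
    "study_id_and_hash": "20200210_01.A3AD08CB",
}

tar_name = "_".join(fields.values()) + ".tar"

# Splitting on '_' yields more tokens than there are fields, because
# several field values contain underscores themselves.
tokens = tar_name[: -len(".tar")].split("_")

print(tar_name)
print(len(fields), "fields ->", len(tokens), "tokens")
print(tokens)
```

The date string also shows up twice among the tokens (once as the scan date, once as the start of the study ID), so a parser that relies on position or on "the token that looks like a date" has no reliable way to tell where the subject begins and ends.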

akhanf commented 4 years ago

sorry for the delay -- wonder if this is best addressed by making a fix in cfmm2tar for the bruker data? What do you think @yinglilu and @AlanKuurstra ?

AlanKuurstra commented 4 years ago

Yeah, it makes sense to bidsify the tags there. I did something similar in python. Not sure if it will help, but here's the code:

import os, pydicom, subprocess, shutil
from glob import glob

def bidsify_string(string_to_bidsify):
    return string_to_bidsify.replace('_', '')

dicom_root = '/home/akuurstr/Desktop/Esmin_mouse_registration/mouse_scans/dicoms'
heuristic = '/softdev/akuurstr/python/modules/mouse_resting_state/cfmm_bruker_mouse_heudiconv_heuristic.py'
dcm_dir_template = os.path.join(dicom_root, '*/*/*/{subject}/{session}.*/*/*.dcm')
bids_output = '/home/akuurstr/Desktop/Esmin_mouse_registration/mouse_scans/bids2'

# remove underscores from patient names (used for BIDS subjects)
subject_folders = os.path.join(dcm_dir_template.split('{subject}')[0].replace('{session}','*'),'*')
for subject_folder in glob(subject_folders):
    shutil.move(subject_folder,
                os.path.join(os.path.dirname(subject_folder), bidsify_string(os.path.basename(subject_folder))))
# remove underscores from StudyIDs (used for BIDS session)
session_folders = os.path.join(dcm_dir_template.split('{session}')[0].replace('{subject}','*'),'*')
for session_folder in glob(session_folders):
    shutil.move(session_folder,
                os.path.join(os.path.dirname(session_folder), bidsify_string(os.path.basename(session_folder))))

completed_patient_sessions = []
for root, dirs, files in os.walk(dicom_root):
    for file in files:
        if file.endswith(".dcm"):
            dcm_file = pydicom.dcmread(os.path.join(root, file), stop_before_pixels=True)
            if 'rsFMRI' in dcm_file.ProtocolName:
                bids_subject = bidsify_string(str(dcm_file.PatientName))
                bids_session = bidsify_string(str(dcm_file.StudyID))
                if (bids_subject,bids_session) in completed_patient_sessions:
                    continue
                subprocess.call(
                    ["heudiconv", "-b", "-d", dcm_dir_template, "-o", bids_output, "-f", heuristic, "-s", bids_subject,
                     "-ss", bids_session, "--overwrite"])
                completed_patient_sessions.append((bids_subject,bids_session))

yinglilu commented 4 years ago

Sorry. On group meeting. Will check it out soon.

yl

yinglilu commented 4 years ago

Hi,

I double-checked cfmm2tar.py and tar2bids; both seem to work!

Tested with

./tar2bids -P "NL_31_1F9"  "Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar"

Get output:

PI=Menon Study=Mouse_APPNL-G-F Date=20200210 PatientName=NL_31_1F9_20200210

The patient name is parsed correctly (see the BTW note below).

Is it possible that the problem was caused when running tar2bids or by the heuristic file?

Cheers,

YingLi

BTW:

If line 180 is changed from patient=${patient_etc%_[0-9]*.*} to patient=${patient_etc%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*.*}

the output becomes:

PI=Menon Study=Mouse_APPNL-G-F Date=20200210 PatientName=NL_31_1F
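
For readers less familiar with bash, `${var%pattern}` removes the shortest suffix of `var` that matches the glob `pattern`. The sketch below emulates that in Python to compare the two patterns. The helper name and the value used for `patient_etc` are assumptions for illustration; the real script may hold a different string at that point.

```python
from fnmatch import fnmatchcase


def strip_shortest_suffix(value: str, pattern: str) -> str:
    """Emulate bash ${value%pattern}: drop the shortest suffix matching the glob."""
    # Try suffixes from shortest (empty) to longest (the whole string).
    for start in range(len(value), -1, -1):
        if fnmatchcase(value[start:], pattern):
            return value[:start]
    return value  # no suffix matched; bash leaves the value unchanged


# Assumed value of patient_etc (hypothetical; the real script may differ).
patient_etc = "NL_31_1F9_20200210_01.A3AD08CB.tar"

current = strip_shortest_suffix(patient_etc, "_[0-9]*.*")
proposed = strip_shortest_suffix(patient_etc, "_" + "[0-9]" * 8 + "*.*")

print(current)
print(proposed)
```

Under that assumption, the current pattern is satisfied by the short suffix starting at `_01`, so the study date stays attached to the patient name. The eight-digit variant cannot match there and instead strips everything from `_20200210` onward.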

AlanKuurstra commented 4 years ago

When I was getting errors, I did not directly pass the subject using -P. It seems to do better using that flag.

But it still isn't correct, since the actual values are:
subject: NL_31_1F9
study_id: 20200210_01
hash of study uid: A3AD08CB

but in your example the subject is returned as NL_31_1F9_20200210

Note that some variant of 20200210_01.A3AD08CB should be interpreted as the BIDS session.

yinglilu commented 4 years ago

Hi Alan,

The BTW section of my previous post should fix it :-)

AlanKuurstra commented 4 years ago

Since in this situation tar2bids only works with the -P flag, I would suggest that this puts the burden of parsing on the user instead of on cfmm2tar.

yinglilu commented 4 years ago

Yup. -P is simple, flexible and powerful. Sometimes the user is the only one who knows which part is the subject name.

Ali,

Is it okay to modify tar2bids line 180 from

patient=${patient_etc%_[0-9]*.*}

to

patient=${patient_etc%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*.*}

?

AlanKuurstra commented 4 years ago

But I guess you could get the subject name from the dicom headers like what the above python code does.

I'm not sure how your whole automation system works, but will the -P flag work with autobids?

In my opinion, the better solution would be to make a change in cfmm2tar like Ali suggested. In cfmm2tar you could open one of the dicom headers from the tar file. Then use the dicom tags (StudyDescription, AcquisitionDate, PatientName, StudyID, and the hash of the StudyInstanceUID) to determine the structure of the tar filename. You could then bidsify the tar filename (take out underscores etc.) so that cfmm2tar will work as intended.
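
A rough sketch of the "bidsify the tags" step described above, with the tag values passed in as plain strings (reading them from a DICOM file inside the tar is left out). The helper names are hypothetical, and stripping every non-alphanumeric character is an assumption based on BIDS labels being alphanumeric. How the session should be derived (for example, whether to include the StudyInstanceUID hash) is not settled in this thread.

```python
import re


def bidsify_label(value: str) -> str:
    """Keep only letters and digits, since BIDS labels are alphanumeric."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


def bids_entities(patient_name: str, study_id: str) -> dict:
    """Map DICOM tag values to BIDS subject/session labels (hypothetical helper)."""
    return {
        "subject": bidsify_label(patient_name),
        "session": bidsify_label(study_id),
    }


# Tag values from the example in this issue.
entities = bids_entities(patient_name="NL_31_1F9", study_id="20200210_01")
print("sub-{subject}_ses-{session}".format(**entities))
```

Because the labels come straight from the tags, nothing has to be guessed from underscore positions in the filename.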

yinglilu commented 4 years ago

Hi Alan,

The tar filename is already built the same way as in your solution; please check line 74 of
https://github.com/yinglilu/cfmm2tar/blob/master/sort_rules.py

AlanKuurstra commented 4 years ago

I used the latest commit from https://github.com/khanlab/cfmm2tar. An example tar output is:

Menon_Mouse_APPNL-G-F_20200210_NL_31_1F9_20200210_01.A3AD08CB.tar

where
study description: Menon^Mouse_APPNL-G-F
date: 20200210
subject: NL_31_1F9
study_id: 20200210_01
hash of study uid: A3AD08CB

which shows that the tar filename has not correctly been made bids friendly and does not work with tar2bids. If we've decided to keep tar2bids as is, perhaps I should move this issue to that repo.

akhanf commented 4 years ago

Hi guys,

Sorry have been away from this and just trying to follow along now -- if I understand correctly, cfmm2tar is using the same dicom tags to build the tar file (whether Bruker or not), but it is just that the Bruker tags include some extra substrings that make the tar2bids parsing not possible, unless the -P flag is used?

As for changing the structure of the tar file to bids-ify it, that would be a change that would break compatibility with previously generated tar files, so less inclined to make a change that breaks all the 3T and 7T data out there already.. But is there a change we can make to how the Bruker tar files are created (in cfmm2tar) so that at least tar2bids can work in a similar fashion, without requiring the -P?

yinglilu commented 4 years ago

Hi Ali,

Yes. You are right.

I am looking at the code (cfmm2tar) and trying to find a simple solution.

yl

akhanf commented 4 years ago

Yingli, Just chatted with Alan -- have some ideas for a solution perhaps we can discuss when we meet tomorrow.

yinglilu commented 4 years ago

Awesome! see you guys tomorrow.

akhanf commented 4 years ago

@AlanKuurstra, the parsing for subject should be fixed in docker://khanlab/tar2bids:latest now, (thanks @yinglilu) let me know when you get a chance to try it out

akhanf commented 3 years ago

Closing now as I think this is fixed, but @AlanKuurstra feel free to re-open if not..