Open plbenveniste opened 1 month ago
What name should be used for the git-annex repo? ms-karolinska?
Given that Karolinska has contributed (or will contribute) to multiple datasets coming from different studies, I think we need to distinguish them. E.g., this one could be called: ms-karolinska-2020
Thanks @jcohenadad!
I am currently working on the bidsification of the data and facing a few issues. I am following the dcm2bids tutorial and ran the following commands:
conda activate dcm2bids
mkdir bids_karo
dcm2bids_scaffold -o bids_karo
The config file I created is the following (feedback on the chosen suffixes is welcome). It is stored in bids_karo/code.
{
"descriptions": [
{
"datatype": "anat",
"suffix": "acq-MPRsag_T1w",
"criteria": {
"SeriesDescription": "t1_mpr_ns_sag_1mm_iso"
}
},
{
"datatype": "anat",
"suffix": "acq-sagDF_T2w",
"criteria": {
"SeriesDescription": "t2_space_dark-fluid_sag_REK_tra_3mm"
}
},
{
"datatype": "anat",
"suffix": "acq-sagDF_T2w",
"criteria": {
"SeriesDescription": "t2_space_dark-fluid_sag_REK_3mm_tra"
}
},
{
"datatype": "anat",
"suffix": "acq-tseSag_T2w",
"criteria": {
"SeriesDescription": "t2_tse_sag_MS"
}
},
{
"datatype": "anat",
"suffix": "acq-sagP2_T2w",
"criteria": {
"SeriesDescription": "t2_space_sag_p2_iso_REK_tra_3mm"
}
},
{
"datatype": "anat",
"suffix": "acq-sagP2_T2w",
"criteria": {
"SeriesDescription": "t2_space_sag_p2_iso_REK_3mm_tra"
}
},
{
"datatype": "anat",
"suffix": "acq-me2d_T2w",
"criteria": {
"SeriesDescription": "t2_me2d_tra_p2_3mm"
}
},
{
"datatype": "anat",
"suffix": "acq-tse_T2w",
"criteria": {
"SeriesDescription": "t2_tse_tra"
}
},
{
"datatype": "anat",
"suffix": "acq-MPRsag_T1w",
"criteria": {
"SeriesDescription": "t1_mpr_ns_sag_1mm_iso_REK_1mm_tra"
}
},
{
"datatype": "anat",
"suffix": "acq-MPRsag_T1w",
"criteria": {
"SeriesDescription": "t1_mpr_ns_sag_1mm_iso_MPR_3mm_tra"
}
},
{
"datatype": "anat",
"suffix": "acq-MPRsagDF_T2w",
"criteria": {
"SeriesDescription": "t2_space_dark-fluid_sag_MPR_3mm_tra"
}
},
{
"datatype": "anat",
"suffix": "acq-MPRsagP2_T2w",
"criteria": {
"SeriesDescription": "t2_space_sag_p2_iso_MPR_3mm_tra"
}
}
]
}
I created the following bash script to convert the DICOM images to BIDS:
#!/bin/bash
# Check if the correct number of arguments is provided
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 path/to/config.json path/to/output_dir path/to/dicom"
    exit 1
fi
# Get the config file, output directory, and DICOM directory from the command line arguments
config_file="$1"
output_dir="$2"
dicom_path="$3"
# Iterate over each folder in the DICOM directory
for folder in "$dicom_path"/SW1-*; do
    # Check if it is a directory
    if [ -d "$folder" ]; then
        # Extract participant and session info from the folder name
        # The folder name is */SW1-1773_M0: the participant should be 1773 and the session M0
        subfolder="${folder##*/}"
        # Participant is the number after SW1- and before _
        participant="${subfolder#SW1-}"
        participant="${participant%%_*}"
        # Session is the label after the _ (e.g. M0)
        session="${subfolder##*_}"
        echo "Converting participant $participant session $session"
        # Define the DICOM directory
        dicom_dir="$folder"
        # Run dcm2bids
        dcm2bids -d "$dicom_dir" -p "$participant" -s "$session" -c "$config_file" -o "$output_dir" --bids_validate
    fi
done
echo "All conversions are done."
The script was run using the following command:
bids_karo/code/convert_dcm2bids.sh bids_karo/code/dcm2bids_config.json bids_karo/ 20200612_longitudinal/Karolinska_data.1/
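The folder-name parsing can be checked on its own before launching a full conversion. Below is a minimal sketch in Python that mirrors the script's parameter expansions for names like SW1-1773_M0. The function name `parse_folder_name` is hypothetical, and the regular expression is deliberately stricter than the shell version: it rejects names that do not have exactly one underscore.

```python
import re
from pathlib import PurePosixPath

# Expected folder name: SW1-<participant>_<session>, e.g. SW1-1773_M0
FOLDER_PATTERN = re.compile(r"^SW1-(?P<participant>[^_]+)_(?P<session>[^_]+)$")


def parse_folder_name(folder):
    """Return (participant, session) for a path ending in SW1-<participant>_<session>."""
    name = PurePosixPath(folder).name  # plays the role of ${folder##*/}
    match = FOLDER_PATTERN.match(name)
    if match is None:
        # Stricter than the shell script, which would silently produce odd labels
        raise ValueError(f"Unexpected folder name: {name!r}")
    return match["participant"], match["session"]


print(parse_folder_name("Karolinska_data.1/SW1-1773_M0"))  # ('1773', 'M0')
```

Failing early on an unexpected name avoids creating a sub-/ses- pair with a malformed label.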
However, some files don't have the SeriesDescription field, which was flagged in the output:
INFO | SIDECAR PAIRING
INFO | No Pairing <- 001_SW1-1875_M12_0_i00001
INFO | No Pairing <- 001_SW1-1875_M12_0_i00004
INFO | No Pairing <- 002_SW1-1875_M12_0
INFO | No Pairing <- 002_SW1-1875_M12_0a
INFO | No Pairing <- 003_SW1-1875_M12_0
INFO | No Pairing <- 003_SW1-1875_M12_0a
INFO | No Pairing <- 004_SW1-1875_M12_0
WARNING | NO PAIRING WAS FOUND. BIDS FOLDER "BIDS_KARO/SUB-1875/SES-M12" WON'T BE CREATED. CHECK YOUR CONFIG FILE.
You can find these files and logs in the following folder: duke/temp/plben/create_karo_gitannex/bids_karo/tmp_dcm2bids
What should I do? How should I modify my config file to handle these files? (I showed only one example, but more files failed to pair.)
@jcohenadad @valosekj @NathanMolinier Any ideas?
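One way to see which fields are usable as criteria is to list what each JSON sidecar in the temporary folder contains. The sketch below assumes the sidecars written during conversion sit as .json files somewhere under tmp_dcm2bids. The helper names `summarize_sidecars` and `files_missing` are hypothetical.

```python
import json
from pathlib import Path


def summarize_sidecars(tmp_dir, fields=("SeriesDescription", "SequenceName")):
    """Collect the chosen fields from every JSON sidecar found under tmp_dir."""
    rows = []
    for sidecar in sorted(Path(tmp_dir).rglob("*.json")):
        with sidecar.open(encoding="utf-8") as handle:
            metadata = json.load(handle)
        # A missing field is reported as None rather than raising an error
        rows.append({"file": sidecar.name, **{f: metadata.get(f) for f in fields}})
    return rows


def files_missing(rows, field):
    """Return the names of the sidecars that lack the given field."""
    return [row["file"] for row in rows if row[field] is None]
```

For example, `files_missing(summarize_sidecars("bids_karo/tmp_dcm2bids"), "SeriesDescription")` would list the sidecars that cannot be paired through SeriesDescription, and the same rows show whether SequenceName is available instead.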
After some investigation, I found that matching on "SequenceName" instead of "SeriesDescription" also works for assigning the file suffix. The resulting config is the following:
{
"descriptions": [
{
"datatype": "anat",
"suffix": "acq-sagMprage_T1w",
"criteria": {
"SequenceName": "*tfl3d1_16ns"
}
},
{
"datatype": "anat",
"suffix": "acq-sagTse_T2w",
"criteria": {
"SequenceName": "*tseR2d1rr19"
}
},
{
"datatype": "anat",
"suffix": "acq-me2d_T2w",
"criteria": {
"SequenceName": "*me2d1r4"
}
},
{
"datatype": "anat",
"suffix": "acq-Tse_T2w",
"criteria": {
"SequenceName": "*tseR2d1rs17"
}
},
{
"datatype": "anat",
"suffix": "acq-sagMprageDf_T2w",
"criteria": {
"SequenceName": "*spcir_278ns"
}
},
{
"datatype": "anat",
"suffix": "acq-sagMprageP2_T2w",
"criteria": {
"SequenceName": "*spcR_282ns"
}
},
{
"datatype": "anat",
"suffix": "localiser",
"criteria": {
"SequenceName": "*fl2d1"
}
},
{
"datatype": "anat",
"suffix": "acq-sag_T1w",
"criteria": {
"SequenceName": "*spcir_257ns"
}
},
{
"datatype": "anat",
"suffix": "acq-epB0",
"criteria": {
"SequenceName": "*ep_b0"
}
},
{
"datatype": "anat",
"suffix": "acq-epB01000",
"criteria": {
"SequenceName": "*ep_b0_1000"
}
},
{
"datatype": "anat",
"suffix": "acq-epB1000t",
"criteria": {
"SequenceName": "*ep_b1000t"
}
},
{
"datatype": "anat",
"suffix": "acq-cor_T1w",
"criteria": {
"SequenceName": "*h2d1_205",
"ImageOrientationPatientDICOM": [1,0,0,0,0,-1]
}
},
{
"datatype": "anat",
"suffix": "acq-sag_T1w",
"criteria": {
"SequenceName": "*h2d1_205",
"ImageOrientationPatientDICOM": [0,1,0,0,0,-1]
}
},
{
"datatype": "anat",
"suffix": "acq-ax_T1w",
"criteria": {
"SequenceName": "*h2d1_205",
"ImageOrientationPatientDICOM": [1,0,0,0,1,0]
}
}
]
}
This should cover every case in the dataset.
Only the following files were not transferred because they didn't look relevant:
Feedback on the chosen conventions would be appreciated.
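To sanity-check the criteria, it can help to test which description a given sidecar would match. To my understanding, dcm2bids compares string criteria with shell-style patterns by default, so the leading * in these sequence names behaves as a wildcard. The sketch below approximates that with Python's `fnmatch` and uses plain equality for list values. It is not the dcm2bids implementation, and `matching_suffixes` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def criterion_matches(value, pattern):
    """Strings are compared as shell-style patterns, other values by equality."""
    if isinstance(pattern, str):
        return isinstance(value, str) and fnmatchcase(value, pattern)
    return value == pattern


def matching_suffixes(sidecar, descriptions):
    """Return the suffixes of all descriptions whose criteria all match the sidecar."""
    return [
        d["suffix"]
        for d in descriptions
        if all(criterion_matches(sidecar.get(k), p) for k, p in d["criteria"].items())
    ]


# A few entries copied from the config above
descriptions = [
    {"suffix": "acq-epB0", "criteria": {"SequenceName": "*ep_b0"}},
    {"suffix": "acq-epB01000", "criteria": {"SequenceName": "*ep_b0_1000"}},
    {"suffix": "acq-cor_T1w", "criteria": {"SequenceName": "*h2d1_205",
                                           "ImageOrientationPatientDICOM": [1, 0, 0, 0, 0, -1]}},
    {"suffix": "acq-ax_T1w", "criteria": {"SequenceName": "*h2d1_205",
                                          "ImageOrientationPatientDICOM": [1, 0, 0, 0, 1, 0]}},
]

print(matching_suffixes({"SequenceName": "*ep_b0_1000"}, descriptions))
```

A sidecar that matches more than one description, or none, points to criteria that need tightening. Orientation values in real sidecars may not be exact integers, so strict list equality may be too strict in practice.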
The files contained in the 4 folders (Karolinska_data.1, Karolinska_data.2, Karolinska_data.3 and Karolinska_data.4) were bidsified using the following commands:
bids_karo/code/convert_dcm2bids.sh bids_karo/code/dcm2bids_config.json bids_karo/ 20200612_longitudinal/Karolinska_data.1
bids_karo/code/convert_dcm2bids.sh bids_karo/code/dcm2bids_config.json bids_karo/ 20200612_longitudinal/Karolinska_data.2
bids_karo/code/convert_dcm2bids.sh bids_karo/code/dcm2bids_config.json bids_karo/ 20200612_longitudinal/Karolinska_data.3
bids_karo/code/convert_dcm2bids.sh bids_karo/code/dcm2bids_config.json bids_karo/ 20200612_longitudinal/Karolinska_data.4
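The four calls differ only in the folder index, so they could also be generated in a loop. Here is a minimal sketch that reuses the paths from the commands above. The helper names `build_commands` and `run_all` are hypothetical.

```python
import subprocess

SCRIPT = "bids_karo/code/convert_dcm2bids.sh"
CONFIG = "bids_karo/code/dcm2bids_config.json"
OUTPUT_DIR = "bids_karo/"
DICOM_ROOT = "20200612_longitudinal"


def build_commands(indices=range(1, 5)):
    """Build one argument list per Karolinska_data.<i> folder."""
    return [
        [SCRIPT, CONFIG, OUTPUT_DIR, f"{DICOM_ROOT}/Karolinska_data.{i}"]
        for i in indices
    ]


def run_all(commands):
    """Run the conversions one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_all(build_commands())` from the directory that contains bids_karo and 20200612_longitudinal would reproduce the four commands in order.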
The metadata was added using the file code/add_dataset_metadata.py, which takes data from the file 20200612_longitudinal/Karolinska_data_exported_2020.06.12\ .xlsx.
Everything is done and stored in /home/GRAMES.POLYMTL.CA/p119007/create_karo_gitannex/bids_karo.
Waiting for review of the conventions and the creation of the git-annex repo.
I created the repo and gave @plbenveniste write access: https://data.neuro.polymtl.ca/datasets/ms-karolinska-2020
Some modifications were made to the .json config file to make sure that DWI files are stored under /dwi and not /anat.
Also, the localizers were acquired for T1w images, so the contrast was added to the file name.
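The exact change is in the pull request. The sketch below only illustrates one way such an edit could be applied to the config programmatically. It assumes that the diffusion series are the ones whose SequenceName starts with "*ep_" and that they should take the BIDS dwi suffix while keeping the acq label. The function name `route_diffusion_to_dwi` is hypothetical.

```python
import copy


def route_diffusion_to_dwi(config, sequence_prefix="*ep_"):
    """Return a copy of the config where diffusion entries use the dwi datatype."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for description in updated["descriptions"]:
        sequence = description.get("criteria", {}).get("SequenceName", "")
        # Assumption: diffusion series are identified by their SequenceName prefix
        if isinstance(sequence, str) and sequence.startswith(sequence_prefix):
            description["datatype"] = "dwi"
            if not description["suffix"].endswith("dwi"):
                description["suffix"] += "_dwi"
    return updated


config = {
    "descriptions": [
        {"datatype": "anat", "suffix": "acq-epB0", "criteria": {"SequenceName": "*ep_b0"}},
        {"datatype": "anat", "suffix": "acq-sagMprage_T1w", "criteria": {"SequenceName": "*tfl3d1_16ns"}},
    ]
}
print(route_diffusion_to_dwi(config)["descriptions"][0])
```

Working on a copy keeps the original config available for comparison before the new one is written to disk.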
The data was copied from the folder on romane to the git-annex folder using the following command:
cp -a bids-karo/. ms-karolinska-2020/
Unneeded files (such as tmp_dcm2bids) were removed. The changes were committed and then pushed to the remote branch.
Now ready for review!
I left some review comments on the pull request: https://data.neuro.polymtl.ca/datasets/ms-karolinska-2020/pulls/1
The data is stored in duke:mri/karo/20200612_longitudinal. The steps are the following:
@jcohenadad What name should be used for the git-annex repo? ms-karolinska?
@mguaypaq Could you create the corresponding git-annex repo?
This is related to issue 76. I am creating this issue here to centralize the work on the MS dataset.