Closed plbenveniste closed 8 months ago
Also relevant: https://github.com/spinalcordtoolbox/deepseg_lesion_models/issues/2
It would be nice to move everything to git-annex -- if it is not already on git-annex, given that part (if not all!) of this dataset is under git-annex/sct-testing-large.
Also tagging @valosekj @naga-karthik who might be able to clarify
Observations from the BIDSification of the dataset
The BIDSification was done using the dataset `data_ms` stored in `duke/projects/ms_seg/seg_paper/data_ms` and the file `dataset.pkl`.
Dataset name selection: `data_ms` is good.
During the transformation of the dataset, I encountered a few issues:
There are a lot of different sites (I grouped them into 28 sites: "amu", "bwh", "chb", "dou", "gle", "kar", "kor", "lyo", "mgh", "mil", "nih", "nyu", "nwu", "oxf", "par", "per", "pol", "ren", "she", "twh", "ubc", "ucl", "ucs", "unf", "unk", "van", "xua", "zur")
There is no information on how the images were acquired
Not all subjects are referenced in the `dataset.pkl` file: therefore, for some subjects we don't know their pathology (which is basically the only information in the pickle file)
For the json sidecars for images, I put the following:
For the json sidecar for masks, I put the following since I don't know how they were generated ("Lesion Segmentation Manual" changes to "Segmentation Manual")
Total number of subjects: 683
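To list which subjects lack pathology info, one could diff the subject folder names against the pickle's keys. A minimal sketch, assuming `dataset.pkl` deserializes to a mapping keyed by subject folder name (this structure is an assumption, not confirmed in this thread):

```python
import pickle
from pathlib import Path

def unreferenced_subjects(pkl_path: str, data_dir: str) -> list:
    """Subjects present as folders under data_dir but absent from the pickle.

    Assumption: the pickle deserializes to a mapping keyed by subject name.
    """
    with open(pkl_path, "rb") as f:
        pathology = pickle.load(f)
    subjects = {p.name for p in Path(data_dir).iterdir() if p.is_dir()}
    return sorted(subjects - set(pathology))
```

This would give the list of subjects whose pathology is unknown.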
@jcohenadad and @naga-karthik, could you give me feedback since you seem to be aware of how the dataset was built?
@plbenveniste before answering your question, can you please address my comment https://github.com/neuropoly/data-management/issues/264#issuecomment-1728492195:
It would be nice to move everything to git-annex -- if it is not already on git-annex, given that part (if not all!) of this dataset is under git-annex/sct-testing-large.
I hope you did not BIDSify a dataset that was already BIDSified and already moved to git-annex
I did look into the datasets on Git-Annex and I didn't see a dataset matching `data-ms`. Afterward, I discussed it with @valosekj, who confirmed that `data-ms` needed to be BIDSified (maybe I misunderstood). However, now I realize that `sct-testing-large` includes a lot (if not all) of the subjects from `data-ms`.
Had `data-ms` been BIDSified already? If so, by who and how? Do we want to have a BIDSified version of `data-ms`? If no, should we remove `data-ms` from duke?
Tagging @naga-karthik as well for input on what had been done
Had `data-ms` been BIDSified already?
yes, partly or entirely, as I said here https://github.com/neuropoly/data-management/issues/264#issuecomment-1728492195
If so, by who
I think Alex Foias and Charley Gros were the ones working on this. For more information about the generation of `sct-testing-large`, the label-based search is useful: https://github.com/neuropoly/data-management/issues?q=is%3Aopen+is%3Aissue+label%3A%22dataset%3A+sct-testing-large%22 (although it does not cover the time before we ported these discussions to GitHub)
and how?
Using scripts. Some of these scripts have been improved/revamped and put here: https://github.com/neuropoly/data-management/tree/master/scripts
I went through the README of the related project and found additional information.
Do we want to have a BIDSified version of `data-ms`?
Yes, but it seems like we already have one, at least in part. We need to make sure that all data from `data-ms` have been BIDSified.
If no, should we remove `data-ms` from duke?
I would say "If yes, should we remove...". And the answer is yes, probably, but we need to sit down and make sure this will not impact the reproducibility of old studies (whose code relies on a specific data structure).
NOTE: This comment contains some important information. Please take the time to read it carefully!
Okay, I have found some evidence that the `data-ms` dataset on `duke` might have been BIDSified.
Since the folder names under `duke/projects/ms_seg/seg_paper/data_ms` seem to match the `data_id` in `participants.tsv` of `sct-testing-large`, it appears that the dataset might have been BIDSified.
For a few subjects that I quickly checked, there are also lesion masks under `derivatives/labels` of `sct-testing-large`. For example, for the `amu_2017-virginie*` set of folders under `duke/projects/ms_seg/seg_paper/data_ms`, we have the following set of folders `sub-amuVirginie0*` (note the different name) under `derivatives`:
So, what does this tell us? --> we need to confirm whether all subjects under `duke/projects/ms_seg/seg_paper/data_ms` have been successfully BIDSified, and then this dataset can be deleted.
@plbenveniste There are two things:
Could you take the `data_id` column from `participants.tsv` of `sct-testing-large` and compare whether these subjects match the subjects existing under `duke`? This should tell us whether the dataset has been successfully BIDSified. If it turns out that it has been BIDSified already, then that would be absolutely great! This dataset is very valuable and can/will be used in several of our projects!
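A minimal sketch of that comparison, assuming `participants.tsv` has a `data_id` column and the duke copy has one folder per subject (both paths are hypothetical):

```python
import csv
from pathlib import Path

def not_yet_bidsified(participants_tsv: str, duke_dir: str) -> set:
    """Duke folder names with no matching data_id in participants.tsv."""
    with open(participants_tsv, newline="") as f:
        data_ids = {row["data_id"] for row in csv.DictReader(f, delimiter="\t")}
    folders = {p.name for p in Path(duke_dir).iterdir() if p.is_dir()}
    return folders - data_ids
```

An empty result would mean every duke subject already appears in the BIDS dataset.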
Thank you both for your input!
After investigating, comparing the subjects in `data-ms` on duke with the `data_id` in `sct-testing-large`, I found that only the following subjects are not included in `sct-testing-large` (14 out of 683 subjects in `data-ms`):
If we decide on adding them to `sct-testing-large`, I am still missing information for the json file.
Should we copy information from similar files from the same site?
yes. Thank you @plbenveniste
@mguaypaq Could you give me writing rights for the `sct-testing-large` dataset please?
I just saw that the json sidecars for `_seg-manual` are:

```json
{
  "Author": "Charley Gros",
  "Label": "seg_manual"
}
```

→ Keeping them this way to match the format of the dataset
Also, the json sidecars for `_lesion-manual` are empty:
→ Keeping them this way to match the format of the dataset
Also noting here: I saw that a lot of subjects in `sct-testing-large` have a GM segmentation file (which was not in `data-ms`).
> @mguaypaq Could you give me writing rights for the `sct-testing-large` dataset please?
@plbenveniste, done, you should now be able to push non-master branches on `sct-testing-large`.
Changes pushed to branch `plb/add_missing_data_ms_subject`.
Changes include:
- the script `add_missing_subject_data_ms.py`
- the updated `participants.tsv` file

Ready for review now @mguaypaq
While I was at it, I marked `data.neuro.polymtl.ca` as dead.
Then I noticed that the new image files were not properly annexed, so I started fixing this. But while doing this, I noticed some strange things, and started digging. In particular, I noticed that the following two files, which should not be the same, are byte-for-byte identical:
`derivatives/labels/sub-rennesMS074/anat/sub-rennesMS074_acq-inf_T2star_lesion-manual.nii.gz`
`derivatives/labels/sub-montpellierLesion007/anat/sub-montpellierLesion007_acq-inf_T2star_lesion-manual.nii.gz`
Looking at this file in FSLeyes, it looks like a normal lesion mask, with several non-zero voxels, so I don't think it's a case of two empty files or two files with a single voxel being the same.
I think I will have to dig a lot more to see what's going on, and which files are affected. But if anyone has ideas about what's going on, it might go faster.
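One way to find every affected file would be to hash all lesion masks and group them by content digest; a minimal sketch (the glob pattern is an assumption about the file layout, and annexed content must be present locally, e.g. via `git annex get`):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_identical(root: str, pattern: str = "**/*_lesion-manual.nii.gz") -> list:
    """Return groups of byte-for-byte identical files under `root`."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).glob(pattern)):
        by_digest[hashlib.sha256(path.read_bytes()).hexdigest()].append(path.name)
    return [names for names in by_digest.values() if len(names) > 1]
```

Any group with more than one entry is a suspect pair like the two files above.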
Thank you sooo much for doing these in-depth checks. If there are duplications and/or wrong file names for images or segmentation, this is definitely problematic for our analyses. @plbenveniste would you mind checking this? Thanks!
I think I found out where the problem came from.
I don't know exactly how it happened, but my code successfully copied the images into `sct-testing-large`; after copying, the images were replaced by some images from Montpellier.
I had discussed this with @mguaypaq when I had the following error message for a lot of files after running my code:
> git-annex: git status will show ./derivatives/labels/sub-montpellierLesion007/anat/sub-montpellierLesion007_acq-inf_T2star_lesion-manual.nii.gz to be modified, since content availability has changed and git-annex was unable to update the index. This is only a cosmetic problem affecting git status; git add, git commit, etc won't be affected. To fix the git status display, you can run: git-annex restage
I ran `git-annex restage` to fix it, and I think that's how the images were replaced.
Currently re-doing the modification to look into this git-annex issue.
After carefully redoing the entire process, I looked into the similarity between the Montpellier files and the files I was adding.
I wrote a script to compare each file I added to `sct-testing-large` to the files of the `sub-montpellier` subjects.
The script came up with the following conclusions (stating whether each file is empty or identical to a file from Montpellier):
Interestingly enough, each added file matches the file of exactly one other subject.
Furthermore, interestingly enough, there are exactly 14 Montpellier subjects and exactly 14 `data-ms` subjects missing from `sct-testing-large`.
Here is the information stored in the participants.tsv for the Montpellier subjects:
```
sub-montpellierLesion001  F  unknown  unknown  MS  montpellier_20170112_07  montpellierLesion
sub-montpellierLesion002  F  unknown  unknown  MS  montpellier_20170112_08  montpellierLesion
sub-montpellierLesion003  F  unknown  unknown  MS  montpellier_20170112_13  montpellierLesion
sub-montpellierLesion004  F  unknown  unknown  MS  montpellier_20170112_14  montpellierLesion
sub-montpellierLesion005  F  unknown  unknown  MS  montpellier_20170112_15  montpellierLesion
sub-montpellierLesion006  F  unknown  unknown  MS  montpellier_20170112_17  montpellierLesion
sub-montpellierLesion007  F  unknown  unknown  MS  montpellier_20170112_29  montpellierLesion
sub-montpellierLesion008  M  unknown  unknown  MS  montpellier_20170112_31  montpellierLesion
sub-montpellierLesion009  M  unknown  unknown  MS  montpellier_20170112_38  montpellierLesion
sub-montpellierLesion010  F  unknown  unknown  MS  montpellier_20170112_53  montpellierLesion
sub-montpellierLesion011  F  unknown  unknown  MS  montpellier_20170112_55  montpellierLesion
sub-montpellierLesion012  M  unknown  unknown  MS  montpellier_20170112_59  montpellierLesion
sub-montpellierLesion013  F  unknown  unknown  MS  montpellier_20170112_65  montpellierLesion
sub-montpellierLesion014  M  unknown  unknown  MS  montpellier_20170112_66  montpellierLesion
```
Here is the information I added to the participants.tsv for the Rennes subjects:
```
sub-rennesMS074  unknown  unknown  MS  rennes_20170112_29  rennesMS
sub-rennesMS075  unknown  unknown  MS  rennes_20170112_17  rennesMS
sub-rennesMS076  unknown  unknown  MS  rennes_20170112_66  rennesMS
sub-rennesMS077  unknown  unknown  MS  rennes_20170112_59  rennesMS
sub-rennesMS078  unknown  unknown  MS  rennes_20170112_15  rennesMS
sub-rennesMS079  unknown  unknown  MS  rennes_20170112_13  rennesMS
sub-rennesMS080  unknown  unknown  MS  rennes_20170112_14  rennesMS
sub-rennesMS081  unknown  unknown  MS  rennes_20170112_31  rennesMS
sub-rennesMS082  unknown  unknown  MS  rennes_20170112_38  rennesMS
sub-rennesMS083  unknown  unknown  MS  rennes_20170112_07  rennesMS
sub-rennesMS084  unknown  unknown  MS  rennes_20170112_53  rennesMS
sub-rennesMS085  unknown  unknown  MS  rennes_20170112_65  rennesMS
sub-rennesMS086  unknown  unknown  MS  rennes_20170112_08  rennesMS
sub-rennesMS087  unknown  unknown  MS  rennes_20170112_55  rennesMS
```
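The `data_id` values in the two tables pair each Rennes subject with exactly one Montpellier subject via the trailing index (e.g. `29` in `montpellier_20170112_29` and `rennes_20170112_29`); a quick sanity check over the IDs listed above:

```python
# data_id values copied from the two participants.tsv tables above.
montpellier = [f"montpellier_20170112_{i}" for i in
               ("07", "08", "13", "14", "15", "17", "29", "31",
                "38", "53", "55", "59", "65", "66")]
rennes = [f"rennes_20170112_{i}" for i in
          ("29", "17", "66", "59", "15", "13", "14", "31",
           "38", "07", "53", "65", "08", "55")]

def suffix(data_id: str) -> str:
    """Trailing index of a data_id, e.g. '29' for 'rennes_20170112_29'."""
    return data_id.rsplit("_", 1)[1]

# Each trailing index appears exactly once on each side: a one-to-one pairing,
# consistent with the byte-identical files found above.
assert {suffix(s) for s in montpellier} == {suffix(s) for s in rennes}
assert len(montpellier) == len(rennes) == 14
```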
Also, the dataset description from `data-ms` doesn't give the sex of each subject: I don't know where that information comes from.
Further work: which solution are we choosing?
Thank you for working on making our database more reliable @plbenveniste 🙏
Hi @jcohenadad! I was wondering which solution (of the options listed above) we are choosing? Looking into this to close the issue.
So, if I understand the issue correctly, we don't know whether these 14 subjects are from Rennes or from Montpellier, is that correct?
Based on the dataset.pkl file the images come from Rennes. But there is no way of knowing for sure which is true...
hum... ok so let's label them as Rennes and get rid of the Montpellier ones
After more consideration, I would suggest keeping the subjects from Montpellier, i.e. not changing anything. The reason is that the Montpellier subjects have more files than the corresponding Rennes subjects:
For example, for `sub-montpellierLesion014` the files are:
For the corresponding subject `sub-rennesMS076`, the files are:
There is an additional file for the T2w image in the Montpellier folder.
Also for the derivatives:
Again, some label files are not present in the Rennes subject folder.
Thank you @jcohenadad for your feedback.
Before closing this issue, @mguaypaq could you delete the remote branches `plb/add_missing_data_ms_subjects` and `plb/add_missing_data_ms_subjects_2`?
Thanks
Done! What an adventure.
I want to use the dataset `data_ms` used to train `sct_deepseg_lesion` for training a larger model on more contrasts to segment MS lesions on the spinal cord.
The dataset can currently be found here: `duke/projects/ms_seg/seg_paper/data_ms`, as detailed in this repository.
Maybe the dataset should be renamed? What name?
To do: