neuropoly / data-management

Repo that deals with datalad aspects for internal use

Cleanups of `datasets/canproco` #187

Closed: valosekj closed this issue 1 year ago

valosekj commented 1 year ago

While merging the branch into datasets/canproco, @mguaypaq found several points that should be addressed (original comment: https://github.com/neuropoly/data-management/issues/186#issuecomment-1334212012). I am making each cleanup in a separate commit within the jv/canproco_cleanup branch (originally named jv/add_README).

Checklist:

UPDATE:

valosekj commented 1 year ago

@mguaypaq - Regarding participants.tsv, I did the following in commit 1323152d8a5f004d6dd3c1b4e0ea3a74de7daf04:

Removed the extra space character (before the tab character) after participant_id on the first line. Replaced empty cells with n/a. Removed the Researcher column (it was entirely empty).
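The three participants.tsv fixes above can be sketched in Python as follows. This is a minimal illustration, not the code used in commit 1323152d; the function name `clean_participants_tsv` is hypothetical, and the sketch assumes the whole file fits in memory and that any column with no values at all (like Researcher) should be dropped.

```python
def clean_participants_tsv(text: str) -> str:
    """Strip stray whitespace, fill empty cells with n/a, drop all-empty columns."""
    lines = [line.split("\t") for line in text.splitlines() if line.strip()]
    # Header names lose surrounding spaces, e.g. "participant_id " -> "participant_id".
    header = [name.strip() for name in lines[0]]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line]
        # Pad short rows so missing trailing cells are treated as empty.
        cells += [""] * (len(header) - len(cells))
        rows.append(cells)
    # Keep a column only if at least one row has a value in it
    # (this is what removes an entirely empty column such as Researcher).
    keep = [i for i in range(len(header)) if not rows or any(row[i] for row in rows)]
    out = ["\t".join(header[i] for i in keep)]
    for row in rows:
        # BIDS uses "n/a" for missing values instead of empty cells.
        out.append("\t".join(row[i] or "n/a" for i in keep))
    return "\n".join(out) + "\n"
```

Working on a string rather than a file path keeps the function easy to test; reading and writing participants.tsv is left to the caller.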

Update: in commit 967af431df49e4c4b1d28be6d8bc65b0b61682ef, I also removed the Researcher column from participants.json.

mguaypaq commented 1 year ago

Alright, I merged the jv/canproco_cleanup branch into master, and now `bids-validator .` is much happier. Is it normal that sub-cal209 is missing the T2w image, though?

valosekj commented 1 year ago

Thank you!

Is it normal that sub-cal209 is missing the T2w image, though?

Yes, see exclude.yml. We could actually add this image to .bidsignore to satisfy bids-validator, couldn't we?

mguaypaq commented 1 year ago

I've added the following note to the README:

Note: some subjects have been excluded from analysis. See the file `.bidsignore` for the list.

I've added the file .bidsignore with the following contents:

```
# MTS, STIR, PSIR are not yet in the BIDS standard
*_MTS.*
*_STIR.*
*_PSIR.*

# See the following link for an up-to-date list of excluded data
# https://github.com/ivadomed/canproco/blob/main/etc/exclude.yml

# Poor data quality
sub-cal088/ses-M0

# Missing T2w file
sub-cal209/ses-M0
```
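To check which files a `.bidsignore` like this one would hide, here is a rough Python approximation. bids-validator applies gitignore-style rules; this sketch only approximates them with `fnmatch` (for example, `*` here can also match `/`, which gitignore does not allow), and the helper names `parse_bidsignore` and `is_ignored` are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_bidsignore(text: str) -> list:
    """Return the non-empty, non-comment lines of a .bidsignore file."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path: str, patterns: list) -> bool:
    """Approximate gitignore-style matching for a dataset-relative path."""
    p = PurePosixPath(path)
    for pattern in patterns:
        if "/" in pattern.rstrip("/"):
            # A pattern containing a slash is matched from the dataset root;
            # matching a parent directory ignores everything below it.
            anchored = pattern.strip("/")
            candidates = [str(p)] + [str(d) for d in p.parents if str(d) != "."]
            if any(fnmatchcase(c, anchored) for c in candidates):
                return True
        elif any(fnmatchcase(part, pattern.rstrip("/")) for part in p.parts):
            # A pattern without a slash matches any single path component.
            return True
    return False
```

For example, `is_ignored("sub-cal088/ses-M0/anat/sub-cal088_ses-M0_T2w.nii.gz", patterns)` would be true because the `sub-cal088/ses-M0` directory matches. For anything authoritative, running bids-validator itself remains the reference.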

I've also added a file .bids-validator-config.json with the following contents:

```json
{"ignore": ["INCONSISTENT_PARAMETERS", "NO_AUTHORS"]}
```

It looks like the ages of participants listed as 89 or older may be wrong, so @valosekj will investigate.
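A quick way to list the participants whose ages need checking is sketched below. It assumes participants.tsv has `participant_id` and `age` columns and that missing ages are written as n/a; `flag_suspicious_ages` is a hypothetical helper, and the 89 threshold simply mirrors the comment above.

```python
def flag_suspicious_ages(tsv_text: str, threshold: float = 89) -> list:
    """Return participant_ids whose numeric age is at or above the threshold."""
    lines = [line.split("\t") for line in tsv_text.splitlines() if line.strip()]
    header = lines[0]
    id_col = header.index("participant_id")
    age_col = header.index("age")  # assumed column name
    flagged = []
    for row in lines[1:]:
        value = row[age_col].strip() if age_col < len(row) else ""
        try:
            age = float(value)
        except ValueError:
            continue  # "n/a" or another non-numeric placeholder
        if age >= threshold:
            flagged.append(row[id_col])
    return flagged
```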

valosekj commented 1 year ago

In commit a3b26aca4d6f2896769a41f2624d50f0faf42594, I added pathology (MS - multiple sclerosis or HC - healthy control) and phenotype (n/a for healthy controls) columns to participants.tsv. I also sorted participants.tsv by centre to make it easier to read.
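The column addition and sort described above could look roughly like this in Python. The `diagnoses` mapping (participant_id to a (pathology, phenotype) pair) is a hypothetical input standing in for wherever the clinical labels actually come from, and the sort assumes the centre code is the prefix of each participant_id (sub-cal..., sub-edm..., sub-mon...), so a plain lexicographic sort groups rows by centre.

```python
def add_pathology_columns(tsv_text: str, diagnoses: dict) -> str:
    """Append pathology/phenotype columns and sort rows by participant_id.

    `diagnoses` is a hypothetical mapping such as
    {"sub-mon001": ("MS", "RRMS"), "sub-cal002": ("HC", "n/a")}.
    """
    lines = [line.split("\t") for line in tsv_text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    id_col = header.index("participant_id")
    out_rows = []
    for row in rows:
        # Unknown participants get n/a rather than a guessed diagnosis.
        pathology, phenotype = diagnoses.get(row[id_col], ("n/a", "n/a"))
        out_rows.append(row + [pathology, phenotype])
    # Assumes IDs start with the centre code, so sorting by participant_id
    # groups the rows by centre.
    out_rows.sort(key=lambda r: r[id_col])
    table = [header + ["pathology", "phenotype"]] + out_rows
    return "\n".join("\t".join(r) for r in table) + "\n"
```

Defaulting unknown participants to n/a (rather than HC) keeps a missing label visible instead of silently classifying someone as a healthy control.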

mguaypaq commented 1 year ago

Merged into master. Now there's only the age issue left to fix.

valosekj commented 1 year ago

I fixed the pathology (from MS to HC) and phenotype (from RRMS to n/a) columns for sub-mon027 in commit b306bea1c533ab11b58c779d1cfbf0dc65a3b858 within the jv/canproco_cleanup branch. @mguaypaq, could you please merge it into master?

Context here

mguaypaq commented 1 year ago

Merged into master.

mguaypaq commented 1 year ago

One more issue to fix: now that participants.tsv contains the columns pathology and phenotype, there should be corresponding entries in participants.json to describe them.
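A small consistency check for this kind of gap is sketched below: it lists participants.tsv columns that have no entry in participants.json. The `Description` and `Levels` keys follow the BIDS convention for describing tabular columns; the example `PATHOLOGY_ENTRY` wording and the helper `undocumented_columns` are illustrative assumptions, not the content of the actual commit, and treating `participant_id` as exempt is also an assumption.

```python
import json

# Illustrative participants.json entry; the wording is an assumption.
PATHOLOGY_ENTRY = {
    "Description": "Pathology of the participant",
    "Levels": {"MS": "multiple sclerosis", "HC": "healthy control"},
}


def undocumented_columns(tsv_text: str, json_text: str) -> list:
    """Return participants.tsv columns that have no key in participants.json."""
    header = [name.strip() for name in tsv_text.splitlines()[0].split("\t")]
    described = json.loads(json_text)
    # Assumption: participant_id does not need its own entry.
    return [col for col in header if col != "participant_id" and col not in described]
```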

valosekj commented 1 year ago


Thank you! I added the pathology and phenotype entries in commit 36b04034dbe65094a84042782d7a15405a6817da in the jv/canproco_cleanup branch.

mguaypaq commented 1 year ago

Merged into master, thanks!

valosekj commented 1 year ago

I updated the age and sex columns in the jv/update_age_and_sex branch.

@mguaypaq could you please check the branch and merge?

I also noticed that the subjects removed from git-annex in https://github.com/neuropoly/data-management/issues/210 are still listed in participants.tsv. We should probably delete these subjects from participants.tsv as well.

mguaypaq commented 1 year ago

Great! Finally bids-validator is happy ✔️

I merged your branch into master and, as you suggested, added commits to remove these participants from participants.tsv and from derivatives/labels/:

3e5fb7d92 remove sub-edm165 - enrollment error
6ae09e59f remove sub-mon033 - misdiagnosis
1df8d502f remove sub-mon066 - wrong Participant ID
c97358943 remove sub-cal155 - screen fail, was deemed ineligible due to their EDSS score
daf6773e4 remove sub-cal091 - mislabelled scan, should be sub-cal191
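The participants.tsv side of these removals can be sketched as a simple filter on participant_id. The IDs in `REMOVED` come from the commit list above; `remove_participants` is a hypothetical helper, and the sketch does not touch derivatives/labels/, which was cleaned up separately in the same commits.

```python
# Participant IDs taken from the commit messages above.
REMOVED = {"sub-edm165", "sub-mon033", "sub-mon066", "sub-cal155", "sub-cal091"}


def remove_participants(tsv_text: str, to_remove: set) -> str:
    """Drop rows whose participant_id is in `to_remove`, keeping the header."""
    lines = tsv_text.splitlines()
    id_col = lines[0].split("\t").index("participant_id")
    kept = [lines[0]] + [
        line
        for line in lines[1:]
        # Skip blank lines and any participant listed for removal.
        if line.strip() and line.split("\t")[id_col] not in to_remove
    ]
    return "\n".join(kept) + "\n"
```

Note that sub-cal091 is removed while sub-cal191 (the ID the scan should have had) is kept, since matching is on the exact participant_id.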