spine-generic / data-multi-subject_DO-NOT-USE

Multi-subject data for the Spine Generic project
0 stars 1 forks source link

Add dataset fslAchieva & fslPrisma #21

Closed alexfoias closed 4 years ago

alexfoias commented 4 years ago

CHANGES:

Note: sapienza subjects mentioned in spine-generic pipeline have to be renamed to fslAchieva. It's the case in https://github.com/spine-generic/spine-generic/blob/1329c37e40569ab22c37526d2d3ac9eb2b8f23a6/spinegeneric/cli/generate_figure.py

Here is the qc of the complete dataset: qc_dataset.zip

:warning: Note (added by @jcohenadad on 2020-08-11 07:15:47): This PR originally included both the renaming of sapienza data and the fix of the defacing of some data. The latter issue has now been moved to a specific PR https://github.com/spine-generic/data-multi-subject/pull/19, but this explains the presence of comments regarding the defacing of other data in the present PR.

alexfoias commented 4 years ago

@PaulBautin Could you please check the qc to make sure that all images are okay ? A zip file is on the top of the PR. Thanks.

PaulBautin commented 4 years ago

@PaulBautin Could you please check the qc to make sure that all images are okay ? A zip file is on the top of the PR. Thanks.

Should i look if the brain is defaced? The SC looks intact, and everything looks good.

alexfoias commented 4 years ago

Yes, Could you please check if it's properly defaced (brain and/or spine not chopped) ? Thanks.

PaulBautin commented 4 years ago

On second review (tell me what you think of these errors): On these images the defacing seems to be touching the upper part of the spinal cord image image

On these images the defacing seems to be chopping a part of the brain: image image image

jcohenadad commented 4 years ago

indeed, these need to be fixed. thanks @PaulBautin

jcohenadad commented 4 years ago

@alexfoias i'm just realizing this PR is a mix between fixing defacing and adding new data-- please separate the two things for clarity and better tracking recording. Defacing should go in this other PR https://github.com/spine-generic/data-multi-subject/pull/19. Also please move the defacing-specific comments to the other PR.

alexfoias commented 4 years ago

@jcohenadad Sorry for the mix in the changes, but I did the checkout from my master branch after adding the fixed defaced image. I'll fix this PR just to contain the fslAchieva & fslPrisma changes and reopen a new PR for the fixed defaced images.

alexfoias commented 4 years ago

I fixed the PR. Please find the qc here.qc_dataset_20200803.zip

jcohenadad commented 4 years ago

I fixed the PR.

~the branch still contains modifications that are not related to the PR purpose, i.e. added fsl* data:~

The branch shows some files that are modified (i.e. non-zero BIN), despite only name was supposed to have changed.

Screen Shot 2020-08-03 at 10 38 44 AM

alexfoias commented 4 years ago

That was the rename of the derivatives. Do you want me to remove them and only keep the raw data ? Who will take of the derivatives ?

jcohenadad commented 4 years ago

That was the rename of the derivatives

ah! ok that makes sense

jcohenadad commented 4 years ago

https://github.com/spine-generic/data-multi-subject/pull/21#issuecomment-668035623: this PR is about adding fslAchieva & fslPrisma. Nothing more, nothing less.

To validate the QC of defacing, please refer to https://github.com/spine-generic/data-multi-subject/pull/19

alexfoias commented 4 years ago

@jcohenadad I did a test with the same command as I changed the name of derivatives for sapienza:

Here is the command:

import os
old_path = '/Users/alfoi/code/data_fork_multi_subject/derivatives/labels/sub-amu02/anat/sub-amu02_T1w_RPI_r_labels-manual.nii.gz'
new_path = '/Users/alfoi/code/data_fork_multi_subject/derivatives/labels/sub-amu02/anat/sub-amu02_T1w_RPI_r_labels-manual_new.nii.gz'
os.rename(old_path,new_path)

When pushing on the branch it shows as: Screen Shot 2020-08-03 at 12 02 32 PM

I'll delete the amu commit from PR after I have your okay.

jcohenadad commented 4 years ago

b00b64be99a1a632a93179719f2dee4911ec63c2

-rwx------  1 julien  staff  557857 14 Jul 21:22 sub-amu02_T1w_RPI_r_labels-manual.nii.gz

aa3087640f297829c3f631588f37b7ef9e7a111e

-rwxr-xr-x  1 julien  staff  557857  3 Aug 15:36 sub-amu02_T1w_RPI_r_labels-manual_new.nii.gz

as i told you in the past, you should be careful when moving/renaming files to not change their permissions.

some doc about keeping permissions: https://stackoverflow.com/questions/12181938/how-to-preserve-default-umask-if-renaming-a-tempfile

alexfoias commented 4 years ago

There's something happening that I don't understand, because for me I have the same permissions.

Is it probably the problem because I work on a fork ? The commit https://github.com/spine-generic/data-multi-subject/commit/aa3087640f297829c3f631588f37b7ef9e7a111e is from the forked repo.

On my fork (https://github.com/alexfoias/data-multi-subject) everything seems okay and it doesn't seem to change the permissions:

Here is my output for c6f2fb49a9a797dcec121a0eb6487af0aa811739:

(base) alfoi@MacBook-Pro:data_fork_multi_subject % git checkout c6f2fb49a9a797dcec121a0eb6487af0aa811739
(base) alfoi@MacBook-Pro:anat % git rev-parse HEAD
c6f2fb49a9a797dcec121a0eb6487af0aa811739
(base) alfoi@MacBook-Pro:anat % pwd               
/Users/alfoi/code/data_fork_multi_subject/derivatives/labels/sub-amu02/anat
(base) alfoi@MacBook-Pro:anat % ll                                                                             
total 1096
drwxr-xr-x  3 alfoi  staff      96  3 Aug 16:44 .
drwxr-xr-x  3 alfoi  staff      96  3 Aug 16:44 ..
-rwxr-xr-x  1 alfoi  staff  557857  3 Aug 16:44 sub-amu02_T1w_RPI_r_labels-manual.nii.gz

Here is my output for aa3087640f297829c3f631588f37b7ef9e7a111e:

(base) alfoi@MacBook-Pro:anat % git checkout aa3087640f297829c3f631588f37b7ef9e7a111e
(base) alfoi@MacBook-Pro:anat % git rev-parse HEAD
aa3087640f297829c3f631588f37b7ef9e7a111e
(base) alfoi@MacBook-Pro:anat % pwd               
/Users/alfoi/code/data_fork_multi_subject/derivatives/labels/sub-amu02/anat
(base) alfoi@MacBook-Pro:anat % ll
total 1096
drwxr-xr-x  3 alfoi  staff      96  3 Aug 16:46 .
drwxr-xr-x  3 alfoi  staff      96  3 Aug 16:46 ..
-rwxr-xr-x  1 alfoi  staff  557857  3 Aug 16:46 sub-amu02_T1w_RPI_r_labels-manual_new.nii.gz
jcohenadad commented 4 years ago

@kousu do you have any insights?

kousu commented 4 years ago

@jcohenadad Sorry for the mix in the changes, but I did the checkout from my master branch after adding the fixed defaced image. I'll fix this PR just to contain the fslAchieva & fslPrisma changes and reopen a new PR for the fixed defaced images.

Do we need to purge the previous images completely? Were they already defaced, just incorrectly, or were they not defaced at all? If the latter, we should remember they'll still be in the git log even after this PR goes in, git rm isn't enough to hide them. If we need to purge them then we need to do surgery on master to get them out (surgery which will need one extra step after #20 goes in) so I'd like to be involved to hand-hold that procedure.

And I'm really really sorry about this but #20 is going to force you to redo this PR from scratch. Can you save your new files separately from the repo? You'll have to do that then refork and redownload the repo and reapply the work.

kousu commented 4 years ago

b00b64b

-rwx------  1 julien  staff  557857 14 Jul 21:22 sub-amu02_T1w_RPI_r_labels-manual.nii.gz

aa30876

-rwxr-xr-x  1 julien  staff  557857  3 Aug 15:36 sub-amu02_T1w_RPI_r_labels-manual_new.nii.gz

as i told you in the past, you should be careful when moving/renaming files to not change their permissions.

some doc about keeping permissions: https://stackoverflow.com/questions/12181938/how-to-preserve-default-umask-if-renaming-a-tempfile

I see them as 0o0755 like Alex does. I downloaded the repo (it's 21G on disk, ay yi yi) and looked at the permissions changed on this branch for that folder. It does not show any permissions changes:

$ git log --summary b25b78d.. -- derivatives/
commit aa3087640f297829c3f631588f37b7ef9e7a111e (HEAD -> add_fsl_dataset, origin/add_fsl_dataset)
Author: Alexandru Foias <alexandrufoias@gmail.com>
Date:   Mon Aug 3 11:59:50 2020 -0400

    rename test

 rename derivatives/labels/sub-amu02/anat/{sub-amu02_T1w_RPI_r_labels-manual.nii.gz => sub-amu02_T1w_RPI_r_labels-manual_new.nii.gz} (100%)

commit c6f2fb49a9a797dcec121a0eb6487af0aa811739
Merge: 00ccd02 b00b64b
Author: Alexandru Foias <alexandrufoias@gmail.com>
Date:   Mon Aug 3 09:42:52 2020 -0400

    Merge branch 'master' of https://github.com/spine-generic/data-multi-subject into add_fsl_dataset

commit 7a2e9d85c56d8473e98a88ce752712faf9d635f9
Author: Alexandru Foias <alexandrufoias@gmail.com>
Date:   Fri Jul 31 15:10:09 2020 -0400

    rename sapienza to fslAchieva

 rename derivatives/labels/{sub-sapienza01/anat/sub-sapienza01_T1w_RPI_r_labels-manual.nii.gz => sub-fslAchieva01/anat/sub-fslAchieva01_T1w_RPI_r_labels-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza01/anat/sub-sapienza01_T2star_rms_gmseg-manual.nii.gz => sub-fslAchieva01/anat/sub-fslAchieva01_T2star_rms_gmseg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza01/anat/sub-sapienza01_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva01/anat/sub-fslAchieva01_acq-T1w_MTS_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza01/dwi/sub-sapienza01_dwi_moco_dwi_mean_seg-manual.json => sub-fslAchieva01/dwi/sub-fslAchieva01_dwi_moco_dwi_mean_seg-manual.json} (100%)
 rename derivatives/labels/{sub-sapienza01/dwi/sub-sapienza01_dwi_moco_dwi_mean_seg-manual.nii.gz => sub-fslAchieva01/dwi/sub-fslAchieva01_dwi_moco_dwi_mean_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza03/anat/sub-sapienza03_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva03/anat/sub-fslAchieva03_acq-T1w_MTS_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza04/anat/sub-sapienza04_T2star_rms_gmseg-manual.nii.gz => sub-fslAchieva04/anat/sub-fslAchieva04_T2star_rms_gmseg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza04/anat/sub-sapienza04_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva04/anat/sub-fslAchieva04_acq-T1w_MTS_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza05/anat/sub-sapienza05_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva05/anat/sub-fslAchieva05_acq-T1w_MTS_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza06/anat/sub-sapienza06_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva06/anat/sub-fslAchieva06_acq-T1w_MTS_seg-manual.nii.gz} (100%)

commit d8b444f7e7f956bfee7739cd50f7f2dbe768f404
Author: PaulBautin <pbautin70@gmail.com>
Date:   Fri Jul 31 12:23:37 2020 -0400

    - remove cmrra02
    - remove sapienza04 gmseg
    - correct gmseg for sapienza01

 delete mode 100644 derivatives/labels/sub-cmrra02/anat/sub-cmrra02_T2w_RPI_r_seg-manual.json
 delete mode 100644 derivatives/labels/sub-cmrra02/anat/sub-cmrra02_T2w_RPI_r_seg-manual.nii.gz
 delete mode 100644 derivatives/labels/sub-sapienza04/anat/sub-sapienza04_T2star_rms_gmseg-manual.json
 delete mode 100755 derivatives/labels/sub-sapienza04/anat/sub-sapienza04_T2star_rms_gmseg-manual.nii.gz

On aa3087640f297829c3f631588f37b7ef9e7a111e I see the same as Alex:

$ ls -l derivatives/labels/sub-amu02/anat/sub-amu02_T1w_RPI_r_labels-manual_new.nii.gz 
-rwxr-xr-x 1 kousu kousu 557857 Aug  4 18:05 derivatives/labels/sub-amu02/anat/sub-amu02_T1w_RPI_r_labels-manual_new.nii.gz

on b00b64be99a1a632a93179719f2dee4911ec63c2 -- from master -- I see the same as Alex:

$ git checkout b00b64be99a1a632a93179719f2dee4911ec63c2
HEAD is now at b00b64b Merge pull request #16 from PaulBautin/pb_corrections_t1w_t2w
$ ls -l derivatives/labels/sub-amu02/anat/sub-amu02_T1w_RPI_r_labels-manual.nii.gz 
-rwxr-xr-x 1 kousu kousu 557857 Aug  4 19:24 derivatives/labels/sub-amu02/anat/sub-amu02_T1w_RPI_r_labels-manual.nii.gz

I'm not sure why your system is being strange @jcohenadad. How did you checkout @alexfoias' code locally? You should have done

git add remote alexfoias https://github.com/alexfoias/data-multi-subject.git
git fetch alexfoias
git checkout alexfoias/add_fsl_dataset

Is it possible you did something else?

Maybe you have something unusual in your git config -l that explains it? Are you working on your usual disk, or are you on a different disk that maybe has a different filesystem that is behaving differently?

jcohenadad commented 4 years ago

Do we need to purge the previous images completely? Were they already defaced, just incorrectly, or were they not defaced at all? If the latter, we should remember they'll still be in the git log even after this PR goes in, git rm isn't enough to hide them.

@kousu i've configured the repos to only allow squash and merge (although you told me you changed this behaviour temporarily), which (i think you confirmed in the past) does get rid of committed elements within the branch, right?

I'm not sure why your system is being strange @jcohenadad. How did you checkout @alexfoias' code locally?

i did this (as instructed on this GH page under "Step 1: From your project repository, check out a new branch and test the changes."):

git checkout -b "alexfoias-add_fsl_dataset" "master"
git pull https://github.com/alexfoias/data-multi-subject.git "add_fsl_dataset"

I see them as 0o0755 like Alex does. I downloaded the repo (it's 21G on disk, ay yi yi) and looked at the permissions changed on this branch for that folder. It does not show any permissions changes:

I was basing myself on the change observed on GH, as described in https://github.com/spine-generic/data-multi-subject/pull/21#issuecomment-668060031. At the end, this is what matters, right? Maybe there are two competing effects, which are generating confusion: (1) different permissions on my end (for whatever reasons) and (2) actual changes (not permissions, but changes in the binaries), which explains the non-zero BIN on the github's diff in https://github.com/spine-generic/data-multi-subject/pull/21#issuecomment-668060031. If this is true, the latter is very problematic because files should be the same.

alexfoias commented 4 years ago

@kousu Yes I checked out like that. The exact details are here.

kousu commented 4 years ago

Do we need to purge the previous images completely? Were they already defaced, just incorrectly, or were they not defaced at all? If the latter, we should remember they'll still be in the git log even after this PR goes in, git rm isn't enough to hide them.

@kousu i've configured the repos to only allow squash and merge (although you told me you changed this behaviour temporarily), which (i think you confirmed in the past) does get rid of committed elements within the branch, right?

Squash-merge is a special case of rebasing. It only applies to the code in the PR. If there are non-defaced images on master then you need to do deeper surgery on the history to get rid of them.

But maybe I'm misunderstanding and blowing this out of proportion. Were non-defaced images ever uploaded? Or just incorrectly defaced images? Were any of the incorrectly defaced images effectively non-defaced?

jcohenadad commented 4 years ago

If there are non-defaced images on master then you need to do deeper surgery on the history to get rid of them.

that's not the case, so all is good

EDIT 2020-08-05 12:35:47: "all is good" about the defaced images in the history-- but the issue about diff still remains https://github.com/spine-generic/data-multi-subject/pull/21#issuecomment-668894683

kousu commented 4 years ago

I see them as 0o0755 like Alex does. I downloaded the repo (it's 21G on disk, ay yi yi) and looked at the permissions changed on this branch for that folder. It does not show any permissions changes:

I was basing myself on the change observed on GH, as described in #21 (comment). At the end, this is what matters, right? Maybe there are two competing effects, which are generating confusion: (1) different permissions on my end (for whatever reasons) and (2) actual changes (not permissions, but changes in the binaries), which explains the non-zero BIN on the github's diff in #21 (comment). If this is true, the latter is very problematic because files should be the same.

I've figured it out. If you look a bit closer you can see this pattern: ever sidecar say "rename" and every image says "BIN +... Binary file not shown".

2020-08-05-124926_963x547_scrot

But actually are renames. The log I quoted above shows renames.

commit 7a2e9d85c56d8473e98a88ce752712faf9d635f9
Author: Alexandru Foias <alexandrufoias@gmail.com>
Date:   Fri Jul 31 15:10:09 2020 -0400

    rename sapienza to fslAchieva

 rename derivatives/labels/{sub-sapienza01/anat/sub-sapienza01_T1w_RPI_r_labels-manual.nii.gz => sub-fslAchieva01/anat/sub-fslAchieva01_T1w_RPI_r_labels-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza01/anat/sub-sapienza01_T2star_rms_gmseg-manual.nii.gz => sub-fslAchieva01/anat/sub-fslAchieva01_T2star_rms_gmseg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza01/anat/sub-sapienza01_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva01/anat/sub-fslAchieva01_acq-T1w_MTS_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza01/dwi/sub-sapienza01_dwi_moco_dwi_mean_seg-manual.json => sub-fslAchieva01/dwi/sub-fslAchieva01_dwi_moco_dwi_mean_seg-manual.json} (100%)
 rename derivatives/labels/{sub-sapienza01/dwi/sub-sapienza01_dwi_moco_dwi_mean_seg-manual.nii.gz => sub-fslAchieva01/dwi/sub-fslAchieva01_dwi_moco_dwi_mean_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza03/anat/sub-sapienza03_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva03/anat/sub-fslAchieva03_acq-T1w_MTS_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza04/anat/sub-sapienza04_T2star_rms_gmseg-manual.nii.gz => sub-fslAchieva04/anat/sub-fslAchieva04_T2star_rms_gmseg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza04/anat/sub-sapienza04_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva04/anat/sub-fslAchieva04_acq-T1w_MTS_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza05/anat/sub-sapienza05_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva05/anat/sub-fslAchieva05_acq-T1w_MTS_seg-manual.nii.gz} (100%)
 rename derivatives/labels/{sub-sapienza06/anat/sub-sapienza06_acq-T1w_MTS_seg-manual.nii.gz => sub-fslAchieva06/anat/sub-fslAchieva06_acq-T1w_MTS_seg-manual.nii.gz} (100%)

It's just a glitch in the Github UI.

You can verify the contents haven't changed by running sha256 on the files in both branches. Here, this should work on macOS I think:

git checkout master
sha256 derivatives/labels/sub-sapienza04/anat/sub-sapienza04_acq-T1w_MTS_seg-manual.nii.gz
git checkout alexfoias-add_fsl_dataset
sha256 sub-fslAchieva04/anat/sub-fslAchieva04_acq-T1w_MTS_seg-manual.nii.gz 

or you could use this, which should be much quicker:

git show master:derivatives/labels/sub-sapienza04/anat/sub-sapienza04_acq-T1w_MTS_seg-manual.nii.gz | sha256
git show alexfoias-add_fsl_dataset:sub-fslAchieva04/anat/sub-fslAchieva04_acq-T1w_MTS_seg-manual.nii.gz | sha256
kousu commented 4 years ago

It might be helpful to know that git doesn't actually know about renames! Renames are not part of its data structure. Adds and deletions are. Every time it says "renamed" in the git log it's actually because it's detected an add/remove pair in the data structure with the same content on both sides.

So what Github is doing is not, technically, incorrect.

jcohenadad commented 4 years ago

It might be helpful to know that git doesn't actually know about renames! Renames are not part of its data structure. Adds and deletions are. Every time it says "renamed" in the git log it's actually because it's detected an add/remove pair in the data structure with the same content on both sides.

So what Github is doing is not, technically, incorrect.

interesting. So, does it mean that the repository grows with renaming files?

what's funny is that some renamed file show no changes: image

but again, that could be the glitch you mentioned above

or: is it possible that non-binary text files (like json) are inspected, and if no change is detected, then git does not make the repository "grow" for those files.

kousu commented 4 years ago

It might be helpful to know that git doesn't actually know about renames! Renames are not part of its data structure. Adds and deletions are. Every time it says "renamed" in the git log it's actually because it's detected an add/remove pair in the data structure with the same content on both sides. So what Github is doing is not, technically, incorrect.

interesting. So, does it mean that the repository grows with renaming files?

No, because the actual content is in ~/.git/objects and that stores content by hash -- the same way git-annex works, incidentally (gorey details here and here). So no, moving a file doesn't grow the repo.

what's funny is that some renamed file show no changes:

All the files that github shows as "renamed with no changes" are .json. All the ones it shows with a size are .nii.gz. I bet they're running diff to decide how to generate that section which reacts differently to "text" and "binary" files. Notice that the .nii.gz files have a little -> arrow with the two filenames on either side in the header, so Github is recognizing it as a rename, it's just using a different display template.

jcohenadad commented 4 years ago

@kousu is that OK if we merge this PR, or should we rather do it in https://github.com/kousu/spine-generic-data-multi-subject/tree/master-reannexed?

jcohenadad commented 4 years ago

@alexfoias the PR title says "WIP: …" --> is it still WIP or ready to be reviewed?

alexfoias commented 4 years ago

@jcohenadad the PR is ready for review. I fixed the renaming for the test sub-amu02.

jcohenadad commented 4 years ago

@jcohenadad the PR is ready for review. I fixed the renaming for the test sub-amu02.

~thx-- i've added this in the list of changes for this PR https://github.com/spine-generic/data-multi-subject/pull/21#issue-460426902~

^ scrap that. @alexfoias comment "I fixed the renaming for the test sub-amu02" was referring to the renaming test he has been conducting for the past commits, to check how file size is modified in github. So, this change is not part of this PR.