nilearn / nistats

Modeling and statistical inference on fMRI data in Python
BSD 3-Clause "New" or "Revised" License
95 stars 55 forks source link

added a fetcher to download language localizer demo dataset adapted to BIDS 1.2 spec #341

Closed chrplr closed 5 years ago

chrplr commented 5 years ago

I uploaded a new version of the language localizer dataset (10 participants, preprocessed fmri scans only) on OSF. I corrected the .json files, fixed the headers of the nii.gz files to have the correct TR (1.5) and unit (seconds and mm).

The scripts in examples still need to be adapted to use fetch_langloc_dataset in lieu of fetch_bids_langloc_dataset

kchawla-pi commented 5 years ago

@chrplr Pull from master for the tests to pass.

kchawla-pi commented 5 years ago

Hey @chrplr , will you be able to wrap this up soon?

thompsonj commented 5 years ago

I've been debugging some issues with the PR because it is related to PR #373 that I recently submitted.

I think this PR would need to be merged first before #373 could be merged.

kchawla-pi commented 5 years ago

Hey @chrplr , think you can attend to this soon? Doesn't seem like there is much to do to complete it.

kchawla-pi commented 5 years ago

I hear you are unavailable this month, @chrplr . I would rather neither keep #373 waiting for this, nor ask you to work on this in your vacation time. I'll work on this next week. It seems simple enough to change but if I get stuck, I might look to you for guidance @thompsonj .

kchawla-pi commented 5 years ago

@bthirion

thompsonj commented 5 years ago

@kchawla-pi one idea in case you encounter any issues: if you omit the changes to examples/05_complete_examples/plot_bids_analysis.py from this PR we could offload the work of updating the example to #373. Then the main contribution of this PR is the new loading function for the updated langloc dataset, which wouldn't actually be used until #373 is ready.

kchawla-pi commented 5 years ago

@thompsonj That's excellent & seems a far cleaner way of attending to all this. Please go ahead with what you have outlined. Thank you!

thompsonj commented 5 years ago

@kchawla-pi just to clarify my last message, I was still suggesting that this PR should be merged first since we will need the new dataset fetcher introduced here to finish updating the examples. My proposal was to omit commit 51a15bbe1d7ad396ff4aa4b205cc2b29e1a9e4f2 from this PR (commit message "essai") which I anticipate should address the CI failure (or at least make it easier to fix).

kchawla-pi commented 5 years ago

Thanks @thompsonj ! I have looked at the failure and it is due to a change in matplotlib. This has been fixed in Nistats master. @chrplr Another pull from upstream master is required. The last pull in this PR is on May 30, while the issue was fixed on Jun 12.

chrplr commented 5 years ago

I look into it as soon as I have a decent internet connection. Sorry for he huge delay.

On Wed, Aug 21, 2019, 18:55 Kshitij Chawla notifications@github.com wrote:

Thanks @thompsonj https://github.com/thompsonj ! I have looked at the failure and it is due to a change in matplotlib. This has been fixed in Nistats master. @chrplr https://github.com/chrplr Another pull from upstream master is required. The last pull in this PR is on May 30, while the issue was fixed on Jun 12.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nistats/nistats/pull/341?email_source=notifications&email_token=AALVWMXVV6FFHI4NFFYDLDLQFVXPDA5CNFSM4HGL5GS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD42LWXA#issuecomment-523549532, or mute the thread https://github.com/notifications/unsubscribe-auth/AALVWMQL6PCCUASE7S6WLDLQFVXPDANCNFSM4HGL5GSQ .

codecov[bot] commented 5 years ago

Codecov Report

Merging #341 into master will decrease coverage by 0.28%. The diff coverage is 23.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #341      +/-   ##
==========================================
- Coverage   91.16%   90.87%   -0.29%     
==========================================
  Files          34       34              
  Lines        3734     3748      +14     
==========================================
+ Hits         3404     3406       +2     
- Misses        330      342      +12
Impacted Files Coverage Δ
nistats/utils.py 92.61% <0%> (ø) :arrow_up:
nistats/first_level_model.py 82.99% <75%> (-0.23%) :arrow_down:
nistats/datasets.py 66.1% <8.33%> (-2.45%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 8f0b606...3b0f265. Read the comment docs.

chrplr commented 5 years ago

Hello,

I am sorry but I need a bit of guidance here. What do I have do exactly to unblock the situation?

Is it enough that I make the dataset BIDS 1.2.1 compliant and correct the datafetcher accordingly?

Christophe

On Sat, Aug 24, 2019 at 3:13 PM codecov[bot] notifications@github.com wrote:

Codecov https://codecov.io/gh/nistats/nistats/pull/341?src=pr&el=h1 Report

Merging #341 https://codecov.io/gh/nistats/nistats/pull/341?src=pr&el=desc into master https://codecov.io/gh/nistats/nistats/commit/8f0b606f6da6dc7f55e25cc0fa903fdfcc007145?src=pr&el=desc will decrease coverage by 0.32%. The diff coverage is 23.52%.

[image: Impacted file tree graph] https://codecov.io/gh/nistats/nistats/pull/341?src=pr&el=tree

@@ Coverage Diff @@

master #341 +/-

==========================================

  • Coverage 91.16% 90.83% -0.33%

==========================================

Files 34 34

Lines 3734 3733 -1

==========================================

  • Hits 3404 3391 -13
  • Misses 330 342 +12

Impacted Files https://codecov.io/gh/nistats/nistats/pull/341?src=pr&el=tree Coverage Δ nistats/utils.py https://codecov.io/gh/nistats/nistats/pull/341/diff?src=pr&el=tree#diff-bmlzdGF0cy91dGlscy5weQ== 92.61% <0%> (ø) ⬆️ nistats/first_level_model.py https://codecov.io/gh/nistats/nistats/pull/341/diff?src=pr&el=tree#diff-bmlzdGF0cy9maXJzdF9sZXZlbF9tb2RlbC5weQ== 82.99% <75%> (-0.23%) ⬇️ nistats/datasets.py https://codecov.io/gh/nistats/nistats/pull/341/diff?src=pr&el=tree#diff-bmlzdGF0cy9kYXRhc2V0cy5weQ== 66.1% <8.33%> (-2.45%) ⬇️ nistats/contrasts.py https://codecov.io/gh/nistats/nistats/pull/341/diff?src=pr&el=tree#diff-bmlzdGF0cy9jb250cmFzdHMucHk= 92.56% <0%> (-0.13%) ⬇️ nistats/tests/test_contrasts.py https://codecov.io/gh/nistats/nistats/pull/341/diff?src=pr&el=tree#diff-bmlzdGF0cy90ZXN0cy90ZXN0X2NvbnRyYXN0cy5weQ== 100% <0%> (ø) ⬆️ nistats/design_matrix.py https://codecov.io/gh/nistats/nistats/pull/341/diff?src=pr&el=tree#diff-bmlzdGF0cy9kZXNpZ25fbWF0cml4LnB5 88.57% <0%> (+0.08%) ⬆️

Continue to review full report at Codecov https://codecov.io/gh/nistats/nistats/pull/341?src=pr&el=continue.

Legend - Click here to learn more https://docs.codecov.io/docs/codecov-delta Δ = absolute (impact), ø = not affected, ? = missing data Powered by Codecov https://codecov.io/gh/nistats/nistats/pull/341?src=pr&el=footer. Last update 8f0b606...55f6444 https://codecov.io/gh/nistats/nistats/pull/341?src=pr&el=lastupdated. Read the comment docs https://docs.codecov.io/docs/pull-request-comments .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nistats/nistats/pull/341?email_source=notifications&email_token=AALVWMRANQENRNHHAGWOVQTQGEXWLA5CNFSM4HGL5GS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5B7Y7A#issuecomment-524549244, or mute the thread https://github.com/notifications/unsubscribe-auth/AALVWMTV55T6KLAECB5TVCTQGEXWLANCNFSM4HGL5GSQ .

--

Christophe Pallier christophe@pallier.org INSERM-CEA Cognitive Neuroimaging Lab, Neurospin, bat 145, 91191 Gif-sur-Yvette Cedex, France Tel: 00 33 1 69 08 79 34 Personal web site: http://www.pallier.org Lab web site: http://www.unicog.org https://www.chronicle.com/interactives/is-email-making-professors-stupid?

kchawla-pi commented 5 years ago

I'm not an expert at BIDS but based on the error message, I think you're correct. Making the data BIDS compliant should be sufficient.

chrplr commented 5 years ago

I have just reverted plot_bids_analysis.py to the old version. This PR should have only added fetch_langloc_dataset() to datasets.py

I hope that I am no longer blocking the merge of PR #373 (https://github.com/nistats/nistats/pull/373). Sorry about the long delay!

When PR#373 is merged, I will check the issues raised by @thompsonj (folder 'derivative', name indicating preprocessed files), and modifiy the saved dataset on osf accordingly, even if it currently passes the BIDS validator.

chrplr commented 5 years ago
* The updated version of this language localizer dataset doesn't put the preprocessed data in a derivatives folder, which causes an error when trying to construct the first level model in examples/example_05/plot_bids_analysis.py (currently failing in CircleCI). To adhere to BIDS, preprocessed files should be in a derivatives folder and the filename of a preprocessed data file should distinguish it from it's corresponding unprocessed data file, e.g. by adding  '_space-' and/or or 'desc_preproc'.

Trying to amend the dataset, I renamed the files, adding the 'space' and 'desc' descriptors:


$ ls sub-01/func 
sub-01_task-languagelocalizer_desc-confounds_regressors.tsv
sub-01_task-languagelocalizer_events.tsv
sub-01_task-languagelocalizer_space-MNI152NLin2009cAsym_desc-preproc_bold.json
sub-01_task-languagelocalizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz

Yet, the online Bids validator does not accept the new filenames :-(


Type:       Error
File:       sub-01_task-languagelocalizer_space-MNI152NLin2009cAsym_desc-preproc_bold.json
Location:       fMRI-language-localizer-dataset/sub-01/func/sub-01_task-languagelocalizer_space-MNI152NLin2009cAsym_desc-preproc_bold.json
Reason:     Files with such naming scheme are not part of BIDS specification. This error is most commonly caused by typos in file names that make them not BIDS compatible. Please consult the specification and make sure your files are named correctly. If this is not a file naming issue (for example when including files not yet covered by the BIDS specification) you should include a ".bidsignore" file in your dataset (see https://github.com/bids-standard/bids-validator#bidsignore for details). Please note that derived (processed) data should be placed in /derivatives folder and source data (such as DICOMS or behavioural logs in proprietary formats) should be placed in the /sourcedata folder.
Evidence:   sub-01_task-languagelocalizer_space-MNI152NLin2009cAsym_desc-preproc_bold.json

Type:       Error
File:       sub-01_task-languagelocalizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz
Location:       fMRI-language-localizer-dataset/sub-01/func/sub-01_task-languagelocalizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz
Reason:     Files with such naming scheme are not part of BIDS specification. This error is most commonly caused by typos in file names that make them not BIDS compatible. Please consult the specification and make sure your files are named correctly. If this is not a file naming issue (for example when including files not yet covered by the BIDS specification) you should include a ".bidsignore" file in your dataset (see https://github.com/bids-standard/bids-validator#bidsignore for details). Please note that derived (processed) data should be placed in /derivatives folder and source data (such as DICOMS or behavioural logs in proprietary formats) should be placed in the /sourcedata folder.
Evidence:   sub-01_task-languagelocalizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz

There must be a silly problem in the filenames that I fail to see. Any idea?

Secondly, I moved the sub folders into a new subfolder 'derivatives'. Then the bids validator complains that "There are no subject folders (labeled "sub-") in the root of this dataset."

thompsonj commented 5 years ago

Sorry for the confusion. I'm don't think the BIDS validator is able to check for the validity of a BIDS-Derivatives dataset? There is an open issue from 2018 suggesting that it does not and an answer to a similar question on NeuroStars saying that bids-validator ignores the 'derivatives' folder. I'm guessing that the BIDS validator is expecting a BIDS-Raw Dataset which is why it doesn't complain when you name the files/folders to look like raw files.

The latest Derivatives specification (not yet merged) describes two ways of storing a BIDS-Derivatives dataset: 1) as a subdirectory within a BIDS-Raw dataset (this is also what BIDS 1.2 says about derivatives) or 2) as a stand alone dataset independent of the raw data. Since 2) applies to you, it should be valid to not put your data in a derivatives folder. However, if you want the dataset to pass the bids-validator, I would use the convention of putting a derivatives folder within the BIDS-Raw dataset. In your case, I believe the only files to be in the raw dataset would be the event files (unless you choose to share other files that weren't a product of some processing).

I assume the requirement to not name processed files as raw files applies to both methods described above (bids-specification/src/05-derivatives/02-common-data-types.md)

The space keyword is recommended to distinguish files with different underlying coordinate systems or registered to different reference maps. The desc keyword is a general purpose field with freeform values. To distinguish between multiple different versions of processing for the same input data the desc keyword should be used.

Note that even though space and desc are optional at least one of them needs to be defined to avoid name conflict with the raw file.

See here for a discussion about the requirement to not name derivatives with valid names for raw files.

chrplr commented 5 years ago

Many thanks for your detailed explanation.

I learned the hard way that the BIDS validator does indeed not validate the derivatives. I wonder if there is any way to put data only derivatives on openneuro.org (?).

I had tried the exact file organization that you propose but it was still failing.

I will repackage the dataset following this convention and correct the fetcher to avoid the double directory issue. The only important thing is that nistats is able to import the data.

I probably won't have a good internet connection to upload the dataset to osf before the end of the week.

Thanks again for your help. Your messages are very clear.

On Mon, Aug 26, 2019, 19:46 Jessica Thompson notifications@github.com wrote:

Sorry for the confusion. I'm don't think the BIDS validator is able to check for the validity of a BIDS-Derivatives dataset? There is an open issue from 2018 https://github.com/bids-standard/bids-validator/issues/637 suggesting that it does not and an answer to a similar question on NeuroStars https://neurostars.org/t/sharing-bids-electrophysiological-derivatives-on-openneuro/4489/6 saying that bids-validator ignores the 'derivatives' folder. I'm guessing that the BIDS validator is expecting a BIDS-Raw Dataset which is why it doesn't complain when you name the files/folders to look like raw files.

The latest Derivatives specification https://github.com/bids-standard/bids-specification/blob/e4a4d6571aa5b8cd5790af508801ba73abfef9d5/src/05-derivatives/01-introduction.md#storage-of-derived-datasets (not yet merged) describes two ways of storing a BIDS-Derivatives dataset: 1) as a subdirectory within a BIDS-Raw dataset (this is also what BIDS 1.2 says about derivatives) or 2) as a stand alone dataset independent of the raw data. Since 2) applies to you, it should be valid to not put your data in a derivatives folder. However, if you want the dataset to pass the bids-validator, I would use the convention of putting a derivatives folder within the BIDS-Raw dataset. In your case, I believe the only files to be in the raw dataset would be the event files (unless you choose to share other files that weren't a product of some processing).

  • fMRI-language-localizer-dataset
    • dataset_description.json
    • derivatives
      • sub-01
        • func
          • sub-01_task_languagelocalizer_desc-preproc_bold.nii.gz
          • sub-01_task_languagelocalizer_desc-preproc_bold.json
          • sub-01_task_languagelocalizer_desc-confounds_regressors.tsv

        • sub-02 etc ...
    • sub-01
      • func
        • sub-01-task-languagelocalizer-events.tsv
      • sub-02 etc ...

I assume the requirement to not name processed files as raw files applies to both methods described above ( bids-specification/src/05-derivatives/02-common-data-types.md https://github.com/bids-standard/bids-specification/blob/5b037cdd3835fc33a6b5d0aa213b54430676e251/src/05-derivatives/02-common-data-types.md )

The space keyword is recommended to distinguish files with different underlying coordinate systems or registered to different reference maps. The desc keyword is a general purpose field with freeform values. To distinguish between multiple different versions of processing for the same input data the desc keyword should be used.

Note that even though space and desc are optional at least one of them needs to be defined to avoid name conflict with the raw file.

See here https://github.com/bids-standard/bids-specification/pull/265#discussion_r302807114 for a discussion about the requirement to not name derivatives with valid names for raw files.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nistats/nistats/pull/341?email_source=notifications&email_token=AALVWMU64OT2W54V6MFDFJDQGQJHVA5CNFSM4HGL5GS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5FDOYQ#issuecomment-524957538, or mute the thread https://github.com/notifications/unsubscribe-auth/AALVWMVJNN6NH6H6HGX2NVDQGQJHVANCNFSM4HGL5GSQ .

chrplr commented 5 years ago

Replaced by PR #382