Reconversion of phase1 data from raw into bids

psychoinformatics-de / studyforrest-data

DataLad superdataset of all studyforrest.org project dataset components

https://studyforrest.org

9 stars 2 forks source link

Reconversion of phase1 data from raw into bids #29

Open mih opened 3 years ago

mih commented 3 years ago

There are two possible approach that I can see:

Make the code from 2013 run reproducibly

This should be doable, this was all standard Debian package and the custom code in the datasets. We could create a singularity image that travels back in time. This would be attractive from a perspective of forensic data management. And might get us to a place that matches the OpenNeuro #28 state, but with provenance.

Redo the conversion with modern day tooling

This comes with the danger that files come out differently. One would then need to figure out, how they differ, and maybe even why. Pro: this will give us much better metadata automatically; we can showcase hirni. Con: Lots of work, leading to possibly lots of more work.

Waddayathink @bpoldrack

Checklist for the BIDS dataset:

Version: 01fe519fb76c92dd323c4876b57554ce010928f0

[ ] valid repo: Everything properly committed. Clean status, nothing untracked.
[ ] completeness: Does it contain everything that it needs to? What is still missing?
data correctness and consistency:
- [ ] converted behavioral logs from pandora okay?
- [ ] ^ is this assessment included in tests?
- [ ] converted physio files okay?
- [ ] ^ is this assessment included in tests?
- [ ] converted images okay?
- [ ] ^ is this assessment included in tests?
[ ] BIDS validator approves
[ ] What do we break in comparison with last OpenNeuro release and are we okay with it?
Anonymization:
- [ ] all (run-)records in the history pointing into the raw datasets in sidecar files?
- [ ] setup to make sure, those sidecar files are not publicly available?
- [ ] everything defaced that should be?
- [ ] non-defaced images dropped and not available?

bpoldrack commented 3 years ago

I think it depends on what's most important. I can see the argument for the first approach. If highest priorities are a) matching the state in OpenNeuro and b) minimizing workload then this looks like a good idea.

However, I'd argue that there is an additional Pro for the second approach: We'd have a raw dataset + specifications, that we can much more easily apply any different conversions in the future on. Thinking of significant changes in BIDS or yet another standard we'd want to represent the data in. It may also be easier to get new metadata standards/formats etc. in the future.

So, not exactly sure. Need to have a closer look into the existing repo to see what we may loose that way.

One more thing: If we actually can have time traveling containers that can reproduce what was done back then, it seems to me that we can have a third approach: a merger of both. Nothing forces us to use hirni with the current toolbox.

However, if we go for 1) or 3), I'll need help figuring out what was done how and therefore how to build the container(s) and may be break things down into a few procedures. Inspecting that on my own sounds like it'll take too long.

mih commented 3 years ago

I'd say we go for (2) hirni in this case.

loj commented 3 years ago

@bpoldrack here is the BIDS spec for the diffusion data https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#diffusion-imaging-data

Susceptibility Weighted Imaging (SWI) is still a BEP (https://docs.google.com/document/d/1kyw9mGgacNqeMbp4xZet3RnDhcMmf4_BmRgKaOkO2Sc/edit)

bpoldrack commented 3 years ago

Referencing https://github.com/psychoinformatics-de/studyforrest-data/issues/34#issuecomment-827416454

bpoldrack commented 3 years ago

Note: Instead of phase1, now aim for publication related targets.

Comparison of conversion outcome is to be made against anondata.

bpoldrack commented 3 years ago

Starting to look into conversion issues.

Task labels (see https://github.com/psychoinformatics-de/studyforrest-data/issues/35#issuecomment-826894871): anondata uses numbered tasks, whereas an earlier attempt to redo the conversion used aomovie and pandora. OpenNeuro apparently uses its own names:

ds000113 on git:master
❱ ls task*
task-auditoryperception_bold.json  task-movielocalizer_bold.json    task-objectcategories_physio.json    task-retmapccw_physio.json  task-retmapcon_physio.json
task-coverage_bold.json            task-movielocalizer_physio.json  task-orientation_bold.json           task-retmapclw_bold.json    task-retmapexp_bold.json
task-coverage_rec-dico_bold.json   task-movie_physio.json           task-orientation_rec-dico_bold.json  task-retmapclw_physio.json  task-retmapexp_physio.json
task-movie_bold.json               task-objectcategories_bold.json  task-retmapccw_bold.json             task-retmapcon_bold.json

What labels do we settle for, @mih ?

Edit: Verdict: Take from OpenNeuro. Ergo: forrestgump for 7T_ad (+anatomy?) and auditoryperception for pandora.

adswa commented 3 years ago

/data/project/studyforrest_phase1/testing/scientific-data-2014-bids currently does not contain any functional nifits

adswa commented 3 years ago

Here is a potential structure for the events files of the pandora data. They should be derived from the behavdata.tsv files with a script. Most variables can be pulled out verbatim, just onset and duration need to be computed like this from the trial duration (6 seconds) plus the trial-specific delay

events.tsv

onset   duration    trial_type  run run_id  volume         run_volume   stim    genre   delay   catch   sound_soa   trigger_ts
0         6.0           <genre>        <run>  <run_id>   <volume>  <run_volume>         <stim> <genre> <delay> <catch>  <sound_soa> <trigger_ts>

The corresponding json file should look about like this: events.json

{
    "trial_type": {
        "LongName": "Event category",
        "Description": "Indicator of the genre of the musical stimulus",
        "Levels": {
            "country": "Country music",
            "symphonic": "Symphonic music",
            "metal": "metal music",
            "ambient": "ambient music",
            "rocknroll": "rocknroll music"
        }
    },
    "sound_soa": {
        "LongName": "Sound onset asynchrony",
        "Description": "asynchrony between MRI trigger and sound onset",
    },
    "catch": {
        "LongName": "Control question",
        "Description": "flag whether a control question with presented",
    },
    "volume": {
        "LongName": "fMRI volume total",
        "Description": "fMRI volume corresponding to stimulation start",
    },
    "run_volume": {
        "LongName": "fMRI volume run",
        "Description": "fMRI volume corresponding to stimulation start in the current run",
    },
    "run": {
        "LongName": "Run in Sequence",
        "Description": "order of run in sequence ",
    },
    "run_id": {
        "LongName": "Trial ID in Run",
        "Description": "ID of trial sequence for this run",
    },
    "stim": {
        "LongName": "Stimulation file",
        "Description": "stimulus file name",
    },
    "delay": {
        "LongName": "inter-stimulus interval",
        "Description": "inter-stimulus interval in seconds",
        "Units": "seconds"
    },
    "trigger_ts": {
        "LongName": "Trigger time stamp",
        "Description": "time stamp of the corresponding MRI trigger with respect to the start of the experiment in seconds ",
        "Units": "seconds"
    },
    "genre": {
        "LongName": "Genre",
        "Description": "Indicator of the genre of the musical stimulus",
        "Levels": {
            "country": "Country music",
            "symphonic": "Symphonic music",
            "metal": "metal music",
            "ambient": "ambient music",
            "crocknroll": "rocknroll music"
        }
    }
}

bpoldrack commented 3 years ago

For anatomy I have the following image series.

SeriesNumber, Protocol, currently assigned modality, whether currently converted or ignored for conversion:

(101, 'SmartBrain_32channel', None, 'ignored'), (102, 'SmartBrain_32channel', None, 'ignored'), (103, 'Patient Aligned MPR AWPLAN_SMARTPLAN_TYPE_BRAIN', None, 'ignored'), (201, 'B1_calibration_brain', None, 'ignored'), (202, 'B1_calibration_brain', None, 'ignored'), (301, 'Ref_Head_32', None, 'ignored'), (401, 'sT1W_3D_TFE_TR2300_TI900_0.7iso_FS', 't1w', 'converted'), (501, 'VEN_BOLD_HR_32chSHC', 'swi', 'converted'), (502, 'VVEN_BOLD_HR_32chSHC SENSE', 'swi', 'ignored'), (601, 'sT2W_3D_TSE_32chSHC_0.7iso', 't2w', 'converted'), (701, 'DTI_high_2iso', 'dwi', 'ignored'), (702, 'Reg - DTI_high_iso', 'dwi', 'converted'), (703, 'dReg - DTI_high_iso', 'dwi', 'ignored'), (704, 'eReg - DTI_high_iso', 'dwi', 'ignored'), (705, 'faReg - DTI_high_iso', 'dwi', 'ignored'), (706, 'facReg - DTI_high_iso', 'dwi', 'ignored'), (801, 'field map', 'fieldmap', 'ignored')

Questions: Something that is ignored, but should be converted? Which ones should be assigned modality veno and angio?

loj commented 3 years ago

Suggested content for the dataset_descrption.json:

{
    "Name": "scientific-data-2014-bids",
    "BIDSVersion": "TODO",
    "DatasetType": "raw",
    "License": "CC0",
    "Authors": [
        "Michael Hanke",
        "Florian J. Baumgartner",
        "Pierre Ibe",
        "Falko R. Kaule",
        "Stefan Pollmann",
        "Oliver Speck",
        "Wolf Zinke",
        "Jorg Stadler",
        "Richard Dinga",
        "Christian Häusler",
        "J. Swaroop Guntupalli",
        "Michael Casey"
    ],
    "Acknowledgements": "",
    "HowToAcknowledge": "Please follow good scientific practice by citing the most appropriate publication(s) describing the aspects of this datasets that were used in a study.",
    "Funding": [
        "A grant from the German Federal Ministry of Education and Research (BMBF) funded the initial data acquisition as part of the US-German collaboration in computational neuroscience (CRCNS) project: Development of general high-dimensional models of neuronal representation spaces (Haxby / Ramagde / Hanke), co-funded by the BMBF and the US National Science Foundation (BMBF 01GQ1112; NSF 1129855).",
        "We acknowledge the support of the Combinatorial NeuroImaging Core Facility at the Leibniz Institute for Neurobiology in Magdeburg.",
        "Moreover, development of data sharing technology used for dissemination and management of this dataset is supported by another US-German collaboration grant awarded to Halchenko and Hanke: DataLad: Converging catalogues, warehouses, and deployment logistics into a federated 'data distribution', also co-funded by BMBF (01GQ1411) and NSF (1129855).",
        "The German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences provided support for data acquisition hardware and personnel."
    ],
    "EthicsApprovals": [
        ""
    ],
    "ReferencesAndLinks": [
        "http://studyforrest.org",
        "https://www.nature.com/articles/sdata20143",
        "https://f1000research.com/articles/4-174/v1"
    ],
    "DatasetDOI": "TODO"
}

TODOs:

[ ] possibly change name of the dataset to "studyforrest phase1"
[ ] add BIDS version (1.6.0 probably)
[ ] add DOI

Note: This is different from the OpenNeuro dataset_description.json, but that is intended since this dataset includes only data from "phase1".

bpoldrack commented 3 years ago

Current image series for 7T_ad:

(1, 'AAHScout_32ch', None, 'ignored'), (2, 'AAHScout_32ch_MPR', None, 'ignored'), (3, 'b1map_658', None, 'ignored'), (4, 'b1map_658', None, 'ignored'), (5, 'CV_shim_452B', None, 'ignored'), (6, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl', None, 'ignored'), (7, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_PostProc', None, 'ignored'), (8, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_DiCo', None, 'ignored'), (9, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R1', 'bold', 'ignored'), (10, 'MoCoSeries_DiCo', 'bold', 'converted'), (11, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R2', 'bold', 'ignored'), (12, 'MoCoSeries_DiCo', 'bold', 'converted'), (13, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R3', 'bold', 'ignored'), (14, 'MoCoSeries_DiCo', 'bold', 'converted'), (15, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R4', 'bold', 'ignored'), (16, 'MoCoSeries_DiCo', 'bold', 'converted'), (99, 'PhoenixZIPReport', None, 'ignored')

(2, 'AAHScout_32ch_MPR', None, 'ignored'), (1, 'AAHScout_32ch', None, 'ignored'), (3, 'CV_shim_452B', None, 'ignored'), (4, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl', None, 'ignored'), (5, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_PostProc', None, 'ignored'), (7, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R5', 'bold', 'ignored'), (8, 'MoCoSeries_DiCo', 'bold', 'converted'), (9, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R6', 'bold', 'ignored'), (10, 'MoCoSeries_DiCo', 'bold', 'converted'), (11, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R7', 'bold', 'ignored'), (12, 'MoCoSeries_DiCo', 'bold', 'converted'), (13, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R8', 'bold', 'ignored'), (14, 'MoCoSeries_DiCo', 'bold', 'converted'), (99, 'PhoenixZIPReport', None, 'ignored'), (6, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_DiCo', None, 'ignored')

bpoldrack commented 3 years ago

And pandora, @mih :

(2, 'AAHScout_32ch_MPR', None, 'ignored'), (1, 'AAHScout_32ch', None, 'ignored'), (3, 'b1map_658', None, 'ignored'), (4, 'b1map_658', None, 'ignored'), (5, 'CV_shim_452B', None, 'ignored'), (6, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl', None, 'ignored'), (7, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_PostProc', None, 'ignored'), (9, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R1P', 'bold', 'ignored'), (10, 'MoCoSeries_DiCo', 'bold', 'converted'), (11, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R2P', 'bold', 'ignored'), (12, 'MoCoSeries_DiCo', 'bold', 'converted'), (13, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R3P', 'bold', 'ignored'), (14, 'MoCoSeries_DiCo', 'bold', 'converted'), (15, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R4P', 'bold', 'ignored'), (16, 'MoCoSeries_DiCo', 'bold', 'converted'), (17, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R5P', 'bold', 'ignored'), (18, 'MoCoSeries_DiCo', 'bold', 'converted'), (19, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R6P', 'bold', 'ignored'), (20, 'MoCoSeries_DiCo', 'bold', 'converted'), (21, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R7P', 'bold', 'ignored'), (22, 'MoCoSeries_DiCo', 'bold', 'converted'), (23, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R8P', 'bold', 'ignored'), (24, 'MoCoSeries_DiCo', 'bold', 'converted'), (25, 'ToF-3D-multi-slab-0.3iso_FA20_claus', 'angio', 'converted'), (99, 'PhoenixZIPReport', None, 'ignored'), (8, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_DiCo', None, 'ignored')

adswa commented 3 years ago

I found a trigger mismatch in the converted Physio files from the audiomovie in your test repo (/data/project/studyforrest_phase1/testing/scientific-data-2014-bids/sub-002/func), @bpoldrack.

The test checks whether it finds the same number as triggers as the run had TRs:

check_physio()
{
    nvols=$1
    shift
    for f in $@; do
        found_trigger="$(zgrep '^1' "$f" | wc -l)"
        assertEquals "Need to find each trigger in the log" "$nvols" "$found_trigger"
    done
}

test_physio_movie_runs()
{
  count=1
  for nvols in 451 441 438 488 462 439 542 338; do
      check_physio $nvols *_task-forrestgump_run-0${count}_physio.tsv.gz
      count=$(( $count + 1 ))
  done
}

It fails for a few subjects in the conversion.

subject 1

ASSERT:Need to find each trigger in the log expected:<451> but was:<434>
ASSERT:Need to find each trigger in the log expected:<441> but was:<424>
ASSERT:Need to find each trigger in the log expected:<438> but was:<421>
ASSERT:Need to find each trigger in the log expected:<488> but was:<466>
ASSERT:Need to find each trigger in the log expected:<462> but was:<440>
ASSERT:Need to find each trigger in the log expected:<439> but was:<422>
ASSERT:Need to find each trigger in the log expected:<542> but was:<520>
ASSERT:Need to find each trigger in the log expected:<338> but was:<321>

subject 2:

ASSERT:Need to find each trigger in the log expected:<451> but was:<433>
ASSERT:Need to find each trigger in the log expected:<441> but was:<424>
ASSERT:Need to find each trigger in the log expected:<438> but was:<421>
ASSERT:Need to find each trigger in the log expected:<488> but was:<466>
ASSERT:Need to find each trigger in the log expected:<462> but was:<440>
ASSERT:Need to find each trigger in the log expected:<439> but was:<423>
ASSERT:Need to find each trigger in the log expected:<542> but was:<520>
ASSERT:Need to find each trigger in the log expected:<338> but was:<322>

subject 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 are good!

subject 18: (only fails with one trigger)

ASSERT:Need to find each trigger in the log expected:<451> but was:<435>

subject 19 and 20 are good.

It does not fail for the OpenNeuro dataset.

loj commented 3 years ago

Reminder to add the scanner acquisition protocols to sourcedata/acquisition_protocols in the converted BIDS dataset.

adswa commented 3 years ago

Doing the same assertion for the pandora data yields mismatches: subject 1, subject 2

ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>

subject 18

ASSERT:Need to find each trigger in the log expected:<153> but was:<146>

loj commented 3 years ago

@bpoldrack

Here is the BIDS convention for the <index> value

<index> - a nonnegative integer, possibly prefixed with arbitrary number of 0s for consistent indentation, for example, it is 01 in run-01 following run- specification.

bpoldrack commented 3 years ago

Re physio trigger mismatches:

Noting observations for now.

At least for audiomovie, the failing subjects are exactly the ones with sampling frequency 100, while the passing ones have 200. Didn't check pandora yet.

adswa commented 3 years ago

Scripts for converting the log files into events.tsv and events.json files are in data/project/studyforrest/gumpdata/scripts/conversion as reconvert_behavlog_pandora and reconvert_behavtrials_pandora. They are used like this:

# for the trial file
./reconvert_behavtrials_pandora \
    /data/project/studyforrest/pandora/logs/xy22.trials \
   'sub-01_task-avmovie_run-0_events'
# for the log files
 ./reconvert_behavlog_pandora \
    /data/project/studyforrest/pandora/logs/ap75.log \
    'sub-01_task-avmovie_run-0_events'

i.e. <script> <log/trial file> <outpath with placeholder for run>

loj commented 3 years ago

I have a README put together, but it requires updating the file paths. Once we have a sample single subject converted dataset, I can start updating the file paths.

bpoldrack commented 3 years ago

Update from matrix channel:

First of all, it ran through, yeah! I have not yet looked closely into it. Initial glimpse, says there are (somewhat minor issues):

the behavioral events files of 3 subjects are untracked. Presumably something not quite right with the run call yet.
something is unlocking but not saving again the toplevel task-forrestgump-*.json
the .wavs and swaroop ended up toplevel instead of underneath stimuli.

Those are all "technical" that should be relatively easy to fix and apply those fixes. Everything else I have not yet assessed. So, please poke the actual data, everybody!

Here it is: /data/project/studyforrest_phase1/phase1-bids.

ping @mih

mih commented 3 years ago

It would be good, if this issue gets a checklist to capture what has been looked at and for which conversion attempt. Otherwise it will be rather hard to come to an end.

bpoldrack commented 3 years ago

@mih

It would be good, if this issue gets a checklist to capture what has been looked at and for which conversion attempt. Otherwise it will be rather hard to come to an end.

Edited first post.

loj commented 3 years ago

Here are the issues I've gathered so far for the recent conversion.

re Missing data types:

These are files that were described by the old README that I haven't managed to find in the newly converted dataset. Some of them are likely elsewhere and don't belong in this dataset, but I wanted to list them just in case.

Raw data:
- fieldmaps
- moco
- angio
- acquisition protocols (https://github.com/psychoinformatics-de/studyforrest-data/issues/29#issuecomment-829230280)
- demographics data: demographics.csv file with "participants' responses to a questionnaire on demographic information, musical preference and background, as well as familiarity with the "Forrest Gump" movie"
- audio-description transcript: german_audio_description.csv
- movie scenes: scenes.csv with start and end time for all 198 scenes in the presented movie cut and if the scene takes place indoors or outdoors.
- Derivative data:
- linear anatomical alignment
- non-linear anatomical alignment
- aggregate BOLD functional MRI for brain atlas parcellations
- subject template volumes
- group template volumes

re BIDS compliance:

the subject level event files for the auditoryperception session/task should be one level down underneath the func dir
swi images:
- I think these should go under a separate swi directory.
- We should distinguish which are phase vs mag images. OpenNeuro did this with acq-pha and acq-mag.
- swi/* should be added to the .bidsignore file
the top level bold json files (task-*_acq-*_bold.json) have a few leftover TODOs:
- "CogAtlasID": "TODO",
- "TaskName": "TODO: full task name for auditoryperception",
the top level *.wav files and swaroop should go into a stimuli/ directory (you already are aware of this)
a studyspec.json file is ending up in the top level BIDS dataset
the validator doesn't like the dash used in the recording label for the card/resp files
- *_recording-cardresp-100_physio.json -> *_recording-cardresp100_physio.json
- *_recording-cardresp-200_physio.json -> *_recording-cardresp200_physio.json
anatomical defacemask images need a mod label
- sub-*_ses-forrestgump_run-*_T1w_defacemask.nii.gz -> sub-*_ses-forrestgump_run-*_mod-T1w_defacemask.nii.gz
- sub-*_ses-forrestgump_run-*_T2w_defacemask.nii.gz -> sub-*_ses-forrestgump_run-*_mod-T2w_defacemask.nii.gz
dwi defacemask images need a mod label and should be added to the .bidsignore file
- sub-*_ses-forrestgump_run-*_dwi_defacemask.nii.gz -> sub-*_ses-forrestgump_run-*_mod-dwi_defacemask.nii.gz
- sub-*/*/*/*mod-dwi_defacemask* should then be added to the .bidsignore file

bpoldrack commented 3 years ago

Thanks, @loj !

fieldmaps

We decided to not include them. Still the case, @mih?

moco

== dico

angio

Thx, need to investigate. Not intended.

acquisition protocols (https://github.com/psychoinformatics-de/studyforrest-data/issues/29#issuecomment-829230280)

True. Simply forgot to convert the toplevel specs of the raw datasets ;-) Added.

demographics data: demographics.csv file with "participants' responses to a questionnaire on demographic information, musical preference and background, as well as familiarity with the "Forrest Gump" movie"

audio-description transcript: german_audio_description.csv

movie scenes: scenes.csv with start and end time for all 198 scenes in the presented movie cut and if the scene takes place indoors or outdoors.

@mih : Does that stuff exist in any other place and/or shape other than anondata?

Derivative data:

linear anatomical alignment

non-linear anatomical alignment

aggregate BOLD functional MRI for brain atlas parcellations

subject template volumes

group template volumes

Same here, @mih - no idea, where that even comes from. So, considering the point above in addition: I guess I'll go with the original idea of an intermediate raw dataset, that includes all that stuff? Or does every part of that unambiguously belong inside pandora, 7T_ad, anatomy? If so, what goes where?

the subject level event files for the auditoryperception session/task should be one level down underneath the func dir

Yes, there's something wrong with the created name.

swi images:

I think these should go under a separate swi directory.

We should distinguish which are phase vs mag images. OpenNeuro did this with acq-pha and acq-mag.

swi/* should be added to the .bidsignore file

We decided to go with veno and acq-pha/acq-mag, since swi is still a BEP and we don't want to break names for users to change to something that isn't settled yet.

the top level bold json files (task-*_acq-*_bold.json) have a few leftover TODOs:

"CogAtlasID": "TODO",

"TaskName": "TODO: full task name for auditoryperception",

Yes, but I have no clue. TaskDescription should probably also be included. "Someone" needs to provide the truth here!

a studyspec.json file is ending up in the top level BIDS dataset

Good catch. That's a hint, that I seem to have screwed with the versions of the raw dataset. That was an (already fixed) bug - explains the toplevel stimuli, too.

the validator doesn't like the dash used in the recording label for the card/resp files

*_recording-cardresp-100_physio.json -> *_recording-cardresp100_physio.json

*_recording-cardresp-200_physio.json -> *_recording-cardresp200_physio.json

Will do.

anatomical defacemask images need a mod label

sub-*_ses-forrestgump_run-*_T1w_defacemask.nii.gz -> sub-*_ses-forrestgump_run-*_mod-T1w_defacemask.nii.gz

sub-*_ses-forrestgump_run-*_T2w_defacemask.nii.gz -> sub-*_ses-forrestgump_run-*_mod-T2w_defacemask.nii.gz

dwi defacemask images need a mod label and should be added to the .bidsignore file

sub-*_ses-forrestgump_run-*_dwi_defacemask.nii.gz -> sub-*_ses-forrestgump_run-*_mod-dwi_defacemask.nii.gz

sub-*/*/*/*mod-dwi_defacemask* should then be added to the .bidsignore file

Ah - need to fix the deface procedure then.

mih commented 3 years ago

Thanks, @loj !

fieldmaps

We decided to not include them. Still the case, @mih?

Hard to say, this issue lumps together so many aspects. If these are the fieldmaps that were acquired together with the DWI data at 3T, then yes. They are invalid.

moco

== dico

moco = Motion corrected; dico = distortion corrected

So depending which specific data we are talking about that statement is true (moco is precondition for dico), or not (dico is optional).

angio

Thx, need to investigate. Not intended.

As in "not intended to be there for now"?

demographics data: demographics.csv file with "participants' responses to a questionnaire on demographic information, musical preference and background, as well as familiarity with the "Forrest Gump" movie"

audio-description transcript: german_audio_description.csv

movie scenes: scenes.csv with start and end time for all 198 scenes in the presented movie cut and if the scene takes place indoors or outdoors.

@mih : Does that stuff exist in any other place and/or shape other than anondata?

demographics.csv is an original file that contains data only available on paper.

The two other CSVs are outdated and these old versions are here: https://github.com/psychoinformatics-de/studyforrest-data-annotations/blob/master/old/structure/scenes.csv https://github.com/psychoinformatics-de/studyforrest-data-annotations/blob/master/old/speech/german_audio_description.csv

Derivative data:

linear anatomical alignment

non-linear anatomical alignment

aggregate BOLD functional MRI for brain atlas parcellations

subject template volumes

group template volumes

Same here, @mih - no idea, where that even comes from. So, considering the point above in addition: I guess I'll go with the original idea of an intermediate raw dataset, that includes all that stuff? Or does every part of that unambiguously belong inside pandora, 7T_ad, anatomy? If so, what goes where?

This is described in the data paper under technical validation. These can all be ignore for the raw dataset, because they are the outcome of a computational pipeline (i.e. derivatives). Everything labeled "alignment/volumes" in the list above is in https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms . The aggregate timeseries are in https://github.com/psychoinformatics-de/studyforrest-data-aggregate From my POV, they can stay there (hosted on GIN).

swi images:

I think these should go under a separate swi directory.

We should distinguish which are phase vs mag images. OpenNeuro did this with acq-pha and acq-mag.

swi/* should be added to the .bidsignore file

We decided to go with veno and acq-pha/acq-mag, since swi is still a BEP and we don't want to break names for users to change to something that isn't settled yet.

Either way is fine with me: One is an established non-standard, the other is the anticipation of a standard.

the top level bold json files (task-*_acq-*_bold.json) have a few leftover TODOs:

"CogAtlasID": "TODO",

"TaskName": "TODO: full task name for auditoryperception",

Yes, but I have no clue. TaskDescription should probably also be included. "Someone" needs to provide the truth here!

As mentioned elsewhere, the task descriptions are in /data/project/studyforrest/anondata/task_key.txt This is how openfmri used to do it. Complete information on how to look up files is available at openfmri.org https://legacy.openfmri.org/data-organization-old/

I don't think anyone has looked up the IDs for these tasks on http://www.cognitiveatlas.org/ yet.

the validator doesn't like the dash used in the recording label for the card/resp files

*_recording-cardresp-100_physio.json -> *_recording-cardresp100_physio.json

*_recording-cardresp-200_physio.json -> *_recording-cardresp200_physio.json

Will do.

Thx, looks OK to me.