mih opened this issue 3 years ago
I think it depends on what's most important. I can see the argument for the first approach: if the highest priorities are a) matching the state in OpenNeuro and b) minimizing workload, then this looks like a good idea.

However, I'd argue that there is an additional pro for the second approach: we'd have a raw dataset + specifications, to which we could much more easily apply different conversions in the future, e.g. for significant changes in BIDS or yet another standard we'd want to represent the data in. It may also make it easier to adopt new metadata standards/formats in the future.

So, not exactly sure. Need to have a closer look into the existing repo to see what we may lose that way.
One more thing: If we actually can have time traveling containers that can reproduce what was done back then, it seems to me that we can have a third approach: a merger of both. Nothing forces us to use hirni with the current toolbox.
However, if we go for 1) or 3), I'll need help figuring out what was done and how, and therefore how to build the container(s), and maybe break things down into a few procedures. Inspecting that on my own sounds like it'll take too long.
I'd say we go for (2) hirni in this case.
@bpoldrack here is the BIDS spec for the diffusion data https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#diffusion-imaging-data
Susceptibility Weighted Imaging (SWI) is still a BEP (https://docs.google.com/document/d/1kyw9mGgacNqeMbp4xZet3RnDhcMmf4_BmRgKaOkO2Sc/edit)
Note: Instead of `phase1`, now aim for publication-related targets. Comparison of the conversion outcome is to be made against `anondata`.
Starting to look into conversion issues.
Task labels (see https://github.com/psychoinformatics-de/studyforrest-data/issues/35#issuecomment-826894871): `anondata` uses numbered tasks, whereas an earlier attempt to redo the conversion used `aomovie` and `pandora`. OpenNeuro apparently uses its own names:
```
ds000113 on git:master
❱ ls task*
task-auditoryperception_bold.json   task-movielocalizer_bold.json    task-objectcategories_physio.json    task-retmapccw_physio.json  task-retmapcon_physio.json
task-coverage_bold.json             task-movielocalizer_physio.json  task-orientation_bold.json           task-retmapclw_bold.json    task-retmapexp_bold.json
task-coverage_rec-dico_bold.json    task-movie_physio.json           task-orientation_rec-dico_bold.json  task-retmapclw_physio.json  task-retmapexp_physio.json
task-movie_bold.json                task-objectcategories_bold.json  task-retmapccw_bold.json             task-retmapcon_bold.json
```
What labels do we settle for, @mih?

Edit: Verdict: take from OpenNeuro. Ergo: `forrestgump` for `7T_ad` (+ `anatomy`?) and `auditoryperception` for `pandora`.
`/data/project/studyforrest_phase1/testing/scientific-data-2014-bids` currently does not contain any functional NIfTIs.
Here is a potential structure for the events files of the pandora data. They should be derived from the `behavdata.tsv` files with a script. Most variables can be pulled out verbatim; just onset and duration need to be computed from the trial duration (6 seconds) plus the trial-specific delay (see the sketch below).

events.tsv:

```
onset duration trial_type run run_id volume run_volume stim genre delay catch sound_soa trigger_ts
0 6.0 <genre> <run> <run_id> <volume> <run_volume> <stim> <genre> <delay> <catch> <sound_soa> <trigger_ts>
```
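A minimal sketch of that derivation, assuming `behavdata.tsv` is tab-separated and carries the remaining columns under exactly these names; the column names, the output formatting, and the assumption that the delay follows each 6 s trial are mine, not the actual conversion script:

```python
import csv

TRIAL_DURATION = 6.0  # fixed stimulation duration per trial, in seconds


def behav_to_events(behav_path, events_path):
    """Derive a BIDS events.tsv from a behavdata.tsv log (hypothetical columns)."""
    with open(behav_path, newline='') as f:
        trials = list(csv.DictReader(f, delimiter='\t'))

    copied = ['run', 'run_id', 'volume', 'run_volume', 'stim', 'genre',
              'delay', 'catch', 'sound_soa', 'trigger_ts']
    fields = ['onset', 'duration', 'trial_type'] + copied

    onset = 0.0
    with open(events_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fields, delimiter='\t')
        writer.writeheader()
        for trial in trials:
            row = {k: trial[k] for k in copied}  # pulled out verbatim
            row['trial_type'] = trial['genre']   # genre doubles as trial_type
            row['onset'] = f'{onset:.1f}'
            row['duration'] = f'{TRIAL_DURATION:.1f}'
            writer.writerow(row)
            # assumption: the next trial starts after 6 s of stimulation
            # plus this trial's specific delay
            onset += TRIAL_DURATION + float(trial['delay'])
```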
The corresponding JSON file should look about like this:

events.json:

```json
{
    "trial_type": {
        "LongName": "Event category",
        "Description": "Indicator of the genre of the musical stimulus",
        "Levels": {
            "country": "Country music",
            "symphonic": "Symphonic music",
            "metal": "Metal music",
            "ambient": "Ambient music",
            "rocknroll": "Rock'n'roll music"
        }
    },
    "sound_soa": {
        "LongName": "Sound onset asynchrony",
        "Description": "Asynchrony between MRI trigger and sound onset"
    },
    "catch": {
        "LongName": "Control question",
        "Description": "Flag whether a control question was presented"
    },
    "volume": {
        "LongName": "fMRI volume total",
        "Description": "fMRI volume corresponding to stimulation start"
    },
    "run_volume": {
        "LongName": "fMRI volume run",
        "Description": "fMRI volume corresponding to stimulation start in the current run"
    },
    "run": {
        "LongName": "Run in sequence",
        "Description": "Order of run in sequence"
    },
    "run_id": {
        "LongName": "Trial ID in run",
        "Description": "ID of trial sequence for this run"
    },
    "stim": {
        "LongName": "Stimulation file",
        "Description": "Stimulus file name"
    },
    "delay": {
        "LongName": "Inter-stimulus interval",
        "Description": "Inter-stimulus interval in seconds",
        "Units": "seconds"
    },
    "trigger_ts": {
        "LongName": "Trigger time stamp",
        "Description": "Time stamp of the corresponding MRI trigger with respect to the start of the experiment in seconds",
        "Units": "seconds"
    },
    "genre": {
        "LongName": "Genre",
        "Description": "Indicator of the genre of the musical stimulus",
        "Levels": {
            "country": "Country music",
            "symphonic": "Symphonic music",
            "metal": "Metal music",
            "ambient": "Ambient music",
            "rocknroll": "Rock'n'roll music"
        }
    }
}
```
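Not part of the conversion itself, but a quick consistency check between such a sidecar and the actual TSV columns could look like this (a sketch with a hypothetical helper name, stdlib only):

```python
import csv
import json


def check_sidecar(tsv_path, json_path):
    """Report events.tsv columns without a sidecar entry, and vice versa."""
    with open(tsv_path, newline='') as f:
        columns = set(next(csv.reader(f, delimiter='\t')))
    with open(json_path) as f:
        described = set(json.load(f))
    # onset/duration are defined by BIDS itself and need no sidecar entry
    for col in sorted(columns - described - {'onset', 'duration'}):
        print(f'column {col!r} lacks a description in {json_path}')
    for key in sorted(described - columns):
        print(f'sidecar key {key!r} has no matching column in {tsv_path}')
```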
For anatomy I have the following image series.
(SeriesNumber, Protocol, currently assigned modality, converted or ignored for conversion):

```
(101, 'SmartBrain_32channel', None, 'ignored')
(102, 'SmartBrain_32channel', None, 'ignored')
(103, 'Patient Aligned MPR AWPLAN_SMARTPLAN_TYPE_BRAIN', None, 'ignored')
(201, 'B1_calibration_brain', None, 'ignored')
(202, 'B1_calibration_brain', None, 'ignored')
(301, 'Ref_Head_32', None, 'ignored')
(401, 'sT1W_3D_TFE_TR2300_TI900_0.7iso_FS', 't1w', 'converted')
(501, 'VEN_BOLD_HR_32chSHC', 'swi', 'converted')
(502, 'VVEN_BOLD_HR_32chSHC SENSE', 'swi', 'ignored')
(601, 'sT2W_3D_TSE_32chSHC_0.7iso', 't2w', 'converted')
(701, 'DTI_high_2iso', 'dwi', 'ignored')
(702, 'Reg - DTI_high_iso', 'dwi', 'converted')
(703, 'dReg - DTI_high_iso', 'dwi', 'ignored')
(704, 'eReg - DTI_high_iso', 'dwi', 'ignored')
(705, 'faReg - DTI_high_iso', 'dwi', 'ignored')
(706, 'facReg - DTI_high_iso', 'dwi', 'ignored')
(801, 'field map', 'fieldmap', 'ignored')
```
Questions: Is anything ignored here that should be converted? And which series should be assigned the modalities `veno` and `angio`?
Suggested content for the `dataset_description.json`:
```json
{
    "Name": "scientific-data-2014-bids",
    "BIDSVersion": "TODO",
    "DatasetType": "raw",
    "License": "CC0",
    "Authors": [
        "Michael Hanke",
        "Florian J. Baumgartner",
        "Pierre Ibe",
        "Falko R. Kaule",
        "Stefan Pollmann",
        "Oliver Speck",
        "Wolf Zinke",
        "Jörg Stadler",
        "Richard Dinga",
        "Christian Häusler",
        "J. Swaroop Guntupalli",
        "Michael Casey"
    ],
    "Acknowledgements": "",
    "HowToAcknowledge": "Please follow good scientific practice by citing the most appropriate publication(s) describing the aspects of this dataset that were used in a study.",
    "Funding": [
        "A grant from the German Federal Ministry of Education and Research (BMBF) funded the initial data acquisition as part of the US-German collaboration in computational neuroscience (CRCNS) project: Development of general high-dimensional models of neuronal representation spaces (Haxby / Ramadge / Hanke), co-funded by the BMBF and the US National Science Foundation (BMBF 01GQ1112; NSF 1129855).",
        "We acknowledge the support of the Combinatorial NeuroImaging Core Facility at the Leibniz Institute for Neurobiology in Magdeburg.",
        "Moreover, development of data sharing technology used for dissemination and management of this dataset is supported by another US-German collaboration grant awarded to Halchenko and Hanke: DataLad: Converging catalogues, warehouses, and deployment logistics into a federated 'data distribution', also co-funded by BMBF (01GQ1411) and NSF (1129855).",
        "The German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences provided support for data acquisition hardware and personnel."
    ],
    "EthicsApprovals": [
        ""
    ],
    "ReferencesAndLinks": [
        "http://studyforrest.org",
        "https://www.nature.com/articles/sdata20143",
        "https://f1000research.com/articles/4-174/v1"
    ],
    "DatasetDOI": "TODO"
}
```
TODOs: `BIDSVersion` and `DatasetDOI` still need to be filled in.

Note: This is different from the OpenNeuro `dataset_description.json`, but that is intended, since this dataset includes only data from "phase1".
Current image series for `7T_ad` (two sessions):

```
(1, 'AAHScout_32ch', None, 'ignored')
(2, 'AAHScout_32ch_MPR', None, 'ignored')
(3, 'b1map_658', None, 'ignored')
(4, 'b1map_658', None, 'ignored')
(5, 'CV_shim_452B', None, 'ignored')
(6, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl', None, 'ignored')
(7, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_PostProc', None, 'ignored')
(8, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_DiCo', None, 'ignored')
(9, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R1', 'bold', 'ignored')
(10, 'MoCoSeries_DiCo', 'bold', 'converted')
(11, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R2', 'bold', 'ignored')
(12, 'MoCoSeries_DiCo', 'bold', 'converted')
(13, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R3', 'bold', 'ignored')
(14, 'MoCoSeries_DiCo', 'bold', 'converted')
(15, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R4', 'bold', 'ignored')
(16, 'MoCoSeries_DiCo', 'bold', 'converted')
(99, 'PhoenixZIPReport', None, 'ignored')
```

```
(2, 'AAHScout_32ch_MPR', None, 'ignored')
(1, 'AAHScout_32ch', None, 'ignored')
(3, 'CV_shim_452B', None, 'ignored')
(4, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl', None, 'ignored')
(5, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_PostProc', None, 'ignored')
(7, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R5', 'bold', 'ignored')
(8, 'MoCoSeries_DiCo', 'bold', 'converted')
(9, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R6', 'bold', 'ignored')
(10, 'MoCoSeries_DiCo', 'bold', 'converted')
(11, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R7', 'bold', 'ignored')
(12, 'MoCoSeries_DiCo', 'bold', 'converted')
(13, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R8', 'bold', 'ignored')
(14, 'MoCoSeries_DiCo', 'bold', 'converted')
(99, 'PhoenixZIPReport', None, 'ignored')
(6, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_DiCo', None, 'ignored')
```
And `pandora`, @mih:

```
(2, 'AAHScout_32ch_MPR', None, 'ignored')
(1, 'AAHScout_32ch', None, 'ignored')
(3, 'b1map_658', None, 'ignored')
(4, 'b1map_658', None, 'ignored')
(5, 'CV_shim_452B', None, 'ignored')
(6, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl', None, 'ignored')
(7, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_PostProc', None, 'ignored')
(9, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R1P', 'bold', 'ignored')
(10, 'MoCoSeries_DiCo', 'bold', 'converted')
(11, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R2P', 'bold', 'ignored')
(12, 'MoCoSeries_DiCo', 'bold', 'converted')
(13, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R3P', 'bold', 'ignored')
(14, 'MoCoSeries_DiCo', 'bold', 'converted')
(15, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R4P', 'bold', 'ignored')
(16, 'MoCoSeries_DiCo', 'bold', 'converted')
(17, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R5P', 'bold', 'ignored')
(18, 'MoCoSeries_DiCo', 'bold', 'converted')
(19, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R6P', 'bold', 'ignored')
(20, 'MoCoSeries_DiCo', 'bold', 'converted')
(21, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R7P', 'bold', 'ignored')
(22, 'MoCoSeries_DiCo', 'bold', 'converted')
(23, 'mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R8P', 'bold', 'ignored')
(24, 'MoCoSeries_DiCo', 'bold', 'converted')
(25, 'ToF-3D-multi-slab-0.3iso_FA20_claus', 'angio', 'converted')
(99, 'PhoenixZIPReport', None, 'ignored')
(8, 'mi_ep2d_flashref_psf_160_p3_1.4mm_7p8_36sl_DiCo', None, 'ignored')
```
I found a trigger mismatch in the converted physio files from the audiomovie in your test repo (`/data/project/studyforrest_phase1/testing/scientific-data-2014-bids/sub-002/func`), @bpoldrack.

The test checks whether it finds the same number of triggers as the run had TRs:
```sh
# shunit2-style test: compare the number of trigger events in each physio
# log against the expected number of fMRI volumes for that run
check_physio()
{
    nvols=$1
    shift
    for f in "$@"; do
        found_trigger="$(zgrep '^1' "$f" | wc -l)"
        assertEquals "Need to find each trigger in the log" "$nvols" "$found_trigger"
    done
}

test_physio_movie_runs()
{
    count=1
    # expected volume counts for the eight audio-movie runs
    for nvols in 451 441 438 488 462 439 542 338; do
        check_physio "$nvols" *_task-forrestgump_run-0${count}_physio.tsv.gz
        count=$(( count + 1 ))
    done
}
```
It fails for a few subjects in the conversion.
subject 1:

```
ASSERT:Need to find each trigger in the log expected:<451> but was:<434>
ASSERT:Need to find each trigger in the log expected:<441> but was:<424>
ASSERT:Need to find each trigger in the log expected:<438> but was:<421>
ASSERT:Need to find each trigger in the log expected:<488> but was:<466>
ASSERT:Need to find each trigger in the log expected:<462> but was:<440>
ASSERT:Need to find each trigger in the log expected:<439> but was:<422>
ASSERT:Need to find each trigger in the log expected:<542> but was:<520>
ASSERT:Need to find each trigger in the log expected:<338> but was:<321>
```

subject 2:

```
ASSERT:Need to find each trigger in the log expected:<451> but was:<433>
ASSERT:Need to find each trigger in the log expected:<441> but was:<424>
ASSERT:Need to find each trigger in the log expected:<438> but was:<421>
ASSERT:Need to find each trigger in the log expected:<488> but was:<466>
ASSERT:Need to find each trigger in the log expected:<462> but was:<440>
ASSERT:Need to find each trigger in the log expected:<439> but was:<423>
ASSERT:Need to find each trigger in the log expected:<542> but was:<520>
ASSERT:Need to find each trigger in the log expected:<338> but was:<322>
```
subjects 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 are good!

subject 18 (only one run fails):

```
ASSERT:Need to find each trigger in the log expected:<451> but was:<435>
```

subjects 19 and 20 are good.
It does not fail for the OpenNeuro dataset.
Reminder to add the scanner acquisition protocols to `sourcedata/acquisition_protocols` in the converted BIDS dataset.
Doing the same assertion for the pandora data yields mismatches:

subject 1, subject 2:

```
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
```

subject 18:

```
ASSERT:Need to find each trigger in the log expected:<153> but was:<146>
```
@bpoldrack Here is the BIDS convention for the `<index>` value:

> `<index>` - a nonnegative integer, possibly prefixed with arbitrary number of 0s for consistent indentation, for example, it is 01 in run-01 following the `run-<index>` specification.
Re physio trigger mismatches: noting observations for now.

At least for the audiomovie, the failing subjects are exactly the ones with a sampling frequency of 100 Hz, while the passing ones have 200 Hz. Didn't check pandora yet.
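A quick way to tabulate that observation (a sketch; it assumes the trigger channel is the first column, matching the `zgrep '^1'` test above, and that each physio sidecar carries the standard BIDS `SamplingFrequency` field):

```python
import glob
import gzip
import json

# print sampling frequency next to trigger count for every physio run
for tsv in sorted(glob.glob('sub-*/func/*_physio.tsv.gz')):
    with open(tsv.replace('.tsv.gz', '.json')) as f:
        freq = json.load(f).get('SamplingFrequency')
    with gzip.open(tsv, 'rt') as f:
        triggers = sum(1 for line in f if line.startswith('1'))
    print(f'{tsv}: {freq} Hz, {triggers} triggers')
```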
Scripts for converting the log files into events.tsv and events.json files are in `/data/project/studyforrest/gumpdata/scripts/conversion` as `reconvert_behavlog_pandora` and `reconvert_behavtrials_pandora`. They are used like this:
```sh
# for the trial file
./reconvert_behavtrials_pandora \
    /data/project/studyforrest/pandora/logs/xy22.trials \
    'sub-01_task-avmovie_run-0_events'

# for the log files
./reconvert_behavlog_pandora \
    /data/project/studyforrest/pandora/logs/ap75.log \
    'sub-01_task-avmovie_run-0_events'
```

i.e. `<script> <log/trial file> <outpath with placeholder for run>`
I have put together a README, but it requires updating the file paths. Once we have a sample single-subject converted dataset, I can start updating them.
Update from the matrix channel:

First of all, it ran through, yeah! I have not yet looked closely into it. An initial glimpse says there are some (somewhat minor) issues: `.wav`s and `swaroop` ended up toplevel instead of underneath `stimuli/`. Those are all "technical" issues that should be relatively easy to fix. Everything else I have not yet assessed. So, please poke the actual data, everybody!

Here it is: `/data/project/studyforrest_phase1/phase1-bids`

ping @mih
It would be good if this issue got a checklist to capture what has been looked at, and for which conversion attempt. Otherwise it will be rather hard to come to an end.

> @mih
> It would be good if this issue got a checklist to capture what has been looked at, and for which conversion attempt. Otherwise it will be rather hard to come to an end.

Edited first post.
Here are the issues I've gathered so far for the recent conversion.
re Missing data types:
These are files that were described by the old README that I haven't managed to find in the newly converted dataset. Some of them are likely elsewhere and don't belong in this dataset, but I wanted to list them just in case.
- `demographics.csv` file with "participants' responses to a questionnaire on demographic information, musical preference and background, as well as familiarity with the "Forrest Gump" movie"
- `german_audio_description.csv`
- `scenes.csv` with start and end time for all 198 scenes in the presented movie cut, and whether the scene takes place indoors or outdoors

re BIDS compliance:
- the subject level event files for the `auditoryperception` session/task should be one level down, underneath the `func` dir
- `swi` images:
  - I think these should go under a separate `swi` directory.
  - We should distinguish which are `phase` vs `mag` images. OpenNeuro did this with `acq-pha` and `acq-mag`.
  - `swi/*` should be added to the `.bidsignore` file
- the top level bold json files (`task-*_acq-*_bold.json`) have a few leftover TODOs:
  - `"CogAtlasID": "TODO",`
  - `"TaskName": "TODO: full task name for auditoryperception",`
- `*.wav` files and `swaroop` should go into a `stimuli/` directory (you already are aware of this)
- a `studyspec.json` file is ending up in the top level BIDS dataset
- the validator doesn't like the dash used in the `recording` label for the card/resp files:
  - `*_recording-cardresp-100_physio.json` -> `*_recording-cardresp100_physio.json`
  - `*_recording-cardresp-200_physio.json` -> `*_recording-cardresp200_physio.json`
- anatomical `defacemask` images need a `mod` label:
  - `sub-*_ses-forrestgump_run-*_T1w_defacemask.nii.gz` -> `sub-*_ses-forrestgump_run-*_mod-T1w_defacemask.nii.gz`
  - `sub-*_ses-forrestgump_run-*_T2w_defacemask.nii.gz` -> `sub-*_ses-forrestgump_run-*_mod-T2w_defacemask.nii.gz`
- dwi `defacemask` images need a `mod` label and should be added to the `.bidsignore` file:
  - `sub-*_ses-forrestgump_run-*_dwi_defacemask.nii.gz` -> `sub-*_ses-forrestgump_run-*_mod-dwi_defacemask.nii.gz`
  - `sub-*/*/*/*mod-dwi_defacemask*` should then be added to the `.bidsignore` file

Thanks, @loj!
- fieldmaps: We decided to not include them. Still the case, @mih?
- moco == dico
- angio: Thx, need to investigate. Not intended.
- acquisition protocols (https://github.com/psychoinformatics-de/studyforrest-data/issues/29#issuecomment-829230280): True. Simply forgot to convert the toplevel specs of the raw datasets ;-) Added.
- demographics data: `demographics.csv` file with "participants' responses to a questionnaire on demographic information, musical preference and background, as well as familiarity with the "Forrest Gump" movie"
- audio-description transcript: `german_audio_description.csv`
- movie scenes: `scenes.csv` with start and end time for all 198 scenes in the presented movie cut, and whether the scene takes place indoors or outdoors

  @mih: Does that stuff exist in any other place and/or shape other than `anondata`?

- Derivative data:
  - linear anatomical alignment
  - non-linear anatomical alignment
  - aggregate BOLD functional MRI for brain atlas parcellations
  - subject template volumes
  - group template volumes

  Same here, @mih - no idea where that even comes from. So, considering the point above in addition: I guess I'll go with the original idea of an intermediate raw dataset that includes all that stuff? Or does every part of that unambiguously belong inside pandora, 7T_ad, anatomy? If so, what goes where?
> the subject level event files for the `auditoryperception` session/task should be one level down underneath the `func` dir

Yes, there's something wrong with the created name.
> `swi` images:
>
> - I think these should go under a separate `swi` directory.
> - We should distinguish which are `phase` vs `mag` images. OpenNeuro did this with `acq-pha` and `acq-mag`.
> - `swi/*` should be added to the `.bidsignore` file

We decided to go with `veno` and `acq-pha`/`acq-mag`, since `swi` is still a BEP and we don't want to break names for users to change to something that isn't settled yet.
> the top level bold json files (`task-*_acq-*_bold.json`) have a few leftover TODOs:
>
> ```
> "CogAtlasID": "TODO",
> "TaskName": "TODO: full task name for auditoryperception",
> ```

Yes, but I have no clue. `TaskDescription` should probably also be included. "Someone" needs to provide the truth here!
> a `studyspec.json` file is ending up in the top level BIDS dataset

Good catch. That's a hint that I seem to have screwed up the versions of the raw dataset. That was an (already fixed) bug - explains the toplevel stimuli, too.
> the validator doesn't like the dash used in the `recording` label for the card/resp files:
>
> - `*_recording-cardresp-100_physio.json` -> `*_recording-cardresp100_physio.json`
> - `*_recording-cardresp-200_physio.json` -> `*_recording-cardresp200_physio.json`

Will do.
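For the record, a rename along these lines should do it (a sketch, not the actual fix in the conversion; the glob pattern is an assumption and may need adjusting for session subdirectories):

```python
import glob
import os
import re

# drop the dash the validator complains about:
# recording-cardresp-100 -> recording-cardresp100 (same for 200)
for path in glob.glob('sub-*/**/*_recording-cardresp-*_physio.*', recursive=True):
    os.rename(path, re.sub(r'recording-cardresp-(\d+)', r'recording-cardresp\1', path))
```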
> - anatomical `defacemask` images need a `mod` label:
>   - `sub-*_ses-forrestgump_run-*_T1w_defacemask.nii.gz` -> `sub-*_ses-forrestgump_run-*_mod-T1w_defacemask.nii.gz`
>   - `sub-*_ses-forrestgump_run-*_T2w_defacemask.nii.gz` -> `sub-*_ses-forrestgump_run-*_mod-T2w_defacemask.nii.gz`
> - dwi `defacemask` images need a `mod` label and should be added to the `.bidsignore` file:
>   - `sub-*_ses-forrestgump_run-*_dwi_defacemask.nii.gz` -> `sub-*_ses-forrestgump_run-*_mod-dwi_defacemask.nii.gz`
>   - `sub-*/*/*/*mod-dwi_defacemask*` should then be added to the `.bidsignore` file

Ah - need to fix the deface procedure then.
> Thanks, @loj!
>
> - fieldmaps: We decided to not include them. Still the case, @mih?

Hard to say; this issue lumps together so many aspects. If these are the fieldmaps that were acquired together with the DWI data at 3T, then yes. They are invalid.
> - moco == dico

moco = motion corrected; dico = distortion corrected.

So depending on which specific data we are talking about, that statement is true (moco is a precondition for dico) or not (dico is optional).
> - angio: Thx, need to investigate. Not intended.

As in "not intended to be there for now"?
> - demographics data: `demographics.csv` file with "participants' responses to a questionnaire on demographic information, musical preference and background, as well as familiarity with the "Forrest Gump" movie"
> - audio-description transcript: `german_audio_description.csv`
> - movie scenes: `scenes.csv` with start and end time for all 198 scenes in the presented movie cut, and whether the scene takes place indoors or outdoors
>
> @mih: Does that stuff exist in any other place and/or shape other than `anondata`?

`demographics.csv` is an original file that contains data only available on paper.

The two other CSVs are outdated; these old versions are here:
https://github.com/psychoinformatics-de/studyforrest-data-annotations/blob/master/old/structure/scenes.csv
https://github.com/psychoinformatics-de/studyforrest-data-annotations/blob/master/old/speech/german_audio_description.csv
> - Derivative data:
>   - linear anatomical alignment
>   - non-linear anatomical alignment
>   - aggregate BOLD functional MRI for brain atlas parcellations
>   - subject template volumes
>   - group template volumes
>
> Same here, @mih - no idea where that even comes from. So, considering the point above in addition: I guess I'll go with the original idea of an intermediate raw dataset that includes all that stuff? Or does every part of that unambiguously belong inside pandora, 7T_ad, anatomy? If so, what goes where?

This is described in the data paper under technical validation. These can all be ignored for the raw dataset, because they are the outcome of a computational pipeline (i.e. derivatives). Everything labeled "alignment/volumes" in the list above is in https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms. The aggregate timeseries are in https://github.com/psychoinformatics-de/studyforrest-data-aggregate. From my POV, they can stay there (hosted on GIN).
> > `swi` images:
> >
> > - I think these should go under a separate `swi` directory.
> > - We should distinguish which are `phase` vs `mag` images. OpenNeuro did this with `acq-pha` and `acq-mag`.
> > - `swi/*` should be added to the `.bidsignore` file
>
> We decided to go with `veno` and `acq-pha`/`acq-mag`, since `swi` is still a BEP and we don't want to break names for users to change to something that isn't settled yet.

Either way is fine with me: one is an established non-standard, the other is the anticipation of a standard.
> > the top level bold json files (`task-*_acq-*_bold.json`) have a few leftover TODOs:
> >
> > ```
> > "CogAtlasID": "TODO",
> > "TaskName": "TODO: full task name for auditoryperception",
> > ```
>
> Yes, but I have no clue. `TaskDescription` should probably also be included. "Someone" needs to provide the truth here!

As mentioned elsewhere, the task descriptions are in `/data/project/studyforrest/anondata/task_key.txt`. This is how OpenfMRI used to do it. Complete information on how to look up files is available at openfmri.org: https://legacy.openfmri.org/data-organization-old/

I don't think anyone has looked up the IDs for these tasks on http://www.cognitiveatlas.org/ yet.
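For reference, an OpenfMRI `task_key.txt` is plain text with one task per line, a numbered ID followed by a free-form name; the labels below are made up for illustration, not the actual content of this dataset's file:

```
task001 some task name
task002 another task name
```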
> > the validator doesn't like the dash used in the `recording` label for the card/resp files:
> >
> > - `*_recording-cardresp-100_physio.json` -> `*_recording-cardresp100_physio.json`
> > - `*_recording-cardresp-200_physio.json` -> `*_recording-cardresp200_physio.json`
>
> Will do.

Thx, looks OK to me.
There are two possible approaches that I can see:

Make the code from 2013 run reproducibly

This should be doable; it was all standard Debian packages plus the custom code in the datasets. We could create a singularity image that travels back in time. This would be attractive from a forensic data management perspective, and might get us to a place that matches the OpenNeuro #28 state, but with provenance.

Redo the conversion with modern day tooling

This comes with the danger that files come out differently. One would then need to figure out how they differ, and maybe even why. Pro: this will give us much better metadata automatically, and we can showcase hirni. Con: lots of work, possibly leading to lots more work.

Waddayathink, @bpoldrack?
Checklist for the BIDS dataset:
Version: `01fe519fb76c92dd323c4876b57554ce010928f0`