mih opened this issue 3 years ago
I'm working on building this dataset with subdatasets `7T_ad`, `pandorra`, and `anatomy` for now.
Not entirely clear whether and how we want to reflect the notion of `phase1`. Three options:

1. a dedicated `phase1` superdataset, of which `7T_ad`, `pandorra`, and `anatomy` are its parts
2. at the level of converted, anonymized BIDS datasets only, as a partial conversion of `studyforrest-data-raw` (the BIDS dataset would then have those three subdatasets under `sourcedata`)
3. `studyforrest-data-raw`, referencing a subset of its subdatasets

I would lean towards option 2.
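For illustration, option 2's layout could be set up roughly like this. DataLad subdatasets are plain git submodules underneath, so this sketch uses git directly; every path and name here is made up for the example, not taken from the actual setup:

```shell
# Sketch of option 2 (illustrative names/paths): a BIDS dataset that
# references the three raw acquisition datasets under sourcedata/.
set -e
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# stand-ins for the three raw datasets
for ds in 7T_ad pandorra anatomy; do
    git init -q "$demo/$ds"
    git -C "$demo/$ds" commit -q --allow-empty -m "init raw dataset $ds"
done
# the converted, anonymized BIDS dataset
git init -q "$demo/bids"
cd "$demo/bids"
git commit -q --allow-empty -m "init BIDS dataset"
for ds in 7T_ad pandorra anatomy; do
    # with DataLad this would be roughly: datalad clone -d . ../$ds sourcedata/$ds
    git -c protocol.file.allow=always submodule --quiet add "../$ds" "sourcedata/$ds"
done
git commit -q -m "register raw datasets under sourcedata/"
```

The point of the layout is that the BIDS dataset pins exact versions of the raw datasets without containing their (non-publishable) content.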
At this level, then, we could also maintain the data representation as described in papers in a separate branch. #5
> At this level, then, we could also maintain the data representation as described in papers in a separate branch. #5

True, but that is independent of how we reference the raw data at the level of a notion like `phase1`.
The "issue" with 2) would be dataset-level files like `README`, `dataset_description.json`, and so on. The current approach would be to have them in the raw dataset and use a "copy-converter" for the respective BIDS dataset. If we don't have a `phase1-raw` location (options 1 or 3), where would those things live? They could, of course, be created/added at the BIDS level only. Not sure whether there are things at the `phase1` abstraction level where this wouldn't work (because of anonymization or whatever), though.
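To make the "copy-converter" idea concrete, here is a minimal sketch; the paths and file contents are invented for illustration, and the real converter presumably does more than a plain copy:

```shell
# Sketch of a "copy-converter" for dataset-level files: take them from the
# raw dataset and drop them unmodified at the top of the BIDS dataset.
# All paths and contents below are illustrative.
set -e
work=$(mktemp -d)
raw=$work/raw
bids=$work/bids
mkdir -p "$raw" "$bids"
printf 'StudyForrest raw data\n' > "$raw/README"
printf '{"Name": "studyforrest", "BIDSVersion": "1.0.2"}\n' \
    > "$raw/dataset_description.json"
# the "conversion": copy the dataset-level files verbatim
for f in README dataset_description.json; do
    cp "$raw/$f" "$bids/$f"
done
```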
Approach 1 would be a special case for `phase1`, since other, possibly overlapping superdatasets couldn't be addressed the same way. So I lean towards 3) as the most flexible approach and the one most likely to generalize to other subsamples of `studyforrest-data-raw`. WDYT, @mih?
Adapted the scripts/approach to build this.
First trial of building the (sub)datasets finished:
/data/project/studyforrest_phase1/pandora
/data/project/studyforrest_phase1/anatomy
/data/project/studyforrest_phase1/7T_ad
Initial setup of them was done by `/data/project/studyforrest_phase1/build-forrest/studyforrest-data-raw-sh`. Actual data import + spec editing was done by the respective build script in each dataset's `code/creation`.
The three datasets `pandora`, `7T_ad`, and `anatomy` need to be verified to be what we want them to be. That is: they are supposed to capture all relevant raw data of those "studies" (independent of what should be converted in what context). This requires knowledge of what exactly that means. How do we approach this, @mih?
Additionally, I have now created `/data/project/studyforrest_phase1/scientific-data-2014-raw`, which contains those three as subdatasets, since we wanted to aim for publications being the targets for converted datasets. Currently the first conversion run based on this dataset is running in `/data/project/studyforrest_phase1/scientific-data-2014-bids`.
Adjusting the specs and checking what may be missing from the converted dataset will require some kind of target definition to compare against. Is this supposed to be the `release_openfmri1` tag in `anondata`, or is there something else to base the adjustments on, @mih?
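One way to make such a comparison concrete, assuming the target is indeed a git tag, is to diff sorted file listings of the two datasets. This is only a sketch; the repository locations in the usage comment are placeholders:

```shell
# List all files recorded at a given ref of a git/DataLad dataset, sorted,
# so that two datasets (or a dataset and a release tag) can be diffed.
list_at_ref () {  # usage: list_at_ref <repo> <ref>
    git -C "$1" ls-tree -r --name-only "$2" | sort
}
# Hypothetical usage (paths/tag are placeholders):
#   diff <(list_at_ref /path/to/anondata release_openfmri1) \
#        <(list_at_ref /path/to/scientific-data-2014-bids HEAD)
```

An empty diff would mean the converted dataset contains exactly the files the tagged release had; any `<`-only lines would be files still missing from the conversion.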
Re raw data capturing: `anatomy` looks good as far as I can tell, except for two directories. Under `/data/project/studyforrest/anatomy/data`, two subjects have an `orig` folder in addition to `raw/dicom`. The content looks like a conversion result, but I'm not sure. Does this need to be captured, @mih?
As for `pandora`: `/data/project/studyforrest/pandora` contains `logs`, `pmc.tar.gz`, and `swaroop`, which aren't currently captured. What are those, @mih, and are they in any way associated with particular acquisitions? I have an old TODO note claiming I need `logs` and `logs/raw` somehow. Not sure what to make of this distinction.
`7T_ad`: the `data` folder in `/data/project/studyforrest/7T_ad` has `behav` subdirectories. I guess they need to be sucked in. Do they require some kind of conversion, or are they just copied into the converted dataset? If so, where?

Old note on the issue that I can't fully decode ATM:

> import behav data into first acq per subject from /data/project/studyforrest/7T_ad/ad_data/${sub}* => the same as behav/; Two files are copied to behav/ + two more per subject.
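My best guess at what that note means, written out as a sketch. The directory layout, the subject naming, and the "first acquisition = first subdirectory when sorted" rule are all assumptions for illustration, not the actual procedure:

```shell
# Hypothetical helper: for every subject directory in a dataset, copy the
# files matching <ad_data_dir>/${sub}* into the behav/ directory of that
# subject's *first* acquisition. Layout and naming are assumptions.
import_behav () {  # usage: import_behav <ad_data_dir> <dataset_dir>
    local src=$1 ds=$2 subdir sub first_acq
    for subdir in "$ds"/sub*/; do
        [ -d "$subdir" ] || continue
        sub=$(basename "$subdir")
        # assume the first acquisition is the first subdirectory when sorted
        first_acq=$(find "$subdir" -mindepth 1 -maxdepth 1 -type d | sort | head -n 1)
        [ -n "$first_acq" ] || continue
        mkdir -p "$first_acq/behav"
        cp "$src/$sub"* "$first_acq/behav/" 2>/dev/null || true
    done
}
```

Whether the "two files + two more per subject" mentioned in the note land in `behav/` this way, or need extra handling, I can't tell from the note alone.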
There is also `ad_data`. What about that?

OK, I made a first push into this project. It contains the majority of the pieces that are needed to build `studyforrest-data-raw` or `hirni` or whatever the name will be -- in the `artifact/` directory.
@bpoldrack can you please post the link to the generated raw datasets?
@mih
/data/project/studyforrest_phase1/pandora
/data/project/studyforrest_phase1/anatomy
/data/project/studyforrest_phase1/7T_ad
Aiming to be a superdataset with targeted subdatasets for each "study". These studies were internally called:

- `7T_ad`
- `pandorra`
- `anatomy`
- `fg_eyegaze_raw`
- `3T_av_et`
- `3T_visloc`

These names correspond to folders in the original data structure on the cluster. They contain the pristine data artifacts and can never be made public, due to data protection regulations.
There are at least two more "raw" datasets (`multires3T` and `multires7T`), but their DICOM data are not readily accessible ATM.