Retroactively annotate the phenotypic data of a large number of BIDS datasets at once.
At the moment this focuses on datasets with MRI data only.
This takes as input the datasets available from the DataLad superdataset, which may not reflect the latest version of every dataset on OpenNeuro and OpenNeuro Derivatives.
OpenNeuro datasets: 790 datasets with 34479 subjects.
OpenNeuro Derivatives datasets: 258 datasets with 10582 subjects.
The openneuro datasets can be installed (this will take a while) with:
make openneuro
The openneuro-derivatives datasets can be installed (this will also take a while) with:
make openneuro-derivatives
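Both make targets wrap DataLad. For orientation, a minimal sketch of the kind of call involved, assuming the ///openneuro superdataset alias and recursive installation without downloading file content (check the Makefile for the commands actually used):
# Install the dataset hierarchy recursively; file content is not fetched.
# Sketch only: the superdataset alias and flags are assumptions.
datalad install -r ///openneuro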
WIP
Run list_openneuro_dependencies.py to create a TSV file with basic information for each dataset and its derivatives (sourcedata/raw for the openneuro-derivatives).
Run list_participants_tsv_columns.py to get a listing of all the columns present in all the participants.tsv files, and a list of all the unique columns across participants.tsv files.
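As a rough shell approximation of what list_participants_tsv_columns.py gathers (assuming tab-separated files with a header row; the real script may differ):
# Print the unique column names found across all participants.tsv files.
find . -name participants.tsv -exec head -n 1 {} \; | tr '\t' '\n' | sort -u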
Run list_participants_tsv_levels.py
to also get a listing of all the levels
in all the columns present in all the participants.tsv files.
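Conceptually, the levels of a single column can be inspected like this (the column index 2 and the file path are hypothetical; the script does this for every column of every file):
# Unique values (levels) of the second column, skipping the header row.
cut -f 2 participants.tsv | tail -n +2 | sort -u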
The OpenNeuroDatasets-JSONLD org has augmented OpenNeuro datasets. To clone these efficiently, you can use the command below.
It uses the GitHub CLI (https://cli.github.com/manual/installation); make sure you are logged in to the CLI (for example with gh auth login).
gh repo list OpenNeuroDatasets-JSONLD --fork -L 500 | awk '{print $1}' | sed 's/OpenNeuroDatasets-JSONLD\///g' | parallel -j 6 git clone git@github.com:OpenNeuroDatasets-JSONLD/{}
Run the bagel-cli on bulk annotated data
The following scripts are used:
extract_bids_dataset_name.py
add_description.py
run_bagel_cli.sh
parallel_bagel.sh
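For orientation, run_bagel_cli.sh essentially wraps a containerized call like the one below. This is a sketch, not the script's actual contents: the dataset path and name are placeholders, and the pheno flags should be checked against the current bagel-cli help, since they can change between releases.
# Generate phenotypic annotations for one dataset with the bagel-cli container.
# All paths and the dataset name are illustrative placeholders.
docker run --rm -v "$PWD/inputs:/data" neurobagel/bagelcli pheno \
    --pheno /data/ds000001/participants.tsv \
    --dictionary /data/ds000001/participants.json \
    --name "ds000001" \
    --output /data/ds000001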
(Optional) Create a new Python environment with python -m venv my_env.
Activate your Python environment with source ./my_env/bin/activate
Install the dependencies with pip install -r requirements.txt
Get the latest version of the bagel-cli
from Docker Hub: docker pull neurobagel/bagelcli:latest
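To check that the image runs, you can print the CLI help (assuming the image's entrypoint is the bagel command, as in the neurobagel documentation):
docker run --rm neurobagel/bagelcli --help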
Create a directory called inputs
in the repository root that contains all the datasets that will be processed with the CLI.
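For example, the layout could look like this (dataset names are placeholders; each subdirectory is one BIDS dataset containing at least a participants.tsv):
inputs/
  ds000001/participants.tsv
  ds000002/participants.tsv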
To run the CLI in parallel across the datasets in inputs/, double-check that the directory paths used by parallel_bagel.sh and run_bagel_cli.sh are correct, then run:
./parallel_bagel.sh
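If the paths need adapting, note that parallel_bagel.sh presumably follows a pattern like the one below (a sketch inferred from the script names, not its actual contents):
# Run the per-dataset wrapper once per directory in inputs/, 6 jobs at a time.
ls -d inputs/*/ | parallel -j 6 ./run_bagel_cli.sh {}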