Code to run hra-workflows locally or using Slurm.
Copy `env.example.sh` to `env.sh`, then update the configuration with your own settings.
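For example, in a POSIX shell:

```sh
cp env.example.sh env.sh
$EDITOR env.sh   # fill in your own paths and settings
```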
If using Singularity rather than Docker, it is highly recommended that you configure the `SIF_CACHE_DIR` option and run `./scripts/00x-build-containers.sh` to prebuild the SIF containers for faster runs.
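For instance, in `env.sh` (the cache path below is just an illustration):

```sh
# Cache prebuilt SIF images between runs
SIF_CACHE_DIR="$HOME/.cache/hra-workflows/sif"
```

Then prebuild the containers once:

```sh
./scripts/00x-build-containers.sh
```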
Another script that only needs to be run once is `05x-download-models.sh`. This script downloads the model files that were too large to embed directly in the algorithms' container images.
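Assuming the script lives alongside the other numbered scripts (an assumption; adjust the path if needed):

```sh
./scripts/05x-download-models.sh   # one-time model download
```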
The next step depends on whether you are running the code locally or in an environment using Slurm.
To run locally, run `run.sh`. This script runs the entire pipeline from start to finish. Note that the annotation steps run sequentially for all datasets and may therefore take a long time when processing a large number of datasets.
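Since a full run can take a long time, you may want to detach it from your terminal, for example:

```sh
nohup ./run.sh > run.log 2>&1 &   # or run inside tmux/screen
```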
Running on Slurm requires some manual work due to the difficulty of running nested Singularity containers. Start by running `x-run.sh`. This script schedules a job that runs every step up to, but not including, annotating the datasets. Once the `x-run.sh` job has finished, the annotation step can be started using `./scripts/30x-annotate.sh`. After all annotations have finished, the `40+` scripts (those numbered 40 and above) have to be run manually. `./scripts/01x-start-container.sh` is provided as a utility to enter an environment where the `40+` scripts can be run with all dependencies satisfied.
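Putting the Slurm flow together (a sketch; the exact `40+` script names vary, so run each one by hand):

```sh
./x-run.sh                        # schedules everything up to annotation
# ...wait for the scheduled job to finish...
./scripts/30x-annotate.sh         # run the annotation step
./scripts/01x-start-container.sh  # enter an environment with all dependencies
# inside the container, run each 40+ script in scripts/ manually
```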
Adding a new dataset handler requires two steps:
1. A dataset handler must implement and export the `DatasetHandler` interface from its `index.js` file (see the sketch after this list). The interface includes a listing generator, a downloader, and a job generator. It can be useful to refer to one of the existing implementations, such as CellXGene or GTEx, when creating a new handler. The `index.js` file must be placed in a `src/handler-name/` folder so the handler can be properly loaded.
2. The new handler must then be registered by adding its name to `DEFAULT_DATASET_HANDLERS`. Alternatively, handlers can be specified using the `DATASET_HANDLERS` environment variable.
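A minimal handler sketch is shown below. The method names are illustrative assumptions, not the actual `DatasetHandler` signature; treat `src/cellxgene/` or `src/gtex/` as the authoritative reference:

```js
// src/my-source/index.js — hypothetical handler; method names are examples only
export class MySourceHandler {
  // Listing generator: yield a descriptor for each available dataset
  async *listDatasets() {
    yield { id: 'my-source:sample-1' };
  }

  // Downloader: fetch the raw data for one dataset descriptor
  async download(dataset) {
    // e.g. fetch files into the runner's data directory
  }

  // Job generator: build the job specification for the annotation step
  createJob(dataset) {
    return { dataset: dataset.id };
  }
}
```

To enable the handler for a single run without editing `DEFAULT_DATASET_HANDLERS`, the `DATASET_HANDLERS` environment variable can be set instead, e.g. `DATASET_HANDLERS=my-source`.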