Demonstrate how the various stages of a series of workflows could be tracked and triggered by the runtime.
The workflows are defined in a YAML file. This describes the following for each workflow
The main scheduling loop does the following:
To install the environment using poetry, there are a few steps to take.
If Poetry is not installed, run:
pip install poetry
Once poetry is installed, you can run:
poetry install
To use the environment, you can shell into the env:
poetry shell
This package is meant to be used on NMDC-approved compute instances whose directories can be accessed via HTTPS and are linked to the microbiomedata.org/data endpoint.
The main Python drivers can be found in the nmdc_automation/run_process directory,
which contains two processes that each require a configuration to be supplied.
nmdc_automation/run_process/run_workflows.py
automates job claims, job processing, and the submission of analysis records and data objects via the NMDC runtime-api.
To start a daemon that claims, processes, and submits all jobs that have not yet been claimed, cd into nmdc_automation/run_process and run:
python run_workflows.py watcher --config ../../configs/site_configuration_nersc.toml daemon
The daemon watches for omics processing records that have not been claimed and processed.
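The claim, process, and submit cycle described above can be pictured as a simple polling loop. The following is only an illustrative sketch, not the code in run_workflows.py: the names watch_once, run_daemon, fetch_unclaimed, claim, process, and submit are hypothetical placeholders, and the real watcher talks to the runtime-api in ways not shown here.

```python
import time
from typing import Callable, Iterable, List, Optional


def watch_once(
    fetch_unclaimed: Callable[[], Iterable[str]],
    claim: Callable[[str], bool],
    process: Callable[[str], dict],
    submit: Callable[[str, dict], None],
) -> List[str]:
    """Run one claim/process/submit pass; return the IDs of the jobs handled."""
    handled = []
    for job_id in fetch_unclaimed():
        # Skip jobs that could not be claimed (e.g. another site took them).
        if not claim(job_id):
            continue
        result = process(job_id)
        submit(job_id, result)
        handled.append(job_id)
    return handled


def run_daemon(
    poll_seconds: float,
    max_cycles: Optional[int] = None,
    **steps: Callable,
) -> List[str]:
    """Repeat watch_once, sleeping between passes.

    max_cycles=None loops forever, which is how a daemon would run;
    a finite value is handy for testing.
    """
    all_handled: List[str] = []
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        all_handled.extend(watch_once(**steps))
        cycle += 1
        time.sleep(poll_seconds)
    return all_handled
```

Passing the four steps in as callables keeps the loop itself independent of any particular API client, so it can be exercised with in-memory stand-ins before being pointed at a real service.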
nmdc_automation/run_process/run_import.py
takes data files available on disk, transforms them into NMDC analysis records, and submits them back to the central data store via the runtime-api. Currently this process is only suitable for data processed at JGI, but with collaboration, data from other processing centers could be transformed and ingested into NMDC.
To submit the import process, cd into nmdc_automation/run_process and run:
python run_import.py project-import import.tsv ../../configs/import.yaml
where import.tsv is expected to have the following format:
omics_id | project_id | directory |
---|---|---|
nmdc:omprc-11-q8b9dh63 | Ga0597031 | /path/to/project/Ga0597031 |
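Before submitting an import, it can help to check that import.tsv has the three columns shown above. The sketch below assumes the file is tab-separated with a header row, which the table implies but does not state; read_import_rows and EXPECTED_COLUMNS are hypothetical helpers and are not part of nmdc_automation.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names taken from the example table above.
EXPECTED_COLUMNS = ["omics_id", "project_id", "directory"]


def read_import_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a tab-separated import file and return one dict per data row."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"import file is missing columns: {missing}")
    # Keep only the expected columns and trim stray whitespace.
    return [
        {col: (row[col] or "").strip() for col in EXPECTED_COLUMNS}
        for row in reader
    ]


# Example using the row from the table above (held in memory, not on disk).
sample = (
    "omics_id\tproject_id\tdirectory\n"
    "nmdc:omprc-11-q8b9dh63\tGa0597031\t/path/to/project/Ga0597031\n"
)
rows = read_import_rows(io.StringIO(sample))
```

For a file on disk, the same function can be called with an open file handle, for example `with open("import.tsv", newline="") as fh: rows = read_import_rows(fh)`.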