These scripts generate the KEGG/COG/Pfam aggregations that are used for search.
A container hosted on Spin runs the agg.sh script, which performs the aggregations periodically (once every 4 hours, by default).
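The run-then-sleep behavior described above can be sketched in Python. This is only an illustration of the pattern, not the contents of agg.sh: the function name `run_periodically` and the `max_runs` and `sleep` parameters are hypothetical and exist here so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional

DEFAULT_POLL_TIME = 14400  # seconds (4 hours), the default stated in this README


def run_periodically(
    task: Callable[[], None],
    poll_time: float = DEFAULT_POLL_TIME,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `task`, sleep for `poll_time` seconds, and repeat.

    `max_runs` and `sleep` are hypothetical hooks (not part of the real
    script) that bound the loop and replace the real sleep in tests.
    Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break  # bounded run: skip the sleep after the final iteration
        sleep(poll_time)
    return runs
```

With `max_runs=None` the loop never exits, which matches a long-running container workload; passing a small `max_runs` and a fake `sleep` makes the same loop easy to test.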
[!NOTE] The container image is hosted here.
Here's how you can set up a local development environment:
Unless otherwise specified, all commands below are designed to be run from the root directory of the repository.
[!NOTE] These instructions do not cover the process of setting up a local MongoDB server or getting access to the NERSC filesystem.
python -m venv ./.venv
source ./.venv/bin/activate
pip install -r requirements.txt
We use pytest as our test framework.
Here's how you can run the tests:
Unless otherwise specified, all commands below are designed to be run from the root directory of the repository.
pytest
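For readers new to pytest, a test is a function whose name starts with `test_` and that uses plain `assert` statements. The sketch below is illustrative only: `count_terms` is a hypothetical helper, and the term IDs are made-up examples rather than data from this repository.

```python
from collections import Counter
from typing import Dict, Iterable


def count_terms(term_ids: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: count how often each annotation term ID occurs."""
    return dict(Counter(term_ids))


def test_count_terms_tallies_repeated_ids():
    # pytest collects functions named `test_*` and fails the test
    # if any `assert` evaluates to false.
    counts = count_terms(
        ["KEGG.ORTHOLOGY:K00001", "PFAM:PF00001", "KEGG.ORTHOLOGY:K00001"]
    )
    assert counts == {"KEGG.ORTHOLOGY:K00001": 2, "PFAM:PF00001": 1}
```

To run a single test rather than the whole suite, pytest accepts a keyword filter, e.g. `pytest -k count_terms`.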
Here's how you can build a new version of the container image and push it to the GitHub Container Registry:
Version names use the "v{major}.{minor}.{patch}" format (e.g. "v1.2.3").
Taking a long time? Check the "Actions" tab on GitHub to see the status of the GitHub Actions workflow that builds the image.
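If you want to check a version name against the "v{major}.{minor}.{patch}" format before publishing, a small validator along these lines may help. It is a sketch, not part of this repository: `parse_version_tag` is a hypothetical name, and the pattern assumes each component is a plain run of decimal digits.

```python
import re
from typing import Tuple

# Assumption: each component is one or more decimal digits, nothing else.
VERSION_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_version_tag(tag: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) or raise ValueError for a malformed tag."""
    match = VERSION_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"expected 'v{{major}}.{{minor}}.{{patch}}', got {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```

For example, `parse_version_tag("v1.2.3")` returns `(1, 2, 3)`, while `"1.2.3"` (missing the leading "v") and `"v1.2"` (missing the patch number) raise `ValueError`.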
Now that the container image is hosted there, you can configure a Spin workload to run it.
MONGO_URL: Full Mongo URI for connecting to the Mongo database (no default)
LOG_FILE: Path to file to which logs will be appended (Default: /tmp/agg.log)
POLL_TIME: Number of seconds to sleep between each run (Default: 14400, which is 4 hours)
NMDC_BASE_URL: Base URL to access the data (Default: https://data.microbiomedata.org/data)
NMDC_BASE_PATH: Base path to the data on disk (Default: /global/cfs/cdirs/m3408/results)
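The settings above and their defaults can be summarized in code. The following is a minimal sketch, assuming the settings are supplied as environment variables; `AggConfig` and `load_config` are hypothetical names and are not taken from this repository's scripts.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AggConfig:
    """Hypothetical container for the settings listed in this README."""
    mongo_url: str
    log_file: str
    poll_time: int
    nmdc_base_url: str
    nmdc_base_path: str


def load_config(env: Mapping[str, str] = os.environ) -> AggConfig:
    """Read settings from `env`, applying the defaults documented above."""
    mongo_url = env.get("MONGO_URL")
    if not mongo_url:
        # MONGO_URL has no default, so a missing value is an error.
        raise ValueError("MONGO_URL must be set; it has no default")
    return AggConfig(
        mongo_url=mongo_url,
        log_file=env.get("LOG_FILE", "/tmp/agg.log"),
        poll_time=int(env.get("POLL_TIME", "14400")),  # seconds; 14400 is 4 hours
        nmdc_base_url=env.get("NMDC_BASE_URL", "https://data.microbiomedata.org/data"),
        nmdc_base_path=env.get("NMDC_BASE_PATH", "/global/cfs/cdirs/m3408/results"),
    )
```

Passing the environment as an argument (rather than reading `os.environ` inside the function body) makes it easy to try different combinations, e.g. `load_config({"MONGO_URL": "mongodb://localhost:27017", "POLL_TIME": "60"})` for a local setup that polls once a minute.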