The programs bundled in this repository automatically retrieve Biosample metadata records for a given study submitted to NMDC through the NMDC Submission Portal, and convert that metadata into Excel spreadsheets accepted by DOE user facilities.
There are two components (of MUTTs) to keep in mind when using this application:
1. JSON header (sometimes also called mapper) configuration file
2. etl.py: the command-line application that converts metadata from the Submission Portal into user facility formats, taking the JSON header file above as input.
git clone https://github.com/microbiomedata/metadata-for-user-facility-template-transformations.git
poetry install
Obtain your NMDC Data and Submission Portal API Access Token and copy it into your .env file, assigning it to the DATA_PORTAL_REFRESH_TOKEN environment variable.
In the .env file, add the Refresh Token on a line of the form DATA_PORTAL_REFRESH_TOKEN={refresh_token_value}
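As a quick sanity check before running the tool, the sketch below reads a .env file and confirms that DATA_PORTAL_REFRESH_TOKEN holds a real value rather than the placeholder. It uses only the Python standard library. The helper names read_env_file and require_token are hypothetical, and this README does not say how etl.py itself loads the file, so treat this as an independent check rather than a description of the tool's internals.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_token(env, name="DATA_PORTAL_REFRESH_TOKEN"):
    """Return the token, or raise if it is missing or still a placeholder."""
    token = env.get(name, "")
    if not token or token.startswith("{"):
        raise ValueError(f"{name} is not set in the .env file")
    return token
```

Calling require_token(read_env_file(".env")) from the repository root raises ValueError if the variable is absent or was left as {refresh_token_value}.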
Run etl.py with options as follows:
poetry run python etl.py --help
Usage: etl.py [OPTIONS]
Command-line interface for creating a spreadsheet based on metadata records.
:param submission: The ID of the metadata submission.
:param user_facility: The user facility to retrieve data from.
:param header: True if the headers should be included, False otherwise.
:param mapper: Path to the JSON mapper specifying column mappings.
:param unique_field: Unique field to identify the metadata records.
:param output: Path to the output XLSX file.
Options:
  -s, --submission TEXT       Metadata submission id.  [required]
  -u, --user-facility TEXT    User facility to send data to.  [required]
  -h, --header / --no-header  [default: no-header]
  -m, --mapper PATH           Path to user facility specific JSON file.  [required]
  -uf, --unique-field TEXT    Unique field to identify the metadata records.  [required]
  -o, --output TEXT           Path to result output XLSX file.  [required]
  --help                      Show this message and exit.
Example - JGI/JGI_MG
poetry run python etl.py --submission {UUID of the target submission} --unique-field samp_name --user-facility jgi_mg --mapper input-files/jgi_mg_header.json --output file-name_jgi.xlsx
Example - EMSL
poetry run python etl.py --submission {UUID of the target submission} --user-facility emsl --mapper input-files/emsl_header.json --header --unique-field samp_name --output file-name_emsl.xlsx
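When the same submission has to be exported for more than one facility, it can help to assemble these commands in a script. The following is a minimal sketch that builds the argument list from the options shown in the --help output and runs it with subprocess. The function names build_etl_command and run_etl are hypothetical and not part of this repository, and the sketch assumes it is run from the repository root with Poetry installed.

```python
import subprocess


def build_etl_command(submission_id, user_facility, mapper, output,
                      unique_field="samp_name", header=False):
    """Assemble the etl.py invocation using the options listed by --help."""
    command = [
        "poetry", "run", "python", "etl.py",
        "--submission", submission_id,
        "--user-facility", user_facility,
        "--mapper", mapper,
        "--unique-field", unique_field,
        "--output", output,
    ]
    # --header / --no-header is a boolean flag pair; no-header is the default.
    command.append("--header" if header else "--no-header")
    return command


def run_etl(command):
    """Run the command in the current directory and raise on a non-zero exit."""
    subprocess.run(command, check=True)
```

For instance, build_etl_command("{UUID of the target submission}", "emsl", "input-files/emsl_header.json", "file-name_emsl.xlsx", header=True) produces the same options as the EMSL example, in a different order.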