Closed (aclum closed this 4 days ago)
Validation against the NMDC schema is implemented in watch_nmdc.py:
import yaml
import linkml.validator
import importlib.resources
from functools import lru_cache

# import the materialized schema - schema version defined in pyproject.toml - and cache it
@lru_cache(maxsize=None)
def _get_nmdc_materialized():
    with importlib.resources.open_text("nmdc_schema", "nmdc_materialized_patterns.yaml") as f:
        return yaml.safe_load(f)

# validation of the nmdc.database before posting to the API:
job_dict = yaml.safe_load(yaml_dumper.dumps(job_database))
# validate the database object against the schema
validation_report = linkml.validator.validate(
    job_dict, self.nmdc_materialized, "Database"
)
if validation_report.results:
    logger.error(f"Validation error: {validation_report.results[0].message}")
    logger.error(f"job_dict: {job_dict}")
    continue
else:
    logger.info(f"Database object validated for job {job.opid}")
Validation in run_import.py is basically the same thing:
# validate the database
logger.info("Validating imported data")
db_dict = yaml.safe_load(yaml_dumper.dumps(db))
validation_report = linkml.validator.validate(db_dict, nmdc_materialized)
if validation_report.results:
    logger.error("Validation Failed")
    for result in validation_report.results:
        logger.error(result.message)
    raise Exception("Validation Failed")
else:
    logger.info("Validation Passed")
The same basic logic is used in 1 unit test.
See documentation on how to do this here
We discovered this week that, even when using the Python classes, the unit tests can create records that would not be accepted by the runtime; see https://github.com/microbiomedata/nmdc_automation/issues/267 for details. Using LinkML validation is a way to check these records without adding runtime API dependencies to the unit tests.
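
One possible shape for this in the unit tests is a small assertion helper, sketched below. The helper name `assert_valid_database` and its injectable `validate` parameter are assumptions made for illustration and do not exist in nmdc_automation. By default it calls `linkml.validator.validate` with the same arguments as the watch_nmdc.py snippet; the injection point only exists so the helper itself can be exercised without linkml installed.

```python
def assert_valid_database(db_dict, schema, validate=None):
    """Fail with every validation message if db_dict is not a valid Database.

    `validate` defaults to linkml.validator.validate; it is injectable
    (an assumption of this sketch) so the helper can be tested in isolation.
    """
    if validate is None:
        # Imported lazily so the module loads even where linkml is absent.
        import linkml.validator
        validate = linkml.validator.validate

    report = validate(db_dict, schema, "Database")
    messages = [result.message for result in report.results]
    if messages:
        raise AssertionError("Validation failed:\n" + "\n".join(messages))
```

In a test this could be called as `assert_valid_database(db_dict, nmdc_materialized)` after dumping the database object to a dict. Reporting all messages rather than only the first makes it easier to see every rejected record in a single test run.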
Alternatives considered: use the json:validate endpoint to validate records.