microbiomedata / nmdc_automation

Prototype automation

add linkml validation to unit tests where nmdc-schema database records are made #270

Closed: aclum closed 4 days ago

aclum commented 1 month ago

See documentation on how to do this here

We discovered this week that, even when the Python classes are used, the unit tests can create records that the runtime would not accept; see https://github.com/microbiomedata/nmdc_automation/issues/267 for details. LinkML validation is a way to check these records without adding runtime API dependencies to the unit tests.

Alternatives considered: use json:validate endpoint to validate records.

mbthornton-lbl commented 6 days ago

Validation vs. the NMDC schema is implemented in watch_nmdc.py


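import yaml
import linkml.validator
import importlib.resources
from functools import lru_cache
# yaml_dumper is used below; assumed to come from linkml_runtime
from linkml_runtime.dumpers import yaml_dumper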

# import the materialized schema - schema version defined in pyproject.toml - and cache it
@lru_cache(maxsize=None)
def _get_nmdc_materialized():
    with importlib.resources.open_text("nmdc_schema", "nmdc_materialized_patterns.yaml") as f:
        return yaml.safe_load(f)

            # excerpt from inside the loop over jobs:
            # validation of the nmdc.database before posting to the API
            job_dict = yaml.safe_load(yaml_dumper.dumps(job_database))
            # validate the database object against the schema
            validation_report = linkml.validator.validate(
                job_dict, self.nmdc_materialized, "Database"
            )
            if validation_report.results:
                logger.error(f"Validation error: {validation_report.results[0].message}")
                logger.error(f"job_dict: {job_dict}")
                continue
            else:
                logger.info(f"Database object validated for job {job.opid}")

mbthornton-lbl commented 6 days ago

Validation in run_import.py - basically the same thing:


        # validate the database
        logger.info("Validating imported data")
        db_dict = yaml.safe_load(yaml_dumper.dumps(db))
        validation_report = linkml.validator.validate(db_dict, nmdc_materialized)
        if validation_report.results:
            logger.error("Validation Failed")
            # log every validation message, not just the first
            for result in validation_report.results:
                logger.error(result.message)
            raise Exception("Validation Failed")
        else:
            logger.info("Validation Passed")

mbthornton-lbl commented 6 days ago

The same basic logic is used in 1 unit test.
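
For other unit tests that build nmdc-schema database records, the same pattern could be wrapped in a small helper. The following is only a sketch: `assert_valid_database` and its `validate` parameter are hypothetical names, not part of nmdc_automation. It assumes, as in the snippets above, that the validator is called as `validate(instance, schema, "Database")` and returns a report whose `results` entries each carry a `message`. The linkml import is deferred so that a test can inject a stub validator.

```python
def assert_valid_database(db_dict, schema, validate=None):
    """Fail with every validation message if db_dict does not match the schema.

    Hypothetical helper, not part of nmdc_automation.
    """
    if validate is None:
        # Deferred import: only needed when no validator is injected.
        import linkml.validator
        validate = linkml.validator.validate

    # Same call shape as in watch_nmdc.py: instance, schema, target class.
    report = validate(db_dict, schema, "Database")

    # Collect all messages so a failing test shows every problem at once.
    messages = [result.message for result in report.results]
    if messages:
        raise AssertionError("Schema validation failed:\n" + "\n".join(messages))
    return report
```

In a test this might be called as `assert_valid_database(yaml.safe_load(yaml_dumper.dumps(db)), _get_nmdc_materialized())`, reusing the cached schema loader shown earlier in the thread.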