monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Use alliance schema formatted data in gene expression (site) ingest #202

Closed by RichardBruskiewich 2 years ago

RichardBruskiewich commented 2 years ago

We moved over to using data from EXPRESSION files in https://fms.alliancegenome.org/api/snapshot/release/5.1.0

kevinschaper commented 2 years ago

It looks like we're hitting a validation error on stage names going into the stage qualifier field.

I'm not sure whether we should leave out the stage names and leave the stage qualifier blank in those cases (and file an issue to get that sorted out later), or whether a name in stage_qualifier is valid in Biolink and the pydantic validation is just being too strict.

ERROR:gene_to_expression:Alliance gene expression ingest parsing exception for data row:
        '{'assay': 'MMO:0000658', 'crossReference': {'id': 'MGI:3507981', 'pages': ['gene/expression/annotation/detail']}, 'dateAssigned': '2018-07-18T13:27:43-04:00', 'evidence': {'crossReference': {'id': 'MGI:3046355', 'pages': ['reference']}, 'publicationId': 'PMID:15618518'}, 'geneId': 'MGI:101877', 'whenExpressed': {'stageName': 'TS27', 'stageUberonSlimTerm': {'uberonTerm': 'post embryonic, pre-adult'}}, 'whereExpressed': {'anatomicalStructureTermId': 'EMAPA:31858', 'anatomicalStructureUberonSlimTermIds': [{'uberonTerm': 'Other'}], 'whereExpressedStatement': 'head'}}'
3 validation errors for GeneToExpressionSiteAssociation
stage_qualifier
  string does not match regex "^[a-zA-Z_]?[a-zA-Z_0-9.-]*:([A-Za-z0-9_][A-Za-z0-9_.-]*[A-Za-z0-9./\(\)\-><_:;]*)?$" (type=value_error.str.regex; pattern=^[a-zA-Z_]?[a-zA-Z_0-9.-]*:([A-Za-z0-9_][A-Za-z0-9_.-]*[A-Za-z0-9./\(\)\-><_:;]*)?$)
stage_qualifier
  string does not match regex "^(http|ftp)" (type=value_error.str.regex; pattern=^(http|ftp))
stage_qualifier
  instance of LifeStage, tuple or dict expected (type=type_error.dataclass; class_name=LifeStage)
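
For reference, a minimal sketch of the first option (leave the qualifier blank when the stage name is not an identifier). The two patterns are copied from the error output above; the helper name `stage_qualifier_or_none` is hypothetical and is not part of monarch-ingest, Koza, or the Biolink model, and `MMO:0000658` is used only because it is a CURIE-shaped string from the same row, not because it is a stage term.

```python
import re

# Patterns copied verbatim from the pydantic validation error above.
CURIE_PATTERN = re.compile(
    r"^[a-zA-Z_]?[a-zA-Z_0-9.-]*:([A-Za-z0-9_][A-Za-z0-9_.-]*[A-Za-z0-9./\(\)\-><_:;]*)?$"
)
URI_PATTERN = re.compile(r"^(http|ftp)")


def stage_qualifier_or_none(stage_name):
    """Hypothetical helper: keep the value only if it looks like a CURIE or URI."""
    if stage_name and (CURIE_PATTERN.match(stage_name) or URI_PATTERN.match(stage_name)):
        return stage_name
    # Plain stage names such as 'TS27' have no prefix, so leave the field blank.
    return None


# Trimmed-down version of the failing row from the log.
row = {"whenExpressed": {"stageName": "TS27"}}
stage_qualifier = stage_qualifier_or_none(row["whenExpressed"].get("stageName"))
print(stage_qualifier)  # None

# A CURIE-shaped string passes the same check.
print(stage_qualifier_or_none("MMO:0000658"))  # MMO:0000658
```

This only shows why `TS27` is rejected (there is no `prefix:` part) and how the field could be left empty for now; whether a bare stage name should be acceptable is still the open question.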