monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Evidence based Quality Control for the Monarch Graph #487

Open RichardBruskiewich opened 1 year ago

RichardBruskiewich commented 1 year ago

@RichardBruskiewich and @sierra-moxon (cc: @putmantime) to connect on how to harmonize the testing framework from Translator with quality control of the Monarch knowledge graph.

RichardBruskiewich commented 1 year ago

@kevinschaper (cc: @sierra-moxon) do we have a clear statement anywhere of our QC objectives for Monarch, in addition to what Corey is implementing? Do we have a list of QC questions we specifically need to answer? I know that these are brought up in various technical meetings and perhaps partly captured in meeting minutes, but I wonder whether we have one definitive list.

One suspects here that KGX graph validation could be the entry point for QC validation of the Monarch graph; that said, a fresh review of KGX versus reasoner-validator validation may be helpful.

An alternate 'quick fix' would be to commission a Plater wrapper for the Monarch graph to be tested (isn't that what we already have in place for the SRI Reference Graph, or whatever we call it these days?). In that case, we could then run reasoner-validator validation on the graph via that TRAPI interface.

Another angle here would be to refactor out (or simply use a subset of) some of the core Biolink Model validation code in reasoner-validator, namely the 'knowledge graph' validation part, for specific use in Monarch QC validation. This part of the code is somewhat agnostic (with perhaps minor caveats to be discerned) about the wider context of its usage in TRAPI Response validation, since the code mainly uses the Biolink Model Toolkit to validate TRAPI Response.Message.KnowledgeGraph JSON. The alignment to KGX validation is subject to further review.
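
To make the scope of that 'knowledge graph' check concrete, here is a minimal, standard-library-only sketch of the kind of validation involved. It is not reasoner-validator code: `validate_knowledge_graph`, `KNOWN_CATEGORIES`, and `KNOWN_PREDICATES` are hypothetical names, and the two hard-coded sets are assumed stand-ins for lookups that would really go through the Biolink Model Toolkit. Only the `nodes`/`edges` layout follows the TRAPI KnowledgeGraph shape.

```python
from typing import Dict, List

# Hypothetical stand-ins for Biolink Model Toolkit lookups (assumption:
# a real check would ask the toolkit rather than use hard-coded sets).
KNOWN_CATEGORIES = {"biolink:Gene", "biolink:Disease", "biolink:PhenotypicFeature"}
KNOWN_PREDICATES = {"biolink:gene_associated_with_condition", "biolink:has_phenotype"}


def validate_knowledge_graph(kg: Dict) -> List[str]:
    """Return human-readable problems found in a TRAPI-style knowledge graph."""
    errors: List[str] = []
    nodes = kg.get("nodes", {})
    edges = kg.get("edges", {})

    # Every node needs at least one category, and each must be recognized.
    for node_id, node in nodes.items():
        categories = node.get("categories") or []
        if not categories:
            errors.append(f"node {node_id}: no categories")
        for category in categories:
            if category not in KNOWN_CATEGORIES:
                errors.append(f"node {node_id}: unknown category {category}")

    # Every edge must point at declared nodes and use a recognized predicate.
    for edge_id, edge in edges.items():
        for slot in ("subject", "object"):
            if edge.get(slot) not in nodes:
                errors.append(f"edge {edge_id}: {slot} {edge.get(slot)} is not in nodes")
        if edge.get("predicate") not in KNOWN_PREDICATES:
            errors.append(f"edge {edge_id}: unknown predicate {edge.get('predicate')}")
    return errors
```

Because the function only takes a plain dictionary, the same check could in principle be fed from a Monarch export converted to this shape, without going through a TRAPI endpoint; whether that matches what KGX validation already covers is the open question above.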

Another more challenging Monarch QC validation approach is to develop a test edge data set to run through the 'one hop' tests of the SRI_Testing framework; however, it is hard to know if the terms of reference of SRI_Testing are totally relevant to Monarch QC (and besides, the SRI_Testing application is lagging in development at the moment?)
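
For reference, a 'one hop' test of this kind boils down to turning a known edge into a query that pins one end and then checking that the other end comes back. The sketch below shows a "by subject" variant only. It is not taken from SRI_Testing: the function names and the test-edge field names (`subject_category`, `object_category`, and so on) are assumptions made for illustration, while the query layout follows the TRAPI query graph shape.

```python
from typing import Dict


def by_subject_query(edge: Dict[str, str]) -> Dict:
    """Build a TRAPI-style one-hop query that pins the subject and leaves the object open."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # n0 is fixed to the test edge's subject identifier.
                    "n0": {"ids": [edge["subject"]], "categories": [edge["subject_category"]]},
                    # n1 is constrained by category only, so the service must find the object.
                    "n1": {"categories": [edge["object_category"]]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1", "predicates": [edge["predicate"]]},
                },
            }
        }
    }


def object_recovered(edge: Dict[str, str], response: Dict) -> bool:
    """True if the test edge's expected object appears among the returned knowledge graph nodes."""
    kg = response.get("message", {}).get("knowledge_graph") or {}
    return edge["object"] in kg.get("nodes", {})
```

A test edge data set for Monarch would then be a list of such edge dictionaries, each posted to the TRAPI endpoint and scored with `object_recovered`; sending the request is deliberately left out here since the endpoint is not yet decided.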