monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

How does ingestion work for entity that only has an identifier? #184

Closed wrosko closed 2 years ago

wrosko commented 2 years ago

Hi there,

Say we want to incorporate relationships from a new data source and we only know the entity CUIs, not necessarily all other details (attributes etc.). If we create nodes with NamedEntity(id="UMLS:C0121434") without specifying other stuff, would kgx/koza be able to recognize and assign to the correct node if a node with the same id exists? And at what point in the pipeline would this occur?

kevinschaper commented 2 years ago

Hey Wade!

I chatted with @sierra-moxon and our best guess was that if anything handled this, it would be the clique merge in kgx, and it would only handle it if you also brought in the node with more detail for it to be merged with. I tried a little experiment based on a clique merge test in kgx, and even in that case it looks like it took the more general category in the merge.
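To make the merge question concrete, here is a minimal plain-Python sketch of collapsing an ID-only node and a more detailed node that share the same id. It is not kgx's clique merge and does not use the kgx or koza APIs; the function name `merge_nodes_by_id`, the dict-based node shape, and the example name and category values are all assumptions made up for illustration. Unlike the behavior seen in the experiment above, this sketch keeps the more specific category whenever one is present.

```python
GENERIC_CATEGORY = "biolink:NamedThing"


def merge_nodes_by_id(nodes):
    """Collapse node records that share an id into one record per id.

    Hypothetical helper: nodes are plain dicts with an "id", a "category"
    list, and optional extra attributes.
    """
    merged = {}
    for node in nodes:
        record = merged.setdefault(node["id"], {"id": node["id"], "category": []})
        for key, value in node.items():
            if key == "id" or value is None:
                continue
            if key == "category":
                # Union the categories, preserving first-seen order.
                for category in value:
                    if category not in record["category"]:
                        record["category"].append(category)
            else:
                # The first non-empty value seen for an attribute wins.
                record.setdefault(key, value)

    for record in merged.values():
        # Drop the generic category when something more specific exists.
        specific = [c for c in record["category"] if c != GENERIC_CATEGORY]
        if specific:
            record["category"] = specific
    return list(merged.values())


# Example values are placeholders, not real attributes of this CUI.
id_only = {"id": "UMLS:C0121434", "category": [GENERIC_CATEGORY]}
detailed = {
    "id": "UMLS:C0121434",
    "category": ["biolink:ChemicalEntity"],
    "name": "placeholder name",
}
result = merge_nodes_by_id([id_only, detailed])
```

With these two inputs, `result` holds a single record for `UMLS:C0121434` that carries the detailed node's name and only the more specific category. If only the ID-only node is supplied, the record keeps `biolink:NamedThing`, which matches the point that the detailed node has to be brought in from somewhere.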

We're going in the direction of removing ID-only nodes from our association ingests and using downstream QC checks to look for extra nodes that we need to consciously bring in, but I can see how that might be overkill for some use cases. You could look at using Biolink Model Toolkit to guess at categories based on ID prefixes. Plenty would be ambiguous, but it might be a start.
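As a rough sketch of the prefix-based guessing idea, the snippet below uses a small hand-written lookup table instead of calling Biolink Model Toolkit. The table `PREFIX_CATEGORIES` and the function `guess_category` are hypothetical, and the prefix-to-category entries are illustrative assumptions rather than the Biolink Model's actual prefix lists; in practice those candidates would come from the model itself.

```python
# Hypothetical, hand-written table for illustration only. Real candidate
# categories per prefix would be derived from the Biolink Model.
PREFIX_CATEGORIES = {
    "HGNC": ["biolink:Gene"],
    "MONDO": ["biolink:Disease"],
    "HP": ["biolink:PhenotypicFeature"],
    # A prefix that spans many entity types is treated as ambiguous.
    "UMLS": ["biolink:Disease", "biolink:PhenotypicFeature", "biolink:ChemicalEntity"],
}


def guess_category(curie):
    """Return a category only when the CURIE prefix maps to exactly one."""
    prefix, separator, local_id = curie.partition(":")
    if not separator or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    candidates = PREFIX_CATEGORIES.get(prefix, [])
    if len(candidates) == 1:
        return candidates[0]
    # Unknown or ambiguous prefix: leave the decision to a human or to QC.
    return None


gene_guess = guess_category("HGNC:1100")
umls_guess = guess_category("UMLS:C0121434")
```

Returning `None` for ambiguous prefixes keeps the guess conservative: a prefix like UMLS covers many kinds of entity, so those ids would still need the detailed node or a manual decision.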

wrosko commented 2 years ago

Thanks for the reply, Kevin!