monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Preserve original triple in ingests that use maps #190

Closed: kevinschaper closed this issue 1 year ago

kevinschaper commented 2 years ago

We should look through the ingests where we currently use maps (OMIM, STRING, GO Annotations, ZP?) and see whether it makes sense to preserve any parts of the original triple using:

https://w3id.org/biolink/vocab/original_subject
https://w3id.org/biolink/vocab/original_predicate
https://w3id.org/biolink/vocab/original_object

matentzn commented 2 years ago

Yeah that would be awesome. I think this is key to increase trust in our data products! Thanks for looking at this!

kevinschaper commented 2 years ago

GO ingest: being updated to not need a map.
STRING ingest: definitely needs this.
OMIM ingest: if it is using mim2gene (it looks like it is?), then it should keep the original subject.
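For illustration only, here is a rough sketch of what keeping the original triple could look like in a map-based ingest such as STRING. It is plain Python rather than actual Koza code; the function name `build_association`, the `EXAMPLE_ID_MAP` table, and the placeholder identifiers are all hypothetical. Only the `original_subject` and `original_object` slot names come from the Biolink URLs above.

```python
from typing import Dict, Optional

# Hypothetical map standing in for an ingest lookup table
# (e.g. protein ID -> gene ID). The IDs are placeholders, not real data.
EXAMPLE_ID_MAP: Dict[str, str] = {
    "EXAMPLE_PROTEIN:1": "EXAMPLE_GENE:A",
    "EXAMPLE_PROTEIN:2": "EXAMPLE_GENE:B",
}


def build_association(
    subject: str,
    predicate: str,
    obj: str,
    id_map: Dict[str, str],
) -> Optional[dict]:
    """Map subject and object through id_map, keeping the source IDs.

    Returns None when either ID has no mapping, so the caller can
    decide whether to skip or log the row.
    """
    mapped_subject = id_map.get(subject)
    mapped_object = id_map.get(obj)
    if mapped_subject is None or mapped_object is None:
        return None

    association = {
        "subject": mapped_subject,
        "predicate": predicate,
        "object": mapped_object,
    }
    # Record the pre-mapping IDs only where the map actually changed them.
    if mapped_subject != subject:
        association["original_subject"] = subject
    if mapped_object != obj:
        association["original_object"] = obj
    return association


example = build_association(
    "EXAMPLE_PROTEIN:1",
    "biolink:interacts_with",
    "EXAMPLE_PROTEIN:2",
    EXAMPLE_ID_MAP,
)
```

Whether to drop unmapped rows or emit them with the unmapped ID is a separate decision; the sketch simply returns `None` for them.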

@matentzn, do you think it makes sense to store a concatenation of the ZP ID set as an original object?
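To make the question concrete, a minimal sketch of the concatenation idea might look like the following. The helper name `concat_original_object`, the `|` delimiter, and the placeholder IDs are assumptions for discussion, not something the ingest currently does.

```python
from typing import Iterable


def concat_original_object(term_ids: Iterable[str], delimiter: str = "|") -> str:
    """Join the component term IDs of a post-composed annotation into one string.

    Input order is preserved (the position of each term may carry meaning),
    and empty or missing components are skipped.
    """
    return delimiter.join(term_id for term_id in term_ids if term_id)


# Placeholder IDs only; real rows would carry the source's own term IDs.
original_object = concat_original_object(["EX:0001", "", "EX:0002", "EX:0003"])
```

A single delimited string keeps `original_object` single-valued, at the cost of needing a parser to recover the individual IDs later.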

matentzn commented 2 years ago

Hmmmm. Interesting question! Can you bring it up at a data call? I would spontaneously say yes, but I'm not sure. I would like to hear @cmungall's opinion on storing provenance for pre-composed raw data that was internally linked in a post-composed way.

kevinschaper commented 1 year ago

I'm going to resolve this issue, since we're handling this where it's straightforward. The connection between ZP terms and the original post-composition stands on its own just fine, I hope?

matentzn commented 1 year ago

I have recently updated it. It's tied to ZP releases, which means we probably need more frequent ZP releases, and you also have to handle cases where there is no link yet between ZFIN and ZP (there will always be some). I would vote for simply dropping it.