monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Generate new relation graph tsv from Phenio owl #286

Closed kevinschaper closed 1 year ago

kevinschaper commented 2 years ago

Using the TSV output branch of relation-graph (https://github.com/balhoff/relation-graph/tree/issue-25):

relation-graph/target/universal/stage/bin/relation-graph \
  --ontology-file monarch-ontology-final.owl \
  --redundant-output-file monarch-ontology-relations-redundant.tsv \
  --non-redundant-output-file monarch-ontology-relations-non-redundant.tsv \
  --mode tsv \
  --reflexive-subclasses true \
  --equivalence-as-subclass true \
  --property "rdfs:subClassOf"

We could keep running this as a one-time step, or integrate it into the pipeline. We may also want to see whether this can be generated as part of the Phenio build.
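
If this does get folded into the pipeline, one option is a small wrapper that assembles the same relation-graph invocation for any ontology file. A minimal sketch in Python, reusing the binary path and flags from the command above; the function names and the `<stem>-relations-*.tsv` output naming are hypothetical, not existing monarch-ingest code.

```python
import subprocess
from pathlib import Path
from typing import List

# Path to the locally built relation-graph launcher, copied from the command above.
RELATION_GRAPH_BIN = "relation-graph/target/universal/stage/bin/relation-graph"


def build_relation_graph_cmd(ontology_file: str, binary: str = RELATION_GRAPH_BIN) -> List[str]:
    """Assemble the relation-graph argument list for one ontology file.

    Output files are named after the ontology file's stem and written to the
    current working directory (an assumed naming scheme, for illustration).
    """
    stem = Path(ontology_file).stem
    return [
        binary,
        "--ontology-file", ontology_file,
        "--redundant-output-file", f"{stem}-relations-redundant.tsv",
        "--non-redundant-output-file", f"{stem}-relations-non-redundant.tsv",
        "--mode", "tsv",
        "--reflexive-subclasses", "true",
        "--equivalence-as-subclass", "true",
        # No shell quoting needed here: each list item is passed as one argument.
        "--property", "rdfs:subClassOf",
    ]


def run_relation_graph(ontology_file: str) -> None:
    """Run relation-graph and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_relation_graph_cmd(ontology_file), check=True)
```

A pipeline step would then call `run_relation_graph("phenio.owl")` (or whichever ontology file is being processed) before the steps that consume the TSV files.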

kevinschaper commented 2 years ago

Command line for phenio

relation-graph/target/universal/stage/bin/relation-graph \
  --ontology-file phenio.owl \
  --redundant-output-file phenio-relations-redundant.tsv \
  --non-redundant-output-file phenio-relations-non-redundant.tsv \
  --mode tsv \
  --reflexive-subclasses true \
  --equivalence-as-subclass true \
  --property "rdfs:subClassOf"

caufieldjh commented 2 years ago

I can integrate this upstream and provide the files through phenio releases, unless there's a reason they'd need to be rebuilt more often.

kevinschaper commented 2 years ago

Ooh, yes please!

Are you planning to add the kgx conversion too?

caufieldjh commented 2 years ago

That's already available as KG-Phenio: https://kg-hub.berkeleybop.io/kg-phenio/20220607/kg-phenio.tar.gz

KG-Phenio ingests phenio and preprocesses it, but that preprocessing will soon be moved upstream into phenio, so the two releases will be more similar.

matentzn commented 2 years ago

The closure file could be big, especially if you include other relations like part_of. How big is the KGX TSV, as far as you can see?

caufieldjh commented 2 years ago

Edges and nodes are 101 MB and 147 MB, respectively

matentzn commented 2 years ago

That's... small! Thanks

kevinschaper commented 2 years ago

The relation-graph closure TSV is 650 MB - I assume it'll get bigger if it's traversing more than just subclass_of.

caufieldjh commented 2 years ago

Yeah, part_of alone will add at least 10% more edges, and I see 52 other predicate types beyond subclass_of, at least in terms of Biolink. Entirely doable at the current size I think
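
For a rough check of how much each predicate contributes to the edge count, something like the following could tally a KGX edges TSV. This is a sketch only: it assumes the file has a header row with a `predicate` column, as KGX TSV edge files do; the function name and the sample rows are made up for illustration and are not real PHENIO data.

```python
import csv
import io
from collections import Counter
from typing import IO


def count_predicates(edges_tsv: IO[str]) -> Counter:
    """Count edges per predicate in a tab-separated edges file with a header row."""
    # QUOTE_NONE keeps stray quote characters in free-text columns from confusing the parser.
    reader = csv.DictReader(edges_tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["predicate"] for row in reader)


# Made-up sample rows, only to show the expected shape of the input.
sample = io.StringIO(
    "subject\tpredicate\tobject\n"
    "EX:1\tbiolink:subclass_of\tEX:2\n"
    "EX:2\tbiolink:subclass_of\tEX:3\n"
    "EX:1\tbiolink:part_of\tEX:4\n"
)

counts = count_predicates(sample)
for predicate, n in counts.most_common():
    print(predicate, n)
```

For a real file, pass an open handle instead, e.g. `with open(path, newline="") as f: count_predicates(f)`, where `path` points at the edges TSV from the KG-Phenio tarball.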

kevinschaper commented 1 year ago

I think this issue is safe to close, and we'll start getting the relations file from the Semantic SQL version of phenio in issue #369.

caufieldjh commented 1 year ago

@kevinschaper You don't even need to get it from semsql - the relation graph is now distributed along with PHENIO itself