phenoscape / TraitFest-2023

Main repository for information advertising and documenting the 2023 SCATE TraitFest
Creative Commons Zero v1.0 Universal

R package for semantic similarity calculation using non-KB and offline resources #20

Open hlapp opened 1 year ago

hlapp commented 1 year ago

Create an R package for computing pairwise and profile semantic similarity metrics, similar to the Rphenoscape package. Instead of using the Phenoscape KB API to obtain subsumers of input terms, the package would use a more general online service (in particular, Ubergraph / Relationgraph queries using SPARQL), and ultimately offline sources (in particular, downloaded relation graph edge tables, and Ubergraph in the form of SemSQL table downloads).
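To make the offline variant concrete, here is a minimal sketch of a pairwise Jaccard similarity computed from an edge table rather than from the KB API. It is written in Python purely for illustration (the package itself would be R). The term IDs, the `EDGES` rows, and the function names `subsumers` and `jaccard` are all made up for this example, and the table is assumed to already contain the inferred (transitively closed) subclass edges, as a downloaded relation graph table would.

```python
# Toy stand-in for a downloaded relation-graph edge table.
# Each row is (subject, predicate, object). The IDs are hypothetical, and the
# table is assumed to already hold the inferred (closed) subclass edges.
EDGES = [
    ("T:fin", "rdfs:subClassOf", "T:appendage"),
    ("T:fin", "rdfs:subClassOf", "T:anatomical_structure"),
    ("T:limb", "rdfs:subClassOf", "T:appendage"),
    ("T:limb", "rdfs:subClassOf", "T:anatomical_structure"),
    ("T:appendage", "rdfs:subClassOf", "T:anatomical_structure"),
    ("T:eye", "rdfs:subClassOf", "T:sense_organ"),
    ("T:eye", "rdfs:subClassOf", "T:anatomical_structure"),
    ("T:sense_organ", "rdfs:subClassOf", "T:anatomical_structure"),
]


def subsumers(term, edges=EDGES, predicate="rdfs:subClassOf"):
    """Return the term itself plus every class listed as subsuming it."""
    return {term} | {o for s, p, o in edges if s == term and p == predicate}


def jaccard(a, b, edges=EDGES):
    """Jaccard similarity: shared subsumers divided by all subsumers."""
    sa, sb = subsumers(a, edges), subsumers(b, edges)
    return len(sa & sb) / len(sa | sb)


print(jaccard("T:fin", "T:limb"))  # two shared subsumers out of four
print(jaccard("T:fin", "T:eye"))   # only the root class is shared
```

The only thing that changes between the KB, an online SPARQL endpoint, and an offline table is how `subsumers` is filled in; the similarity calculation on top of it stays the same.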

Rphenoscape includes methods for calculating a variety of both pairwise and profile semantic similarity metrics, but relies on the Phenoscape KB API to obtain subsumers (and, for IC-based metrics, term frequencies). This limits these capabilities to the ontologies that are part of the current KB build. Adding new ontologies to the KB build is a non-trivial undertaking, and it is beyond the control of researchers outside the SCATE / Phenoscape project. The goal here is to make the semantic similarity algorithms much more readily available for existing and future ontologies that aren't directly used within Phenoscape.
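For the IC-based metrics, the term frequencies would have to come from a user-supplied annotation corpus instead of the KB. A rough sketch of that idea follows, again in Python for illustration only. It uses one common definition of information content (negative log of the fraction of corpus items annotated with a term or one of its subclasses) and Resnik similarity; this is not necessarily how Rphenoscape computes it. The `SUBSUMERS` and `CORPUS` data and all function names are hypothetical.

```python
import math

# Hypothetical precomputed (reflexive) subsumer sets, e.g. from an edge table.
SUBSUMERS = {
    "T:fin": {"T:fin", "T:appendage", "T:structure"},
    "T:limb": {"T:limb", "T:appendage", "T:structure"},
    "T:eye": {"T:eye", "T:structure"},
}

# Hypothetical corpus: each item lists the terms it is directly annotated with.
CORPUS = [["T:fin"], ["T:fin"], ["T:limb"], ["T:eye"]]


def term_frequencies(corpus, subsumers):
    """Count, per term, the items annotated with it or with one of its subclasses."""
    counts = {}
    for item in corpus:
        # An annotation to a term implies annotation to all of its subsumers.
        implied = set().union(*(subsumers[t] for t in item))
        for t in implied:
            counts[t] = counts.get(t, 0) + 1
    return counts


def information_content(term, counts, n_items):
    """IC = log(N / count), i.e. -log of the term's relative frequency."""
    return math.log(n_items / counts[term])


def resnik(a, b, counts, n_items, subsumers=SUBSUMERS):
    """Resnik similarity: IC of the most informative common subsumer."""
    common = subsumers[a] & subsumers[b]
    return max(information_content(t, counts, n_items) for t in common)


counts = term_frequencies(CORPUS, SUBSUMERS)
print(resnik("T:fin", "T:limb", counts, len(CORPUS)))  # IC of T:appendage
print(resnik("T:fin", "T:eye", counts, len(CORPUS)))   # only the root is shared
```

Terms that never occur in the corpus have no count here; a real implementation would need to decide how to handle them.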

wdahdul commented 1 year ago

Would modified mutual exclusivity functions be appropriate to add here? Currently mutual exclusivity is returned based on evidence from studies in the KB, but it could potentially query a user-defined dataset. The optional quality_opposites parameter for mutual exclusivity will be user-specified, so opening the overall function to other datasets might make sense.

hlapp commented 1 year ago

@wdahdul it's not impossible to add this in some way to ubeRsim, but we'd first need to understand a lot better how we would want to define this.

More specifically, for Rphenoscape mutual exclusivity is determined for phenotypes in the KB. Phenotypes could be in Ubergraph, but presumably only from pre-composed named classes in requisite phenotype ontologies (such as HPO or MP), and not as the anonymous class expressions we use for annotating natural trait data.

Though perhaps it's time to create a pre-composed phenotype ontology for natural trait data as well. Or try to add our phenotypes to one that's already being developed (OBA? UBERPHENO?). (Thoughts @balhoff ?)

hlapp commented 1 year ago

I'm actually moving this to phenoscape/ubeRsim#2. Please add further comments to that issue, or create a new one on the ubeRsim issue tracker.