gyorilab / mira

MIRA modeling framework
BSD 2-Clause "Simplified" License

Candidates for DKG extension #49

Closed bgyori closed 1 year ago

bgyori commented 2 years ago

While examining relevant models and data sets, I tried to identify specific entries or subsets of ontologies that could be included in the DKG. NCIT is generally very useful and relevant, and contains (sometimes seemingly exclusively) terms that are needed. But I would say it is too large and diverse to include as a whole. Below I'm using OLS links for tree browsing.

NCIT

Not NCIT

cthoyt commented 1 year ago

So the first thing I'm checking into is using the robot extract command to get subtrees out of NCIT (docs: http://robot.obolibrary.org/extract). I'm trying

robot extract \
    -I http://purl.obolibrary.org/obo/ncit.owl \
    --method MIREOT \
    --output /Users/cthoyt/Desktop/test.owl \
    --branch-from-term "obo:NCIT_C17005" \
    --branch-from-term "obo:NCIT_C25636"

right now and it's making a nice subhierarchy. There might be some chain of commands that can add the parents back to the root, but I don't think this is necessary.

It appears --upper-term doesn't work unless a --lower-term is also given, so don't use these.

Ideally it's also possible to specify multiple upper-level terms in a simpler command, too. This should work somehow with --branch-from-terms but I am not sure what the syntax is. For now, multiple --branch-from-term flags are fine.
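As a rough sketch of the "multiple --branch-from-term" approach, the command above could be assembled from a list of terms in Python. The function name `build_robot_extract_command` and the relative output path `test.owl` are made up for illustration; only the flags already used in the command above are assumed to exist.

```python
from typing import List, Sequence


def build_robot_extract_command(
    input_iri: str,
    output_path: str,
    branch_terms: Sequence[str],
) -> List[str]:
    """Assemble a ``robot extract`` call that uses the MIREOT method.

    Hypothetical helper: it only mirrors the flags from the command above
    and repeats ``--branch-from-term`` once per term.
    """
    if not branch_terms:
        raise ValueError("at least one branch term is required")
    command = [
        "robot", "extract",
        "-I", input_iri,
        "--method", "MIREOT",
        "--output", output_path,
    ]
    # One --branch-from-term flag per subtree root.
    for term in branch_terms:
        command.extend(["--branch-from-term", term])
    return command


command = build_robot_extract_command(
    input_iri="http://purl.obolibrary.org/obo/ncit.owl",
    output_path="test.owl",  # illustrative path, not from the thread
    branch_terms=["obo:NCIT_C17005", "obo:NCIT_C25636"],
)
# To actually run it (requires ROBOT on the PATH):
# import subprocess; subprocess.run(command, check=True)
```

Building the argument list this way keeps the set of subtree roots in one place, so adding another NCIT branch is a one-line change.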

The step after this is to add a custom step to the DKG build that lets you specify some local ontology files instead of looking them up via bioregistry prefix.
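One way such a step might look, purely as a sketch: a mapping from prefix to local file that is consulted before any registry lookup. Nothing here reflects the actual DKG build code; `resolve_local_ontology` and the `local_files` mapping are hypothetical names, and the fallback to a bioregistry lookup is left to the caller.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_local_ontology(
    prefix: str,
    local_files: Mapping[str, Path],
) -> Optional[Path]:
    """Return the local file configured for ``prefix``, if any.

    Hypothetical helper: returns None when no local file is configured,
    which the caller can treat as "fall back to the usual prefix lookup".
    Prefixes are compared case-insensitively.
    """
    normalized = {key.lower(): path for key, path in local_files.items()}
    return normalized.get(prefix.lower())


# Illustrative configuration: use a locally extracted NCIT subset.
local_files = {"ncit": Path("test.owl")}
source = resolve_local_ontology("NCIT", local_files)
```

Returning None instead of raising keeps the existing lookup path as the default, so only the ontologies listed in the mapping are affected.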