Open aclum opened 4 months ago
I believe what is happening is that the repeated subtree starting at "ENVO:01001199 ! terrestrial environmental zone" is (arbitrarily) only placed under "ENVO:01000408 ! environmental zone" and not "ENVO:01000813 ! astronomical body part". The current implementation never repeats subtrees. The real data structure for the ontology is a complex inter-dependent directed acyclic graph and simplifying assumptions were made for the UI.
This is the relevant issue describing the complexity and tradeoffs.
This notebook shows the general approach we decided to take to "treeify" a complex directed acyclic graph. Note that it states:
The first step is to make the directed acyclic graph into a tree. We do this by arbitrarily taking the first parent node from each node as the true parent, and discarding the rest of the parent links
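To make the quoted step concrete, here is a minimal sketch of "first parent wins" treeification. It is not the code from the notebook or from nmdc-server; the `PARENTS` mapping, its edges, and the `treeify` helper are hypothetical and only loosely modeled on the terms named in this thread.

```python
# Toy DAG as child -> ordered list of parents.
# Assumption: these edges are illustrative, not the actual ENVO hierarchy.
PARENTS = {
    "fiat object part": [],
    "astronomical body part": [],
    "environmental zone": ["fiat object part"],
    "terrestrial environmental zone": [
        "environmental zone",       # first parent: kept
        "astronomical body part",   # second parent: discarded
    ],
}


def treeify(parents):
    """Return parent -> children, keeping only each node's first parent."""
    children = {node: [] for node in parents}
    for node, node_parents in parents.items():
        if node_parents:
            # Every parent link after the first is dropped here.
            children.setdefault(node_parents[0], []).append(node)
    return children


tree = treeify(PARENTS)
print(tree["environmental zone"])      # the subtree is attached here
print(tree["astronomical body part"])  # and is missing here
```

In this toy example the subtree appears only under its first parent, so the other parent ends up with no children in the tree even though it still has them in the graph.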
Note the comment from @cmungall in the linked issue:
ontology group defines initial exclusion sets (e.g. astronomical body part)
This makes me think that "ENVO:01000813 ! astronomical body part" should not be a part of the tree presented to the user, perhaps for exactly this reason that it would lead to a duplicated ontology subtree. So that would be my proposed solution to this particular inconsistency.
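As a rough sketch of what applying such an exclusion set could look like, the snippet below filters excluded terms out of a child -> parents mapping before any tree is built. The `EXCLUDED` set, the `PARENTS` mapping, and the `drop_excluded` helper are hypothetical names for illustration and are not taken from nmdc-server.

```python
# Assumption: a toy child -> parents mapping, not the actual ENVO hierarchy.
PARENTS = {
    "fiat object part": [],
    "astronomical body part": [],
    "environmental zone": ["fiat object part"],
    "terrestrial environmental zone": [
        "environmental zone",
        "astronomical body part",
    ],
}

# Hypothetical exclusion set, as would be defined by the ontology group.
EXCLUDED = {"astronomical body part"}


def drop_excluded(parents, excluded):
    """Remove excluded nodes and any parent links that point at them."""
    return {
        node: [p for p in node_parents if p not in excluded]
        for node, node_parents in parents.items()
        if node not in excluded
    }


filtered = drop_excluded(PARENTS, EXCLUDED)
print(filtered["terrestrial environmental zone"])
```

One caveat with this approach: a node whose only parents are all excluded would become a root, so the exclusion set would need to be chosen with that in mind.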
This is the relevant nmdc-server code: https://github.com/microbiomedata/nmdc-server/blob/main/nmdc_server/ingest/envo.py
Good comments about DAGs vs trees, @jeffbaumes. I misspoke in today's meeting.
I think this should be driven by user stories. What are the searching or browsing patterns we expect?
Would anybody ever search for an intermediate node that has been left out of the exposed hierarchy? If so, then maybe we should omit intermediate nodes.
Would anybody skip the searching step and just browse through the subclasses? If so, then we should probably include all paths to a leaf.
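For the second browsing pattern, "all paths to a leaf" can be sketched as a walk up every parent link instead of only the first. This is an illustration under assumptions: the `PARENTS` mapping is a toy, not the actual ENVO hierarchy, and `paths_to` is a hypothetical helper.

```python
# Assumption: a toy child -> parents mapping, not the actual ENVO hierarchy.
PARENTS = {
    "fiat object part": [],
    "astronomical body part": [],
    "environmental zone": ["fiat object part"],
    "terrestrial environmental zone": [
        "environmental zone",
        "astronomical body part",
    ],
}


def paths_to(node, parents):
    """Return every root-to-node path in an acyclic child -> parents mapping."""
    node_parents = parents.get(node, [])
    if not node_parents:
        return [[node]]  # a node with no parents is a root
    return [
        path + [node]
        for parent in node_parents
        for path in paths_to(parent, parents)
    ]


all_paths = paths_to("terrestrial environmental zone", PARENTS)
for path in all_paths:
    print(" > ".join(path))
```

Exposing every such path lets a user reach the same term through either parent, at the cost of the repeated subtrees that the current implementation avoids.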
The OBO Foundry community is very active in developing tools for this kind of thing, and the obo-community Slack workspace is busy as well. If none of you want to join it, I can pass on any questions or requests you share with me.
My use case from this week was browsing through sub-classes.
The data portal is showing counts for a parent class without showing the child classes they belong to. I was using the data portal yesterday to find the environmental local context terms for NEON soil samples from Colorado (search filter). When I then try to navigate to the term value, I see that "astronomical body part" has a count of 517 but none of its children have any counts, compared to "fiat object part", which has a child, "environmental zone", that can be used to navigate down to the actual terms. One of the expected values several leaves down is 'area of gramanoid or herbaceous vegetation'. See the attached image for a view of no counts for children of "astronomical body part".
@turbomam confirmed that if he looks at the ontology JSON independently, the terms should be navigable via "astronomical body part". Based on this we believe the issue is on the nmdc-server side rather than with the nmdco-classes.json file.
Mark's comments: Ontology Access Kit can be used to check nmdco-classes.json independently of the DataPortal.