ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal
BSD 2-Clause "Simplified" License

MedDRA concept tree doesn't display #202

Open graybeal opened 3 years ago

graybeal commented 3 years ago

Since the UMLS 2020AB upload, the MedDRA ontology has not displayed its concept tree because no root concepts were identified. Misha and I have done some early investigation and learned the following.

No other UMLS concept trees in production have this issue (yay); all the other ontologies that do not display a tree are "flat".

We confirmed in the Metathesaurus browser that 2020AB MedDRA does include the root classes and has them at the top of the hierarchy. (Well, only checked the first one, but that seemed like enough.)

In staging (version 2020AA), MedDRA displays correctly, with many roots. [1,2] The first root has 17 concepts under it. The first root concept ID is http://purl.bioontology.org/ontology/MEDDRA/10005329, and the first ID below it is http://purl.bioontology.org/ontology/MEDDRA/10002086, which has an annotation as a subClassOf the first root concept ID (Blood and lymphatic system disorders, 10005329). Its "Inverse of SIB" relation, which identifies all the siblings of this class, is populated. The first root concept (10005329) shows it is a subclass of Owl#Thing.

In production, while the top of the tree cannot be viewed, the concepts are visible through searching. The 10005329 concept shows no children or parent, does not show it is a subclass of Owl#Thing, and does not show SIB relations. [3] The first concept under it in the staging case (10002086) does show a small tree under it. [4] It does not show itself as a subclass of the first root concept ID.
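
To repeat the staging vs. production comparison without the UI, the same roots, parents, and children could be requested from the BioPortal REST API. The sketch below only builds the request URLs. The base URL, the `MEDDRA` acronym, and the helper names are assumptions for illustration; the production and staging hosts may differ, and an API key is needed to actually fetch them.

```python
from urllib.parse import quote

# Assumed public REST base URL; swap in the staging host to compare.
API_BASE = "https://data.bioontology.org"
MEDDRA_PREFIX = "http://purl.bioontology.org/ontology/MEDDRA/"


def roots_url(acronym: str) -> str:
    """URL listing the root classes of an ontology (hypothetical helper)."""
    return f"{API_BASE}/ontologies/{acronym}/classes/roots"


def class_url(acronym: str, class_iri: str, relation: str = "") -> str:
    """URL for one class, or for its 'parents' / 'children' when given.

    The class IRI must be fully URL-encoded, including ':' and '/'.
    """
    encoded = quote(class_iri, safe="")
    url = f"{API_BASE}/ontologies/{acronym}/classes/{encoded}"
    return f"{url}/{relation}" if relation else url


if __name__ == "__main__":
    print(roots_url("MEDDRA"))
    for concept in ("10005329", "10002086"):
        for relation in ("parents", "children"):
            print(class_url("MEDDRA", MEDDRA_PREFIX + concept, relation))
```

If the diagnosis above is right, the roots request would come back empty in production, and neither concept would list the expected parent.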

Downloads of the 2020AA and 2020AB .ttl files were taken from production logs (respectively before and after the adjustment Alex made to reprocess everything). These were compared; there are over 37000 changes, but at least some of these are ordering changes. Both show approximately the same number of internal subClass relations (about 200 more in 2020AB). [5] However, while 2020AA has the expected 17 subClassOf declarations with the first root class ID above (10005329) as the object, there are zero declarations in 2020AB that meet that criterion. And whereas in 2020AA the concept 10002086 has that subclass declaration [6], in 2020AB it doesn't [7]. Opening the 2020AB .ttl file in Protege confirms this subclass relationship is missing for 10002086. [8]
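
The count of subClassOf declarations pointing at a given root can be reproduced with a plain text scan. This is a minimal sketch, not the procedure actually used for the comparison. It assumes the .ttl files write the predicate as `rdfs:subClassOf` (or the full IRI) followed by a single full-IRI object; comma-separated object lists and prefixed object names would be missed. The two sample fragments are made up for illustration and are not copied from the real files.

```python
import re

RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ROOT_IRI = "http://purl.bioontology.org/ontology/MEDDRA/10005329"


def count_subclass_of(ttl_text: str, parent_iri: str) -> int:
    """Count 'subClassOf <parent_iri>' declarations in Turtle text.

    Text-level match only: assumes one full-IRI object per predicate.
    """
    predicate = rf"(?:rdfs:subClassOf|<{re.escape(RDFS_SUBCLASSOF)}>)"
    pattern = re.compile(rf"{predicate}\s+<{re.escape(parent_iri)}>")
    return len(pattern.findall(ttl_text))


# Hypothetical fragments shaped like the 2020AA and 2020AB cases.
SAMPLE_WITH_PARENT = """
<http://purl.bioontology.org/ontology/MEDDRA/10002086>
    a owl:Class ;
    rdfs:subClassOf <http://purl.bioontology.org/ontology/MEDDRA/10005329> .
"""
SAMPLE_WITHOUT_PARENT = """
<http://purl.bioontology.org/ontology/MEDDRA/10002086>
    a owl:Class .
"""

if __name__ == "__main__":
    print(count_subclass_of(SAMPLE_WITH_PARENT, ROOT_IRI))
    print(count_subclass_of(SAMPLE_WITHOUT_PARENT, ROOT_IRI))
```

Run against the real files (read with `open(path, encoding="utf-8").read()`), the expectation from the comparison above is 17 for 2020AA and 0 for 2020AB.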

Misha opened the 2020AB .xrdf file in Protege and found that the hierarchies are maintained in that document. Not sure how this can be…

Update: the timestamp on the owlapi.xrdf file did not get updated when we re-ran the parsing:

-rw-rw-r--. 1 ncbo-deployer ncbo 246050710 Jan 15 15:23 MEDDRA.ttl
-rw-rw-r--. 1 ncbo-deployer ncbo 254555492 Jan  6 22:11 owlapi.xrdf

Not sure whether the stale owlapi.xrdf file might cause other issues, but it explains the discrepancy Misha found with Protege.
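
A check like the one below could flag this situation automatically after a reparse. It is only a sketch: `is_older` is a hypothetical helper, and the demo uses throwaway files with arbitrary timestamps rather than the real submission directory.

```python
import os
import tempfile


def is_older(path: str, than_path: str) -> bool:
    """Return True if `path` was last modified before `than_path`."""
    return os.path.getmtime(path) < os.path.getmtime(than_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ttl = os.path.join(workdir, "MEDDRA.ttl")
        xrdf = os.path.join(workdir, "owlapi.xrdf")
        for file_path in (ttl, xrdf):
            open(file_path, "w").close()
        # Arbitrary timestamps that mimic the listing: xrdf older than ttl.
        os.utime(xrdf, (1_000_000, 1_000_000))
        os.utime(ttl, (2_000_000, 2_000_000))
        if is_older(xrdf, ttl):
            print("owlapi.xrdf predates MEDDRA.ttl; it was not regenerated")
```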

Looking at the UMLS Statistics page for MedDRA, we can see there are 37543 CHD (child) relations, which is within 300 of the total number of subClassOf relations, including those to terms outside MedDRA; it makes sense that we'd convert CHD relations to subClassOf, so that seems consistent.

What is not clear is (1) what is keeping many subClassOf relations from showing up in the .ttl, and (2) why SIB relations are not being captured in BioPortal for this ontology's terms.

[1] Staging root class (10005329) Screen Shot 2021-01-21 at 8.40.45 PM.png
[2] Staging subclass (10002086) Screen Shot 2021-01-21 at 8.40.57 PM.png
[3] Production 'root class' (10005329) Screen Shot 2021-01-21 at 8.50.55 PM.png
[4] Production 'subclass' (10002086) Screen Shot 2021-01-21 at 8.39.59 PM.png
[5] Comparison of internal subclass counts Screen Shot 2021-01-21 at 9.04.30 PM.png
[6] 2020AA concept 10005329 with subclass declaration Screen Shot 2021-01-21 at 9.19.33 PM.png
[7] 2020AB concept 10005329 without subclass declaration Screen Shot 2021-01-21 at 9.20.54 PM.png
[8] Protege doesn't see subclass declaration either Screen Shot 2021-01-21 at 9.10.41 PM.png