jvendetti opened this issue 1 year ago
I opened the TTL file that we generated in Protege for the ATC ontology. The hierarchy correctly shows the "dolutegravir" class as a subclass of "Integrase Inhibitors, antiinfectives for systematic use".
If you issue a REST API call to get the children for the "Integrase Inhibitors, antiinfectives for systematic use" class, the API only returns 3 children:
A few additional observations:

In the TTL file, both "raltegravir" and "dolutegravir" have `rdfs:subClassOf` declarations that point to the "Integrase Inhibitors, antiinfectives for systematic use" class:
```turtle
<http://purl.bioontology.org/ontology/ATC/J05AJ01> a owl:Class ;
    skos:prefLabel """raltegravir"""@en ;
    skos:notation """J05AJ01"""^^xsd:string ;
    rdfs:subClassOf <http://purl.bioontology.org/ontology/ATC/J05AJ> ;
    <http://purl.bioontology.org/ontology/ATC/ATC_LEVEL> """5"""^^xsd:string ;
    umls:cui """C1871526"""^^xsd:string ;
    umls:tui """T114"""^^xsd:string ;
    umls:tui """T121"""^^xsd:string ;
    umls:hasSTY <http://purl.bioontology.org/ontology/STY/T114> ;
    umls:hasSTY <http://purl.bioontology.org/ontology/STY/T121> ;
    .

<http://purl.bioontology.org/ontology/ATC/J05AJ03> a owl:Class ;
    skos:prefLabel """dolutegravir"""@en ;
    skos:notation """J05AJ03"""^^xsd:string ;
    rdfs:subClassOf <http://purl.bioontology.org/ontology/ATC/J05AJ> ;
    <http://purl.bioontology.org/ontology/ATC/ATC_LEVEL> """5"""^^xsd:string ;
    umls:cui """C3253985"""^^xsd:string ;
    umls:tui """T109"""^^xsd:string ;
    umls:tui """T121"""^^xsd:string ;
    umls:hasSTY <http://purl.bioontology.org/ontology/STY/T109> ;
    umls:hasSTY <http://purl.bioontology.org/ontology/STY/T121> ;
    .
```
Yet when you look at the properties of each of these classes through the BioPortal class endpoint, the `rdfs:subClassOf` relationship of "dolutegravir" is not shown:
https://data.bioontology.org/ontologies/ATC/classes/http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FATC%2FJ05AJ01?display=prefLabel,properties&no_links=true&no_context=true
<img width="569" alt="Screen Shot 2022-08-04 at 3 01 11 PM" src="https://user-images.githubusercontent.com/2042070/182960285-6df2bb55-85cf-4b2e-8105-f2150bfe60d1.png">
https://data.bioontology.org/ontologies/ATC/classes/http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FATC%2FJ05AJ03?display=prefLabel,properties&no_links=true&no_context=true
<img width="562" alt="Screen Shot 2022-08-04 at 3 01 35 PM" src="https://user-images.githubusercontent.com/2042070/182960311-a0b387ee-cf7d-4e33-abb1-61abed3d4979.png">
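The two class URLs above differ only in the percent-encoded class IRI. A minimal Python sketch for building such URLs, assuming the path layout shown above plus the `/children` and `/parents` sub-resources of a class; `class_url` and `PROPS` are hypothetical names, and the API key that a real request needs is deliberately left out:

```python
from urllib.parse import quote

# Production REST base URL, as used in the class URLs above.
BASE_URL = "https://data.bioontology.org"

# Query parameters copied from the class URLs above.
PROPS = {"display": "prefLabel,properties", "no_links": "true", "no_context": "true"}


def class_url(ontology, class_iri, suffix="", params=None, base_url=BASE_URL):
    """Build a class URL.

    suffix: "" for the class itself, or "/children" / "/parents" for its neighbours.
    params: optional query parameters, appended in insertion order.
    """
    # The class IRI is a single path segment, so ':' and '/' must be encoded too.
    encoded = quote(class_iri, safe="")
    url = f"{base_url}/ontologies/{ontology}/classes/{encoded}{suffix}"
    if params:
        # Join by hand so the comma in "prefLabel,properties" stays literal.
        url += "?" + "&".join(f"{key}={value}" for key, value in params.items())
    return url
```

For example, `class_url("ATC", "http://purl.bioontology.org/ontology/ATC/J05AJ", suffix="/children")` would target the children call mentioned earlier, and passing a different `base_url` would point the same calls at another deployment.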
I re-processed the ATC ontology, and the subclasses now appear correctly.
Reopening, as we received a follow-up message from @piehld enumerating more classes with missing parents:
- http://purl.bioontology.org/ontology/ATC/C02LX -> missing_parent -> http://purl.bioontology.org/ontology/ATC/C02L
- http://purl.bioontology.org/ontology/ATC/D01AE54 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/D01AE
- http://purl.bioontology.org/ontology/ATC/B06AA03 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/B06AA
- http://purl.bioontology.org/ontology/ATC/J06A -> missing_parent -> http://purl.bioontology.org/ontology/ATC/J06
- http://purl.bioontology.org/ontology/ATC/S03A -> missing_parent -> http://purl.bioontology.org/ontology/ATC/S03
- http://purl.bioontology.org/ontology/ATC/A10AE05 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/A10AE
- http://purl.bioontology.org/ontology/ATC/G03AB06 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/G03AB
- http://purl.bioontology.org/ontology/ATC/A16AB05 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/A16AB
- http://purl.bioontology.org/ontology/ATC/L04AA39 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/L04AA
- http://purl.bioontology.org/ontology/ATC/A02BA51 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/A02BA
- http://purl.bioontology.org/ontology/ATC/C01BD07 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/C01BD
- http://purl.bioontology.org/ontology/ATC/A06AC53 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/A06AC
- http://purl.bioontology.org/ontology/ATC/A01AD06 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/A01AD
- http://purl.bioontology.org/ontology/ATC/C10AB02 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/C10AB
- http://purl.bioontology.org/ontology/ATC/M01AE01 -> missing_parent -> http://purl.bioontology.org/ontology/ATC/M01AE
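A report in this `child -> missing_parent -> parent` format can be produced by comparing the parents declared in the source file with the parents the REST API returns. A minimal sketch, assuming both have already been loaded into dictionaries that map a class IRI to a set of parent IRIs; `missing_parent_report` is a hypothetical helper, not part of any tooling mentioned in this thread:

```python
def missing_parent_report(expected, observed):
    """Return 'child -> missing_parent -> parent' lines, sorted for stable output.

    expected: class IRI -> set of parent IRIs declared in the source (e.g. the TTL)
    observed: class IRI -> set of parent IRIs returned by the API
    """
    lines = []
    for child, parents in expected.items():
        # A class absent from `observed` is treated as having no parents at all.
        for parent in sorted(parents - observed.get(child, set())):
            lines.append(f"{child} -> missing_parent -> {parent}")
    return sorted(lines)
```

Using set difference means a class only shows up when at least one declared parent is absent from the API response, so extra parents returned by the API are not reported.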
I spot-checked the first class in the list and confirmed that the REST API returned an empty set for the parents:
{ }
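An empty response like this is straightforward to detect in a script. A minimal sketch, assuming the parents call returns a JSON array of class objects that each carry an `@id` field, and treating an empty object or empty array as "no parents"; `fetch_text` and `parent_ids` are hypothetical names, and the fetch function is injectable so the parsing logic can be exercised without network access:

```python
import json
from urllib.request import urlopen


def fetch_text(url):
    """Fetch `url` and return the body as text (needs network access and, for BioPortal, an API key)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parent_ids(url, fetch=fetch_text):
    """Return the '@id' of every parent listed at `url`; [] means no parents."""
    data = json.loads(fetch(url))
    if not data:
        # An empty object ("{ }") or an empty array ("[]") both mean: no parents.
        return []
    if not isinstance(data, list):
        # Anything else (e.g. an error object) is not a parent listing.
        raise ValueError(f"unexpected response from {url}: {data!r}")
    return [item["@id"] for item in data if isinstance(item, dict) and "@id" in item]
```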
Just as a sanity check, I fully reprocessed ATC in production and cleared the caches. After doing so, I spot-checked all of the above classes in the BioPortal web application, and they all appear as intended in the tree hierarchy. The REST call I listed above also returns parents now.
Before marking this as resolved, I'd like to hear back from Dennis about whether he's seeing any other anomalies.
@piehld reports:
... there's just one more that appears to be missing its parent:
http://purl.bioontology.org/ontology/ATC/A03CB01 -> missing parent: A03CB
A REST API call confirms this:
{ }
So, basically, we've processed this ontology three separate times, and each time a small set of classes ends up with missing parents. The TTL file still doesn't appear to be the root of the issue. I downloaded the latest version, and you can see that Protege is able to construct the tree properly:
There also doesn't seem to be an issue with the `owlapi.xrdf` intermediary file we generate. It contains the expected `subClassOf` declaration:
```xml
<!-- http://purl.bioontology.org/ontology/UATC/A03CB01 -->
<Class rdf:about="http://purl.bioontology.org/ontology/UATC/A03CB01">
    <rdfs:subClassOf rdf:resource="http://purl.bioontology.org/ontology/UATC/A03CB"/>
</Class>
```
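The same check can be scripted against the whole intermediary file. A minimal standard-library sketch, assuming the file is RDF/XML with OWL as the default namespace (which is what the unprefixed `<Class>` element suggests); `SAMPLE` wraps the excerpt above in a hypothetical `rdf:RDF` root, and `subclass_map` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# The excerpt above, wrapped in an assumed rdf:RDF root with OWL as default namespace.
SAMPLE = f"""<rdf:RDF xmlns="{OWL}" xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <Class rdf:about="http://purl.bioontology.org/ontology/UATC/A03CB01">
    <rdfs:subClassOf rdf:resource="http://purl.bioontology.org/ontology/UATC/A03CB"/>
  </Class>
</rdf:RDF>"""


def subclass_map(xml_text):
    """Map each named owl:Class IRI to the set of IRIs in its rdfs:subClassOf elements."""
    root = ET.fromstring(xml_text)  # accepts str or bytes
    result = {}
    for cls in root.iter(f"{{{OWL}}}Class"):
        about = cls.get(f"{{{RDF}}}about")
        if about is None:
            continue  # anonymous class expressions have no rdf:about
        targets = result.setdefault(about, set())
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            resource = sub.get(f"{{{RDF}}}resource")
            if resource is not None:  # skip nested (non-IRI) superclass expressions
                targets.add(resource)
    return result
```

Passing the file's bytes (for example `Path("owlapi.xrdf").read_bytes()`) instead of `SAMPLE` would give the declared parents for every named class, which could then be compared with what the REST API returns.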
Out of curiosity, I reprocessed ATC in our staging environment and wasn't able to reproduce the issue. At this staging URL, you can see the class correctly positioned in the tree:
Information from @piehld about how they're testing:
... the way I've been checking these is through one of our organization's PyPI packages ...
```shell
# Install package
pip install rcsb.utils.chemref
# Download the test script
curl -O https://raw.githubusercontent.com/rcsb/py-rcsb_utils_chemref/master/rcsb/utils/tests-chemref/testAtcProvider.py
# Run the test (will refetch latest CSV and perform some processing)
python3 testAtcProvider.py
```
On my system (macOS Monterey), the correct install command was:

```shell
python3 -m pip install rcsb.utils.chemref
```
I got an error running the test script:
```text
▲ ~ ▶ python3 testAtcProvider.py
testReadAtcInfo (__main__.AtcProviderTests) ... INFO:rcsb.utils.chemref.AtcProvider:ATC fetch status is True
INFO:rcsb.utils.chemref.AtcProvider:Length of name dictionary 6440
INFO:rcsb.utils.chemref.AtcProvider:Length of parent dictionary 6440
INFO:rcsb.utils.chemref.AtcProvider:ATC cache status True data length 6567 columns ['Class ID', 'Preferred Label', 'Synonyms', 'Definitions', 'Obsolete', 'CUI', 'Semantic Types', 'Parents', 'ATC LEVEL', 'Is Drug Class', 'Semantic type UMLS property'] names 6440 parents 6440
INFO:rcsb.utils.chemref.AtcProvider:nD 6440 pD 6440
ERROR:rcsb.utils.chemref.AtcProvider:Failing for 'A03CB01' with ''
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/rcsb/utils/chemref/AtcProvider.py", line 110, in getIdLineage
    pt = self.__atcD["parents"][pt]
KeyError: ''
INFO:root:length of tree list 6441
ok
----------------------------------------------------------------------
Ran 1 test in 0.230s
OK
```
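The traceback shows the lookup in `getIdLineage` failing with `KeyError: ''`, which is consistent with A03CB01 having an empty string stored as its parent; the error is logged while the test itself still reports `ok`. The sketch below is not the `AtcProvider` code, only an illustration of that failure mode under stated assumptions: a hypothetical `parents` dictionary maps each ATC code to its parent code, `None` marks a top-level group, and `""` stands for an empty `Parents` field. Instead of raising, the hypothetical `id_lineage` returns the class where the chain breaks:

```python
def id_lineage(code, parents):
    """Walk from `code` up towards its top-level group.

    parents maps an ATC code to its parent code; None marks a top-level group
    and "" marks an empty Parents field. Returns (lineage from the top down,
    code whose parent is missing, or None if a top-level group was reached).
    """
    lineage = [code]
    current = code
    while True:
        # Codes absent from the dictionary are treated like an empty parent.
        parent = parents.get(current, "")
        if parent is None:
            return lineage[::-1], None
        if parent == "":
            # A naive walk would next evaluate parents[""] and raise KeyError: ''.
            return lineage[::-1], current
        if parent in lineage:
            raise ValueError(f"cycle detected at {parent}")
        lineage.append(parent)
        current = parent
```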
Hi @mdorf, thanks for all your work on this. Yes, that's the error that indicated to me that A03CB01 is still missing its parent.
I downloaded the ATC TTL file from our Staging server, where ATC appears to have parsed properly. I created a new submission of ATC in production using that file and kicked off its processing. After the processing completed and our internal caches cleared, the Python script yielded this result:
```text
▲ ~ ▶ python3 testAtcProvider.py
testReadAtcInfo (__main__.AtcProviderTests) ... INFO:rcsb.utils.chemref.AtcProvider:ATC fetch status is True
INFO:rcsb.utils.chemref.AtcProvider:Length of name dictionary 6440
INFO:rcsb.utils.chemref.AtcProvider:Length of parent dictionary 6440
INFO:rcsb.utils.chemref.AtcProvider:ATC cache status True data length 6567 columns ['Class ID', 'Preferred Label', 'Synonyms', 'Definitions', 'Obsolete', 'CUI', 'Semantic Types', 'Parents', 'ATC LEVEL', 'Is Drug Class', 'Semantic type UMLS property'] names 6440 parents 6440
INFO:rcsb.utils.chemref.AtcProvider:nD 6440 pD 6440
INFO:root:length of tree list 6440
ok
----------------------------------------------------------------------
Ran 1 test in 0.251s
OK
▲ ~ ▶
```
Does that mean it ran successfully, or could there be other cases that the script isn't catching?
@mdorf This looks great, thank you! I am getting the same result too.
I loaded the version of `ATC.ttl` that resulted in missing parents into our Staging server and re-processed it there. I then pointed the Python script (`testAtcProvider.py`) to the staging server by hacking my local copy of its underlying library, `AtcProvider.py`. The result appears to be positive:
```text
△ ~ ▶ python3 testAtcProvider.py
testReadAtcInfo (__main__.AtcProviderTests) ... INFO:rcsb.utils.chemref.AtcProvider:ATC fetch status is True
INFO:rcsb.utils.chemref.AtcProvider:Length of name dictionary 6440
INFO:rcsb.utils.chemref.AtcProvider:Length of parent dictionary 6440
INFO:rcsb.utils.chemref.AtcProvider:ATC cache status True data length 6567 columns ['Class ID', 'Preferred Label', 'Synonyms', 'Definitions', 'Obsolete', 'CUI', 'Semantic Types', 'Parents', 'ATC LEVEL', 'Is Drug Class', 'Semantic type UMLS property'] names 6440 parents 6440
INFO:rcsb.utils.chemref.AtcProvider:nD 6440 pD 6440
INFO:root:length of tree list 6440
ok
----------------------------------------------------------------------
Ran 1 test in 0.218s
OK
```
Received a report from end user @piehld that the newly generated CSV file for the ATC ontology (formerly known as UATC) has some empty Parents fields for classes that used to have parents:
The following examples of classes with missing parents were provided:
For debugging purposes, I checked the first term in the above list in the UMLS Metathesaurus Browser. The hierarchy pane indicates parents should be present:
I also did a basic sanity check of the ATC parsing log file: the latest parsing run shows no errors.
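Because the symptom shows up as empty `Parents` fields in the generated CSV, scanning that file directly would catch every affected class, not only the ones a lineage walk happens to reach. A minimal sketch using the column names printed in the test logs (`Class ID`, `Parents`, `ATC LEVEL`); it assumes that level-1 classes are legitimate roots and that rows without an `ATC LEVEL` value are not ATC classes, so both are skipped. `rows_missing_parents` is a hypothetical helper:

```python
import csv
import io


def rows_missing_parents(csv_text):
    """Return the Class ID of every ATC class whose Parents field is blank."""
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        level = (row.get("ATC LEVEL") or "").strip()
        # Assumption: level 1 groups are roots; rows without a level are not ATC classes.
        if level in ("", "1"):
            continue
        if not (row.get("Parents") or "").strip():
            missing.append(row["Class ID"])
    return missing
```

Reading the downloaded CSV into a string first (via `gzip.open(path, "rt", encoding="utf-8").read()` if the download is compressed) and passing it to `rows_missing_parents` would list all affected classes in one pass.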