Closed: alexskr closed this issue 9 months ago.
Metrics are calculated by owlapi_wrapper for all ontologies except UMLS ontologies. For UMLS ontologies, the parsing process falls back to Ruby/SPARQL code for calculating metrics, which doesn't work well with AllegroGraph.
https://github.com/ncbo/ontologies_linked_data/blob/ee0013f0ee23876076bff9d9258b46371ec3b248/lib/ontologies_linked_data/models/ontology_submission.rb#L453-L458
The logs state that the parsing process for UMLS ontologies skips the OWLAPI parse, but the repository directory contains an owlapi.xrdf file, which indicates that owlapi_wrapper was invoked.
I, [2024-01-20T17:52:00.002129 #19673] INFO -- : ["Starting to process http://data.bioontology.org/ontologies/SNOMEDCT/submissions/28"]
I, [2024-01-20T17:52:00.004475 #19673] INFO -- : ["Starting to process SNOMEDCT/submissions/28"]
I, [2024-01-20T17:52:00.230010 #19673] INFO -- : ["Using UMLS turtle file found, skipping OWLAPI parse"]
Since owlapi_wrapper is invoked when new UMLS ontology submissions are created, we should use its metrics instead of the metrics generated by https://github.com/ncbo/ontologies_linked_data/blob/ee0013f0ee23876076bff9d9258b46371ec3b248/lib/ontologies_linked_data/metrics/metrics.rb#L51
I wrote a couple of simple unit tests in the owlapi_wrapper project in my local dev environment to test metrics generation, e.g.:
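import static org.junit.Assert.assertTrue;

import org.junit.Test;

// Assumes JUnit 4 and a test class in the same package as ParserInvocation and OntologyParser.
@Test
public void parse_OntologySNOMEDCT() throws Exception {
    // Input repository folder, output repository folder, master file name
    ParserInvocation pi = new ParserInvocation("./src/test/resources/repo/input/snomedct",
            "./src/test/resources/repo/output/snomedct", "SNOMEDCT.ttl", true);
    OntologyParser parser = new OntologyParser(pi);
    assertTrue(parser.parse());
}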
The max depth metric is calculated successfully for both the SNOMEDCT and NCBITAXON TTL files, in about 5 and 7.6 seconds respectively:
[main] DEBUG o.s.n.owlapi.wrapper.metrics.Graph - depth for owl:Thing is 30
[main] INFO o.s.n.o.w.metrics.OntologyMetrics - Finished metrics calculation for SNOMEDCT.ttl in 5047 milliseconds
[main] INFO o.s.n.o.w.metrics.OntologyMetrics - Generated metrics CSV file for SNOMEDCT.ttl
[main] DEBUG o.s.n.owlapi.wrapper.metrics.Graph - depth for owl:Thing is 37
[main] INFO o.s.n.o.w.metrics.OntologyMetrics - Finished metrics calculation for NCBITAXON.ttl in 7583 milliseconds
[main] INFO o.s.n.o.w.metrics.OntologyMetrics - Generated metrics CSV file for NCBITAXON.ttl
It should be relatively straightforward to modify the REST API to check for the max depth in the metrics.csv file first. We're already doing this for classes, properties, etc.
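A minimal sketch of that lookup, written in Java for consistency with the owlapi_wrapper test code (the actual REST API code is Ruby). It assumes metrics.csv has one header row followed by one data row, and that the max depth column is named `maxDepth`; both the column name and the `readMaxDepth` helper are hypothetical, so check the header owlapi_wrapper actually writes. When the file, column, or value is missing, the method returns an empty `OptionalInt` so the caller can fall back to the Ruby/SPARQL calculation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.OptionalInt;

public class MetricsCsvReader {

    // Hypothetical column name; not confirmed against owlapi_wrapper's CSV output.
    static final String MAX_DEPTH_COLUMN = "maxDepth";

    // Returns the max depth from metrics.csv, or empty so the caller can fall back
    // to the Ruby/SPARQL calculation.
    static OptionalInt readMaxDepth(Path metricsCsv) throws IOException {
        if (!Files.isRegularFile(metricsCsv)) {
            return OptionalInt.empty();
        }
        List<String> lines = Files.readAllLines(metricsCsv, StandardCharsets.UTF_8);
        if (lines.size() < 2) {
            return OptionalInt.empty();
        }
        // Naive split: assumes no quoted fields that contain commas.
        String[] header = lines.get(0).split(",", -1);
        String[] values = lines.get(1).split(",", -1);
        for (int i = 0; i < header.length && i < values.length; i++) {
            if (header[i].trim().equals(MAX_DEPTH_COLUMN)) {
                try {
                    return OptionalInt.of(Integer.parseInt(values[i].trim()));
                } catch (NumberFormatException e) {
                    return OptionalInt.empty();
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("metrics", ".csv");
        // Illustrative values only, not real ontology metrics.
        Files.write(csv, List.of("classes,properties,maxDepth", "1000,50,12"), StandardCharsets.UTF_8);
        OptionalInt depth = readMaxDepth(csv);
        System.out.println(depth.isPresent()
                ? "maxDepth from metrics.csv: " + depth.getAsInt()
                : "falling back to Ruby/SPARQL calculation");
        Files.delete(csv);
    }
}
```

Returning an empty value rather than throwing keeps the fallback decision in one place, which matches how the other metrics are already preferred from the CSV.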
Max depth calculated by owlapi_wrapper is off by 1 compared to the max depth calculated by Ruby/SPARQL:

| Ontology | Ruby | owlapi_wrapper |
|---|---|---|
| STY | 7 | 8 |
| SNOMEDCT | 29 | 30 |
| NCBITAXON | 36 | 37 |
This needs to be looked into.
Max depth calculated by owlapi_wrapper starts from owl:Thing, which serves as the root class for all other classes in the ontology. It makes this calculation during the initial step of our ontology ingestion process, when the OWL API loads the ontology into memory, regardless of its format. The STY ontology is small enough that I was able to verify this manually. I suppose you could debate which methodology is correct.
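A toy illustration of why the two numbers differ by exactly one. This is not owlapi_wrapper's `Graph` implementation; the `depthInNodes` helper and the tiny hierarchy are hypothetical. It computes the longest root-to-leaf path in an acyclic class hierarchy twice: once rooted at owl:Thing, and once using the top-level classes as roots. Whatever counting convention is used, treating owl:Thing as a level adds one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DepthConvention {

    // Longest root-to-leaf path counted in nodes (a leaf has depth 1).
    // Assumes the hierarchy is acyclic; memo avoids recomputing shared subtrees.
    static int depthInNodes(String node, Map<String, List<String>> children, Map<String, Integer> memo) {
        Integer cached = memo.get(node);
        if (cached != null) {
            return cached;
        }
        int best = 0;
        for (String child : children.getOrDefault(node, List.of())) {
            best = Math.max(best, depthInNodes(child, children, memo));
        }
        memo.put(node, best + 1);
        return best + 1;
    }

    public static void main(String[] args) {
        // Hypothetical hierarchy: A and D are top-level classes under owl:Thing.
        Map<String, List<String>> children = new HashMap<>();
        children.put("owl:Thing", List.of("A", "D"));
        children.put("A", List.of("B"));
        children.put("B", List.of("C"));

        Map<String, Integer> memo = new HashMap<>();
        int fromThing = depthInNodes("owl:Thing", children, memo);
        int fromTopLevel = 0;
        for (String top : children.get("owl:Thing")) {
            fromTopLevel = Math.max(fromTopLevel, depthInNodes(top, children, memo));
        }
        System.out.println("rooted at owl:Thing: " + fromThing);       // 4
        System.out.println("rooted at top-level classes: " + fromTopLevel); // 3
    }
}
```

If the Ruby/SPARQL path treats top-level classes as roots, this would account for the consistent +1 in the table, but that should be confirmed against both implementations.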
Ontology metrics calculation fails for large UMLS ontologies such as SNOMEDCT and NCBITAXON with the AllegroGraph 7.3.1 backend (with patches)