ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal

Unable to retrieve classes for MONDO #125

Closed by jvendetti 6 months ago

jvendetti commented 5 years ago

The REST API shows a class count of 120,941, but returns an empty collection (http://data.bioontology.org/ontologies/MONDO/classes). Attempting to navigate to the Classes tab in the UI shows a "Problem retrieving classes" error.
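For anyone who wants to reproduce the symptom, here is a minimal sketch that compares the reported class count with the first page of classes. It assumes the count comes from the `/ontologies/MONDO/metrics` endpoint under a `classes` key, that the classes response carries its items under `collection`, and that an `apikey` query parameter is required; the helper names are hypothetical and not part of any BioPortal client.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://data.bioontology.org"


def endpoint_url(acronym, resource, api_key):
    """Build a URL such as .../ontologies/MONDO/classes?apikey=..."""
    query = urllib.parse.urlencode({"apikey": api_key})
    return f"{API_BASE}/ontologies/{acronym}/{resource}?{query}"


def fetch_json(url):
    """GET a URL and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return json.load(response)


def count_mismatch(metrics, classes_page):
    """Return True when metrics report classes but the page holds none.

    Assumes metrics["classes"] is the class count and
    classes_page["collection"] is the list of returned classes.
    """
    reported = metrics.get("classes") or 0
    returned = len(classes_page.get("collection") or [])
    return reported > 0 and returned == 0
```

With a valid key, `count_mismatch(fetch_json(endpoint_url("MONDO", "metrics", key)), fetch_json(endpoint_url("MONDO", "classes", key)))` would be expected to return `True` while the problem described here persists.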

Production parsing log file for submission 39 shows:

E, [2019-05-02T11:23:11.047941 #17925] ERROR -- : ["too many connection resets (due to end of file reached - EOFError) after 285252 requests on 47011693037600, last used 6924.860438528 seconds ago"]
E, [2019-05-02T11:23:11.048827 #17925] ERROR -- : [#<Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) after 285252 requests on 47011693037600, last used 6924.860438528 seconds ago>]
E, [2019-05-02T11:23:11.050002 #17925] ERROR -- : ["NoMethodError: undefined method `id=' for nil:NilClass
/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.5.0/bundler/gems/ontologies_linked_data-548e7b1e4fb8/lib/ontologies_linked_data/models/ontology_submission.rb:1126:in `process_metrics'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.5.0/bundler/gems/ontologies_linked_data-548e7b1e4fb8/lib/ontologies_linked_data/models/ontology_submission.rb:1058:in `process_submission'
    /srv/ncbo/ncbo_cron/lib/ncbo_cron/ontology_submission_parser.rb:177:in `process_submission'
    bin/ncbo_ontology_process:98:in `block in <main>'
    bin/ncbo_ontology_process:81:in `each'
    bin/ncbo_ontology_process:81:in `<main>'"]
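
To pull the relevant numbers out of a parsing log like the one above, a rough sketch along these lines may help. The regular expressions are written against the line layout visible in this excerpt only (severity letter, timestamp, pid, level, message), and the function names are hypothetical.

```python
import re

# Matches lines such as:
# E, [2019-05-02T11:23:11.047941 #17925] ERROR -- : message
LOG_LINE = re.compile(
    r"^(?P<sev>[A-Z]), \[(?P<ts>\S+) #(?P<pid>\d+)\]\s+"
    r"(?P<level>[A-Z]+) -- : (?P<msg>.*)$"
)

# Matches the tail of the "too many connection resets" message.
RESET = re.compile(
    r"after (?P<requests>\d+) requests on \d+, "
    r"last used (?P<idle>[\d.]+) seconds ago"
)


def parse_line(line):
    """Return the fields of a log line, or None for continuation lines."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def connection_reset_stats(message):
    """Return (request_count, idle_seconds) from a reset message, or None."""
    match = RESET.search(message)
    if match is None:
        return None
    return int(match.group("requests")), float(match.group("idle"))
```

Applied to the first error line, this yields 285252 requests and an idle time of about 6925 seconds, so the connection had gone unused for just under two hours before the reset was reported. Stack trace lines do not match `LOG_LINE` and come back as `None`.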
jvendetti commented 5 years ago

Notes on further investigation:

The "too many connection resets" error occurs after the start of metrics calculation:

I, [2019-06-04T14:41:31.433349 #6962]  INFO -- : ["metrics_for_submission start"]

I ran the ncbo_ontology_metrics script in isolation on production, which finished quickly and without errors. However, the classes still aren't retrievable. I then attempted a total reprocessing of the ontology, but ended up with the same error as above. As soon as metrics calculation begins, the triplestore appears to be flooded with queries and hangs. 4store was unable to recover and had to be manually restarted.

I also ran a couple of experiments in our staging environment. Mondo is an ontology from the OBO Foundry and is available in a number of different distribution formats. There's an OBO format equivalent, so I uploaded that and noted that there are no such issues:

http://stage.bioontology.org/ontologies/MONDOOBO?p=classes

Interestingly, I believe that the Ontology Lookup Service is also serving the OBO version, based on the appearance of the class tree.

Our staging environment was also able to handle the "minimal" OWL version of Mondo, which the foundry describes as complete in terms of logical axioms but lacking textual definitions and subsets:

http://stage.bioontology.org/ontologies/MONDOMIN?p=classes

The REST endpoint returns the classes quickly since the results are paged. The UI will sometimes display an error message about taking too long to load the classes and only shows partial results. I opened the OWL version of the ontology in Protege and can see that there is a huge number of root classes with little to no hierarchy - sort of like a bag of terms.
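
As an illustration of why the paged REST endpoint stays responsive, here is a small sketch that walks the result pages one at a time. It assumes each page is a JSON object with a `collection` list and a `links.nextPage` URL that is null on the last page; `fetch_page` is a hypothetical callable supplied by the caller (for example, one that performs an authenticated GET and decodes the JSON).

```python
def iter_classes(first_url, fetch_page, max_pages=None):
    """Yield classes page by page, following links.nextPage.

    fetch_page(url) must return the decoded JSON for one page.
    max_pages optionally caps how many pages are requested.
    """
    url = first_url
    pages_seen = 0
    while url:
        page = fetch_page(url)
        # Each page carries only a slice of the classes.
        yield from page.get("collection") or []
        pages_seen += 1
        if max_pages is not None and pages_seen >= max_pages:
            break
        # A missing or null nextPage ends the walk.
        url = (page.get("links") or {}).get("nextPage")
```

Because only one page is held at a time, a client built this way never has to load the whole class list at once, which is presumably where the UI runs into its timeout.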

jvendetti commented 5 years ago

Uploaded the OBO version of this ontology to production so that end users have access to the ontology data until we're able to diagnose/fix this issue with the OWL format. I had to do a "file upload" instead of entering a pull URL due to this issue. This means that new versions of MONDO aren't getting pulled automatically.

matentzn commented 4 years ago

Maybe it has nothing to do with this, but just FYI: https://github.com/EBISPOT/OLS/issues/312 https://github.com/monarch-initiative/mondo/issues/1451

jvendetti commented 6 months ago

There have been some updates to the BioPortal software and the system is now able to handle / load the main OWL edition of Mondo (mondo.owl). Namely, @stdotjohn rewrote the code for calculating the ontology max depth metric in version 1.4.0 of the owlapi_wrapper project. We've also transitioned from the defunct 4store triplestore to AllegroGraph.
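
For readers unfamiliar with the max depth metric, the sketch below shows one way to compute it in memory, without issuing a query per class. It is purely illustrative and is not the owlapi_wrapper implementation; the input shape (a dict from class id to its list of parent ids) and the convention that root classes have depth 1 are assumptions made for the example.

```python
from collections import defaultdict, deque


def max_depth(parents):
    """Longest root-to-class path length in a class hierarchy.

    parents maps each class id to a list of its parent ids
    (an empty list marks a root). Roots count as depth 1.
    Classes caught in a cycle are never reached and are ignored.
    """
    children = defaultdict(list)
    pending = {}  # number of parents not yet processed, per class
    for cls, supers in parents.items():
        pending.setdefault(cls, 0)
        for sup in supers:
            children[sup].append(cls)
            pending[cls] += 1
            pending.setdefault(sup, 0)

    # Start from every root and process classes in topological order.
    depth = {cls: 1 for cls, count in pending.items() if count == 0}
    queue = deque(depth)
    deepest = 0
    while queue:
        node = queue.popleft()
        deepest = max(deepest, depth[node])
        for child in children[node]:
            depth[child] = max(depth.get(child, 0), depth[node] + 1)
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    return deepest
```

For a flat "bag of terms" where every class is a root, this returns 1 after a single pass, regardless of how many root classes there are.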

I created a test entry today for the OWL version of MONDO in our staging environment and it loads without errors:

https://stage.bioontology.org/ontologies/MONDOOWL