ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal
BSD 2-Clause "Simplified" License

Can't reprocess NCIT in production without turning off UI traffic #262

jvendetti closed this issue 7 months ago

jvendetti commented 1 year ago

On 2022-11-17 I tried to reprocess NCIT in production while troubleshooting why a particular class wasn't showing up in the class tree. I made two attempts to reprocess manually with the ncbo_ontology_process script, and both brought down the entire site (4store gets overloaded, possibly during submission graph deletion, since the trace below fails inside delete_graph). Full stack trace from the parsing.log file:

I, [2022-11-17T12:33:55.577005 #18059]  INFO -- : ["OWLAPI Java command: parsing finished successfully."]
I, [2022-11-17T12:33:55.578512 #18059]  INFO -- : ["Output size 689452236 in `/srv/ncbo/repository/NCIT/123/owlapi.xrdf`"]
E, [2022-11-17T12:38:04.968288 #18059] ERROR -- : ["Net::HTTP::Persistent::Error: too many connection resets (due to end of file reached - EOFError) after 5 requests on 58060, last used 412.646970642 seconds ago
/usr/local/rbenv/versions/2.7.6/lib/ruby/2.7.0/net/protocol.rb:225:in `rbuf_fill'
    /usr/local/rbenv/versions/2.7.6/lib/ruby/2.7.0/net/protocol.rb:191:in `readuntil'
    /usr/local/rbenv/versions/2.7.6/lib/ruby/2.7.0/net/protocol.rb:201:in `readline'
    /usr/local/rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http/response.rb:42:in `read_status_line'
    /usr/local/rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http/response.rb:31:in `read_new'
    /usr/local/rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:1528:in `block in transport_request'
    /usr/local/rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:1519:in `catch'
    /usr/local/rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:1519:in `transport_request'
    /usr/local/rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:1492:in `request'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:999:in `request'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/sparql-client-fb4a89b420f8/lib/sparql/client.rb:744:in `request'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/sparql-client-fb4a89b420f8/lib/sparql/client.rb:395:in `response'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/sparql-client-fb4a89b420f8/lib/sparql/client.rb:368:in `update'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/goo-fd7d45cb862c/lib/goo/sparql/client.rb:80:in `delete_data_graph'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/goo-fd7d45cb862c/lib/goo/sparql/client.rb:162:in `delete_graph'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-8196bf34b45c/lib/ontologies_linked_data/models/ontology_submission.rb:1535:in `delete_and_append'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-8196bf34b45c/lib/ontologies_linked_data/models/ontology_submission.rb:475:in `generate_rdf'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-8196bf34b45c/lib/ontologies_linked_data/models/ontology_submission.rb:973:in `process_submission'
    /srv/ncbo/ncbo_cron/lib/ncbo_cron/ontology_submission_parser.rb:177:in `process_submission'
    bin/ncbo_ontology_process:98:in `block in <main>'
    bin/ncbo_ontology_process:81:in `each'
    bin/ncbo_ontology_process:81:in `<main>'"]

I asked for Alex's help and he turned off all traffic coming from the UI (Rails application) and manually ran each of the processing steps individually, e.g.:

ncbo_ontology_process -o NCIT -t process_rdf

ncbo_ontology_process -o NCIT -t index_search
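The step-by-step approach above could be wrapped in a small shell helper, sketched here under assumptions. Only the `-o` and `-t` flags and the `process_rdf` and `index_search` task names come from this issue. The `run_steps` function and the `NCBO_PROCESS_CMD` override are hypothetical names introduced for illustration. The sketch does not turn off UI traffic; that was done separately.

```shell
#!/bin/sh
# Sketch only: run the processing steps one at a time and stop at the first failure.
set -eu

# Hypothetical override so the helper can be dry-run with a stub command.
# Defaults to the script named in this issue.
NCBO_PROCESS_CMD="${NCBO_PROCESS_CMD:-ncbo_ontology_process}"

# run_steps ACRONYM TASK [TASK...]
# Runs each task as its own invocation, in the order given.
run_steps() {
  acronym="$1"
  shift
  for task in "$@"; do
    echo "Running ${task} for ${acronym}"
    if ! "$NCBO_PROCESS_CMD" -o "$acronym" -t "$task"; then
      echo "Step ${task} failed; stopping" >&2
      return 1
    fi
  done
}
```

With the helper loaded, `run_steps NCIT process_rdf index_search` would issue the same two commands shown above, and would skip indexing if the RDF step fails.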

I imagine we won't troubleshoot this further on 4store since we're relatively close to an AllegroGraph (AG) deployment. The AG developers maintain that graph deletion is fast in their triplestore. We should confirm we can reprocess in production after going to AG.

alexskr commented 7 months ago

BioPortal was migrated to AllegroGraph, and NCIT can now be processed with production traffic.