ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal
BSD 2-Clause "Simplified" License

EMIF-AD failed to process with "Error Rdf Labels" status #200

Open jvendetti opened 3 years ago

jvendetti commented 3 years ago

An end user reports that the upload of version 0.0.2 of EMIF-AD failed during the label generation step. The first part of the stack trace from the production log file follows:

E, [2020-12-14T01:13:52.924004 #25906] ERROR -- : ["Exception: Rapper cannot parse turtle file at /tmp/data_triple_store20201214-25906-gg7xnb: rapper: Parsing URI file:///tmp/data_triple_store20201214-25906-gg7xnb with parser turtle
rapper: Serializing with serializer ntriples
rapper: Error - URI file:///tmp/data_triple_store20201214-25906-gg7xnb:5 - syntax error at '<'
rapper: Parsing returned 4 triples

/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/goo-ac4b87e33337/lib/goo/sparql/client.rb:61:in `bnodes_filter_file'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/goo-ac4b87e33337/lib/goo/sparql/client.rb:83:in `append_triples_no_bnodes'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/goo-ac4b87e33337/lib/goo/sparql/client.rb:127:in `append_data_triples'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/goo-ac4b87e33337/lib/goo/sparql/client.rb:139:in `append_triples'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-0a7001683cb0/lib/ontologies_linked_data/models/ontology_submission.rb:632:in `generate_missing_labels_pre'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-0a7001683cb0/lib/ontologies_linked_data/models/ontology_submission.rb:550:in `call'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-0a7001683cb0/lib/ontologies_linked_data/models/ontology_submission.rb:550:in `block (2 levels) in loop_classes'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-0a7001683cb0/lib/ontologies_linked_data/models/ontology_submission.rb:504:in `block in process_callbacks'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-0a7001683cb0/lib/ontologies_linked_data/models/ontology_submission.rb:500:in `delete_if'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-0a7001683cb0/lib/ontologies_linked_data/models/ontology_submission.rb:500:in `process_callbacks'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-0a7001683cb0/lib/ontologies_linked_data/models/ontology_submission.rb:549:in `block in loop_classes'
    /usr/local/rbenv/versions/2.6.6/lib/ruby/2.6.0/benchmark.rb:308:in `realtime'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-0a7001683cb0/lib/ontologies_linked_data/models/ontology_submission.rb:531:in `loop_classes'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-0a7001683cb0/lib/ontologies_linked_data/models/ontology_submission.rb:1002:in `process_submission'
    /srv/ncbo/ncbo_cron/lib/ncbo_cron/ontology_submission_parser.rb:177:in `process_submission'
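
When comparing runs like the one logged above, it can help to reduce the rapper messages to the reported error locations and the triple count. The following is a minimal sketch that assumes only the message formats quoted in this issue; the function name `summarize_rapper_output` and the regular expressions are illustrative and not part of rapper or of the NCBO code base.

```python
import re

# Patterns are inferred from the rapper messages quoted in this issue (assumption).
ERROR_RE = re.compile(
    r"^rapper: Error - URI (?P<uri>\S+):(?P<line>\d+) - (?P<message>.+)$"
)
COUNT_RE = re.compile(r"^rapper: Parsing returned (?P<count>\d+) triples$")


def summarize_rapper_output(text):
    """Collect (uri, line number, message) errors and the final triple count."""
    errors = []
    triple_count = None
    for raw in text.splitlines():
        line = raw.strip()
        match = ERROR_RE.match(line)
        if match:
            errors.append(
                (match.group("uri"), int(match.group("line")), match.group("message"))
            )
            continue
        match = COUNT_RE.match(line)
        if match:
            triple_count = int(match.group("count"))
    return {"errors": errors, "triple_count": triple_count}


# The rapper lines from the production log entry in this issue.
SAMPLE = """\
rapper: Parsing URI file:///tmp/data_triple_store20201214-25906-gg7xnb with parser turtle
rapper: Serializing with serializer ntriples
rapper: Error - URI file:///tmp/data_triple_store20201214-25906-gg7xnb:5 - syntax error at '<'
rapper: Parsing returned 4 triples
"""

if __name__ == "__main__":
    print(summarize_rapper_output(SAMPLE))
```

For the failing run, this reports a single syntax error on line 5 of the temporary file and 4 parsed triples.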

I checked the ontology source file against an RDF validator and saw no issues. The source file also opens in Protege without errors, and the rapper command-line utility reports no errors when converting and re-checking the triples:

[ncbo-deployer@ncbo-prd-app-31 2]$ rapper -i rdfxml -o ntriples owlapi.xrdf > data.triples
rapper: Parsing URI file:///srv/ncbo/share/env/production/repository/EMIF-AD/2/owlapi.xrdf with parser rdfxml
rapper: Serializing with serializer ntriples
rapper: Parsing returned 4340 triples
[ncbo-deployer@ncbo-prd-app-31 2]$ rapper -i ntriples -c data.triples
rapper: Parsing URI file:///srv/ncbo/share/env/production/repository/EMIF-AD/2/data.triples with parser ntriples
rapper: Parsing returned 4340 triples

jvendetti commented 3 years ago

I uploaded this ontology to our staging environment (https://stage.bioontology.org/ontologies/EMIF-AD?p=classes), and the system processed it without any errors. It appears that this issue occurs only in production.

jvendetti commented 3 years ago

The UMLS import process for UMLS 2020AB was executed on January 7th. A production snapshot of all ontology data is moved to the staging environment and reprocessed against new UMLS data. The reprocessed ontology data is then copied back to production, overwriting what was there prior.

This ontology has always processed successfully in our staging environment (as I noted above). So, it's fixed now in production as a side effect of getting copied over after completion of the UMLS import process.

It's unclear if the same failure would happen again in production if the user uploads a new version of this ontology.

inigobermejo commented 3 years ago

Dear @jvendetti, thanks for your work on this! I have one question: many classes in the ontology have been assigned class mappings, but no mappings appear in the Mappings tab of the ontology. Do I have to do something in the ontology for the mappings to show up in the Mappings tab?

Thanks!

jvendetti commented 3 years ago

@inigobermejo - I have an idea about what may be causing the EMIF-AD Mappings tab to report no mappings. In order to investigate, I need to regenerate a portion of the mapping data in our system, which is currently an expensive operation. The regeneration is scheduled for this weekend, when the load on our system is reduced. Apologies for the inconvenience - I hope to update this issue on Monday next week.

jvendetti commented 3 years ago

The EMIF-AD ontology has mappings, but the mapping count is currently returned as zero by the mappings statistics endpoint (https://data.bioontology.org/mappings/statistics/ontologies):

[Image: Screen Shot 2021-01-13 at 11.19.14 AM]

I believe the system ended up in this state due to the following sequence of events:

1). Initial uploads of EMIF-AD on Dec. 10th and 14th failed to process.

2). The mapping counts job that runs every Saturday created a record for this ontology and set the count to zero since there were no successfully processed submissions. First occurrence in the log is from Saturday, Dec. 12th:

I, [2020-12-12T01:27:32.316691 #26064] INFO -- : 422/1029 Time for EMIF-AD took 0.511094055 sec. records 0

3). The ontology was successfully processed in our staging environment and copied over to production on Jan. 7th as part of the UMLS import process.

4). The mapping counts script hasn't executed since January 2nd. All cron jobs were disabled for a period of time in production during the UMLS 2020AB import. The jobs were re-enabled on Monday, January 11th and the mapping counts script will next execute this weekend on January 16th.

In theory, the mapping count value should be corrected sometime on Jan. 16th or 17th, assuming the mapping counts script executes successfully.
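
Once the mapping counts script has run, the value can be re-checked against the statistics endpoint mentioned above. This is a minimal sketch, not the project's own tooling. It assumes that the endpoint returns a JSON object keyed by ontology acronym and that an API key is passed in an `Authorization: apikey token=...` header. The function names are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from this issue.
STATS_URL = "https://data.bioontology.org/mappings/statistics/ontologies"


def fetch_mapping_statistics(api_key, url=STATS_URL):
    """Download the per-ontology mapping counts (header format is an assumption)."""
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"apikey token={api_key}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def mapping_count(stats, acronym):
    """Return the count for an acronym, or None if the ontology has no record."""
    value = stats.get(acronym)
    return None if value is None else int(value)


def looks_stale(stats, acronym):
    """True when a record exists but its count is still zero."""
    return mapping_count(stats, acronym) == 0


if __name__ == "__main__":
    # Replace the placeholder with a real API key before running.
    stats = fetch_mapping_statistics("YOUR_API_KEY")
    print("EMIF-AD mapping count:", mapping_count(stats, "EMIF-AD"))
```

A zero count only signals a possibly stale record when the ontology is known to have mappings, as is the case for EMIF-AD here.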