ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal
BSD 2-Clause "Simplified" License

repeated requests for children causing 404 #162

Open graybeal opened 4 years ago

graybeal commented 4 years ago

This may be the same as another issue we've documented recently, but I'm capturing it separately in case it isn't.

Rakesh reported on 4/7/2020 that for the following URLs:

* http://data.bioontology.org/ontologies/DOID/classes/http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDOID_0110718
* http://data.bioontology.org/ontologies/DOID/classes/http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDOID_0110718/children

he gets the following error:

{
  "errors": [
    "Resource 'http://purl.obolibrary.org/obo/DOID_0110718' not found in ontology DOID submission 598"
  ],
  "status": 404
}

He has a script (attached) that he runs to walk DOID and NCIT monthly to build his model of the ontologies:

1) Start at the root nodes using: http://data.bioontology.org/ontologies/<Ontology>/classes/roots
2) Recurse through the children to get the label, code, and synonyms (using ?include=synonym); for some ontologies, such as NCIT, get additional fields with ?include=unmapped
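For readers without the attachment, the two steps above can be sketched roughly as follows. This is not Rakesh's script. It is a minimal illustration that assumes each class object carries an "@id" and a "links" → "children" URL, and that child listings come back as a paged object with a "collection" array and an optional "links" → "nextPage" URL. `fetch_json` is a hypothetical callable supplied by the caller (for example, an authenticated HTTP GET that returns decoded JSON).

```python
from collections import deque
from typing import Any, Callable, Iterator

API_BASE = "http://data.bioontology.org"


def walk_ontology(
    ontology: str,
    fetch_json: Callable[[str], Any],
    include: str = "synonym",
) -> Iterator[dict]:
    """Yield each class reachable from the ontology roots exactly once.

    Assumed response shapes (not confirmed by this issue):
      * roots      -> a JSON list of class objects
      * children   -> {"collection": [...], "links": {"nextPage": url or None}}
      * each class -> {"@id": ..., "links": {"children": url}}
    """
    pending = deque([f"{API_BASE}/ontologies/{ontology}/classes/roots"])
    seen = set()  # a class can have several parents, so de-duplicate by id

    while pending:
        url = pending.popleft()
        payload = fetch_json(url)

        if isinstance(payload, dict):
            items = payload.get("collection", [])
            next_page = (payload.get("links") or {}).get("nextPage")
            if next_page:
                pending.append(next_page)
        else:
            items = payload

        for cls in items:
            class_id = cls.get("@id")
            if class_id in seen:
                continue
            seen.add(class_id)
            yield cls

            children_url = (cls.get("links") or {}).get("children")
            if children_url:
                pending.append(f"{children_url}?include={include}")
```

Each request in this walk is issued only after the previous response has arrived, so the traversal itself is strictly sequential.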

When he asks for the child links, he waits for the response and then issues another request. If he gets an error, he waits 60 seconds and tries again, up to 30 cycles in total; if there is no error, he immediately makes the next request. (He added the 30 retries because 5 retries didn't always fix it; with 30, it works sooner or later.)
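The retry behavior described above amounts to something like the sketch below. The function name, its parameters, and the injectable `fetch` and `sleep` callables are all hypothetical and only illustrate the 60-second wait and 30-attempt cap; the attached script may be structured differently.

```python
import time
from typing import Any, Callable, Tuple, Type


def fetch_with_retry(
    fetch: Callable[[str], Any],
    url: str,
    max_attempts: int = 30,
    wait_seconds: float = 60,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(url); on failure wait and try again, up to max_attempts calls.

    A successful call returns immediately with no delay. If every attempt
    fails, the last error is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except retry_on as err:
            last_error = err
            # No point in sleeping after the final failed attempt.
            if attempt < max_attempts:
                sleep(wait_seconds)
    raise last_error
```

With these defaults, a single class that keeps failing costs up to 29 waits of 60 seconds each, which is consistent with a run growing from about a day to several days once errors become frequent.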

His script has run once a month for a long time, and generally took about a day; now it takes on the order of 5 days, and often isn't completing due to the 404 errors.

In April this happened fairly often between 4/3 and 4/6. He also had some issues with it in March (lost track of the date).

This feels a lot like the issues we were troubleshooting about 5 weeks ago, when they were happening to lots of ontologies.

ncit_get_descendants_drug_subontology_get_addl_fields_v3.py.txt

mdorf commented 4 years ago

As pointed out in an earlier troubleshooting of the /latest_submission endpoint, our attempt at addressing the intermittent empty results from 4store may be compounding the issue by overwhelming the server with repeated attempts at getting data. The code subject to this comment is located here:

https://github.com/ncbo/ontologies_linked_data/blob/master/lib/ontologies_linked_data/models/ontology.rb#L144

We should consider removing the multiple 4store attempts in this specific case, which affects the majority of our live API requests. The repeated attempts are still applicable for data processing jobs such as indexing or Annotator data caching.