owlcs / owlapi

OWL API main repository

[question] How to import SKOS Core Vocabulary without getting 429 response code #1114

Closed jvendetti closed 1 month ago

jvendetti commented 8 months ago

Several days ago we started having issues in the BioPortal project with ontologies that import the SKOS Core Vocabulary. If an ontology contains an import like this:

<owl:imports rdf:resource="http://www.w3.org/2004/02/skos/core"/>

... and we attempt to load the ontology, the OWL API reports it as an unloadable import. The following code snippet to load this example ontology:

String path = "src/test/resources/BRO_v3.2.owl";
FileDocumentSource fileDocumentSource = new FileDocumentSource(new File(path), new RDFXMLDocumentFormat());
OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
OWLOntology ontology = manager.loadOntologyFromOntologyDocument(fileDocumentSource);

... results in this exception:

org.semanticweb.owlapi.model.UnloadableImportException: Could not load imported ontology: <http://www.w3.org/2004/02/skos/core> Cause: Server returned HTTP response code: 429 for URL: https://www.w3.org/2004/02/skos/core 

Since the reason for the unloadable import is a 429 from the server, I reached out to W3C support to ask why they're returning a "Too Many Requests" status code for SKOS Core. I got this response from them:

We recently adjusted our abuse prevention systems to address an excessive volume of requests from automated systems, adding up to tens of millions of requests per day.

Do you have an idea of how frequently these requests are being made by your application? Our current system limits Java/* user-agents to 10 requests per minute per URI.

If you are able to customize the user-agent header to identify your application on outgoing requests that should help bypass this generic restriction on Java user-agents. If it's making a large number of requests it would be good to add a cache as well.

It seems they are now rate limiting access to SKOS Core from all Java agents. I'm wondering if you have advice about the best way for us to address this issue? Should we consider hosting our own copy of SKOS Core and use a SimpleIRIMapper to load SKOS Core?

jvendetti commented 8 months ago

I should have clarified that we use version 4.5.18 of the OWL API. I see the same behavior in the latest version of Protege (5.6.3, which uses version 4.5.26). Opening the example ontology that I mentioned above in Protege also shows a failure to load SKOS Core:

   INFO  17:15:38  OWL API Version: 4.5.26.2023-07-17T20:34:13Z
   INFO  17:15:43  ------------------------------- Loading Ontology -------------------------------
   INFO  17:15:43  Loading ontology from file:/Users/vendetti/Development/GitHub/ncbo/ontologies_linked_data/test/data/ontology_files/repo/BROTEST-METRICS/33/BRO_v3.2.owl
   INFO  17:15:43  Adding folder to ontology catalog: /Users/vendetti/Development/GitHub/ncbo/ontologies_linked_data/test/data/ontology_files/repo/BROTEST-METRICS/33
   INFO  17:15:43  ---------------------------- Starting Catalog Update ---------------------------
   INFO  17:15:43  Update of group entry Folder Repository, directory=, recursive=true, Auto-Update=true, version=2 started at Thu Oct 26 17:15:43 PDT 2023.
   INFO  17:15:43   Examining: /Users/vendetti/Development/GitHub/ncbo/ontologies_linked_data/test/data/ontology_files/repo/BROTEST-METRICS/33
   INFO  17:15:43  Catalog Update Complete
   INFO  17:15:43  
   INFO  17:15:43  Imported ontology document http://www.w3.org/2004/02/skos/core was not resolved to any documents defined in the ontology catalog.
   INFO  17:15:43  Failed to load imported ontology at http://www.w3.org/2004/02/skos/core

ignazio1977 commented 8 months ago

Yeah, there have been no recent changes to that part of the code, so any owlapi version from the past three or four years should behave just the same.

My guess is that some sort of caching would be the best approach. An IRI mapper and a copy of the file would work, but might need some way of noticing updates.
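One way to handle the "noticing updates" concern is an age check on the local copy. The sketch below is only an illustration and not part of the OWL API: the class name `CachedCopy`, the path `cache/skos-core.rdf`, and the 30-day limit are all hypothetical. It treats the file as stale when it is missing or older than the limit. The closing comment shows how the file might then be handed to `SimpleIRIMapper`, assuming OWL API 4.x or later on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

class CachedCopy {
    /** Returns true when the cached file is missing or older than maxAge at the instant 'now'. */
    static boolean isStale(Path cached, Duration maxAge, Instant now) throws IOException {
        if (!Files.exists(cached)) {
            return true;
        }
        Instant modified = Files.getLastModifiedTime(cached).toInstant();
        return modified.plus(maxAge).isBefore(now);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location of the local SKOS Core copy.
        Path cached = Paths.get("cache", "skos-core.rdf");
        // Hypothetical refresh interval; pick whatever suits the deployment.
        Duration maxAge = Duration.ofDays(30);

        if (isStale(cached, maxAge, Instant.now())) {
            System.out.println("Local copy missing or stale, refresh needed: " + cached);
        } else {
            System.out.println("Local copy is fresh: " + cached);
        }

        // With the OWL API on the classpath, the copy could be mapped before loading
        // (untested here, shown only as a comment):
        //   manager.getIRIMappers().add(new SimpleIRIMapper(
        //       IRI.create("http://www.w3.org/2004/02/skos/core"),
        //       IRI.create(cached.toFile())));
    }
}
```

Refreshing on a schedule rather than on every load keeps the number of requests to w3.org very low, which is the point of the caching advice in the W3C reply.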

A private mirror would also do, provided you have a way of keeping it up. I'm no expert in the area.

A third option would be to specify the app making the call. I'm not sure if we have any way of doing that at present, so it might require an update, with the time implications of getting it into all the apps you need it for.
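For comparison, a request made outside the OWL API can set its own User-Agent, which is what the W3C reply asks for. The following is a minimal sketch using only `java.net`. The class name `IdentifiedFetch`, the User-Agent string, and the `Accept` value are assumptions made for illustration, and the code does not change how the OWL API itself fetches imports.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class IdentifiedFetch {
    // Hypothetical application identifier; replace with a real name and contact address.
    static final String USER_AGENT = "ExampleOntologyLoader/1.0 (+https://example.org/contact)";

    /** Prepares, without sending, a GET request that carries an explicit User-Agent. */
    static HttpURLConnection prepare(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Identify the application instead of relying on the default Java agent string.
        conn.setRequestProperty("User-Agent", USER_AGENT);
        // Assumption: ask for RDF/XML, the format used by the ontology in this thread.
        conn.setRequestProperty("Accept", "application/rdf+xml");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("https://www.w3.org/2004/02/skos/core");
        // Nothing has been sent yet; calling conn.getResponseCode() or
        // conn.getInputStream() at this point would perform the request.
        System.out.println("User-Agent to be sent: " + conn.getRequestProperty("User-Agent"));
    }
}
```

A download done this way could populate the local copy that an IRI mapper then points to, so the rate-limited URL is contacted only when the copy needs refreshing.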

I've not seen the problem reported anywhere else, but I guess it's a matter of time before the reports multiply.

jvendetti commented 8 months ago

Thanks very much for your thoughts on this. Will probably go with a copy of the file and an IRI mapper for now.