google / schemarama

Schemarama is a project exploring standards-based validation for structured data, especially Schema.org.
Apache License 2.0

how to deal with Wikidata timeout? #27

Closed VladimirAlexiev closed 2 years ago

VladimirAlexiev commented 3 years ago

Wikidata has a hard timeout of 1 minute: it may even cut a response off in the middle, making it invalid.

I think that many of the queries in https://github.com/google/schemarama/tree/main/kgx/wikidata/basic will hit that timeout.

How should we deal with it? We have some code that first gets the IDs, then batches them up into reasonably sized pieces to fetch the extra data...
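To illustrate the "get IDs first, then fetch in batches" idea, here is a minimal sketch. It is not the code mentioned above; the helper names `batched` and `build_values_query`, the batch size, and the choice of fetching English labels are all assumptions made for illustration. It only builds the per-batch SPARQL strings, relying on the `wd:` and `rdfs:` prefixes that the Wikidata Query Service predefines, and leaves sending the requests to whatever HTTP client you already use.

```python
import re
from typing import Iterator, List, Sequence

QID_PATTERN = re.compile(r"^Q\d+$")


def batched(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def build_values_query(qids: Sequence[str]) -> str:
    """Build one SPARQL query that fetches extra data for a batch of items.

    Hypothetical helper: here the "extra data" is just the English label.
    """
    for qid in qids:
        # Reject anything that is not a plain Wikidata item ID.
        if not QID_PATTERN.match(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item rdfs:label ?itemLabel .\n"
        '  FILTER(LANG(?itemLabel) = "en")\n'
        "}"
    )


if __name__ == "__main__":
    # IDs would normally come from a first, cheap query that returns only ?item.
    item_ids = ["Q42", "Q1", "Q2", "Q5", "Q64"]
    for batch in batched(item_ids, size=2):  # batch size is an assumption; tune it
        print(build_values_query(batch))
        print("---")
```

Each small query stays well under the time limit, and a batch that fails or comes back truncated can be retried on its own without repeating the whole job.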

danbri commented 2 years ago

Sorry for the delayed response.

The original idea with those files was to use something like https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits/

There are also big discussions around Wikidata about migrating beyond Blazegraph somehow.

In fact, one of the motivations to collaborate around focussed subsets was just these scaling issues.

danbri commented 2 years ago

Closing for the reason just mentioned in your other issue. Core Schemarama isn't yet engaging with Wikidata (but we'll get there!). You might be interested in https://www.wikidata.org/wiki/Wikidata:Query_Service_scaling_update_Aug_2021 and nearby if you hadn't already seen that effort.