Open WolfgangFahl opened 20 hours ago
Note that we (OpenLink Software [1], [2]) have also loaded Wikidata into a live Virtuoso instance, available at https://wikidata.demo.openlinksw.com/sparql.
I'm not sure whether I'm the "Ted" referenced in the last paragraph; if so, regrettably, I've forgotten the specifics of that conversation. Could you provide more detail about the "question" being asked by this issue, especially to benefit others who may have more to contribute to the "answer" than I?
https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours has the info, as does https://www.wikidata.org/wiki/Wikidata:Scholia/Events/Hackathon_October_2024
We are well aware of the Virtuoso endpoint; it is already configured in the default https://github.com/WolfgangFahl/snapquery/blob/main/snapquery/samples/endpoints.yaml file.
The question here is how we can quickly get a Virtuoso endpoint that is as up-to-date as possible. We intend to "rotate" images based on dumps as long as streaming updates are not possible, so currently that would be roughly weekly; https://github.com/ad-freiburg/qlever-control/discussions/82 is an example. This is just the initial issue to start the communication. Depending on how Virtuoso is going to be involved, we might need multiple tickets for the different aspects. I suggest sticking with the import performance issue in this ticket for the time being and waiting for Tim's comment.
@TallTed
Tim Holzheim has successfully imported Wikidata into a Virtuoso instance; see https://cr.bitplan.com/index.php/Wikidata_import_2024-10-28_Virtuoso and https://wiki.bitplan.com/index.php/Wikidata_import_2024-10-28_Virtuoso for the documentation. The endpoint is available at https://virtuoso.wikidata.dbis.rwth-aachen.de/sparql/ and we would love to integrate this and other Virtuoso endpoints into our snapquery (https://github.com/WolfgangFahl/snapquery) infrastructure.
Ted suggested that I open a ticket to get the discussion going about how Virtuoso endpoints could be made part of the snapquery Wikidata mirror infrastructure. The idea is to use named parameterized queries that hide the details of the endpoints, so that it does not matter whether you use Blazegraph, QLever, Jena, Virtuoso, Stardog, or any other engine. Queries should just work as specified and be proactively monitored for non-functional aspects.
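
To make the "named parameterized query" idea more concrete, here is a minimal sketch in Python. It is not snapquery's actual API: the names `ENDPOINTS`, `NAMED_QUERIES`, `render_query`, `build_request_url` and the query name `label-of-item` are hypothetical, and the use of `string.Template` for parameters is an assumption made for illustration. Only the two endpoint URLs come from this thread. The sketch builds standard SPARQL protocol GET URLs and does not send any request.

```python
from string import Template
from urllib.parse import urlencode

# Endpoint URLs as mentioned in this thread; the short names are hypothetical.
ENDPOINTS = {
    "virtuoso-openlink": "https://wikidata.demo.openlinksw.com/sparql",
    "virtuoso-rwth": "https://virtuoso.wikidata.dbis.rwth-aachen.de/sparql/",
}

# A hypothetical named query. Full IRIs are used instead of prefixes so the
# text does not rely on any engine predefining "wd:" or "rdfs:".
NAMED_QUERIES = {
    "label-of-item": Template(
        "SELECT ?label WHERE { "
        "<http://www.wikidata.org/entity/$qid> "
        "<http://www.w3.org/2000/01/rdf-schema#label> ?label . "
        'FILTER(LANG(?label) = "$lang") }'
    ),
}


def render_query(query_name: str, **params: str) -> str:
    """Fill the parameters of a named query and return the SPARQL text.

    Note: parameters are substituted verbatim; a real implementation would
    need to validate them to avoid SPARQL injection.
    """
    return NAMED_QUERIES[query_name].substitute(**params)


def build_request_url(query_name: str, endpoint_name: str, **params: str) -> str:
    """Build a SPARQL protocol GET URL for a named query on a named endpoint."""
    sparql = render_query(query_name, **params)
    return f"{ENDPOINTS[endpoint_name]}?{urlencode({'query': sparql})}"


if __name__ == "__main__":
    # The caller only names the query and the endpoint; the endpoint URL
    # stays hidden behind the lookup table.
    for name in ENDPOINTS:
        print(build_request_url("label-of-item", name, qid="Q42", lang="en"))
```

With this shape, the same query name and parameters yield the same SPARQL text for every endpoint, so only the base URL differs between mirrors. Monitoring of non-functional aspects (latency, availability, result freshness) could then wrap the request step without touching the queries themselves.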