Open Gerbert-Kaandorp opened 2 months ago
Hi @Gerbert-Kaandorp, we are just executing the provided update query using:
parsed_update = prepareUpdate(update_query, initNs=graph_ns)
self.graph.update(parsed_update, "sparql")
So I guess this is just RDFLib not being very fast at inserting data through update queries.
And in general I don't think using INSERT DATA is a fast way to load a lot of data into any triplestore. Usually they provide another call specifically to bulk load Turtle/XML files, which we could also do relatively easily here by adding a call that takes an RDF file and parses it into the graph used by the endpoint. Using INSERT DATA is more aimed at making small changes on the fly from an application (adding a few dozen or a few hundred triples).
If you have control over the server where you deploy the endpoint, then the recommended way is just to parse the file you want to load with RDFLib, then use this graph when instantiating the SparqlEndpoint.
Hi Vincent!
Me again :), thanks a lot for adding the types in the last release! It is working out great in my dev stack and I don't need a converter any more! 🎉🎉
So, I wanted to test the performance of this setup a bit. I downloaded this Pokemon dataset: https://triplydb.com/academy/pokemon
It is about 4.893 MB / ~29000 triples, and I am using the following function to insert them over HTTP using the endpoint.
Turns out, this is extremely slow. :(
And I am not sure if I am even using the API the right way. Do you know what I am doing wrong? Or is this performance normal for RDFLib?
Thanks for reading. Gerbert