Open ebremer opened 4 months ago
Are you saying the ~/sparql-graph-crud-auth endpoint performance is slow compared to a previous Virtuoso open source build you have been using? If so, what are the git IDs of both builds, for comparison?
Also, if you have a test case for recreating the problem that would be ideal ...
No, I'm not claiming any regression. To date, I've loaded all of my RDF via isql bulk loading. Now I need to delete small groups of triples and add multiple small groups of triples programmatically. This is actually the first time I'm using the Virtuoso graph management endpoint. When I add batches of triples, there seems to be a fixed minimum overhead per request, regardless of how little is sent. So for the millions of small batches I needed to do, it would take too long. I buffered the smaller batches locally in a Jena memory model and pushed them in batches of 100,000 triples, which sped this up enough for it to work for me. Is there any way to improve speed for smaller batches?
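The local buffering described above can be sketched independently of Jena. This is a minimal illustration in Python, not the code actually used: `TripleBuffer` and its `send` callback are hypothetical names, and `send` stands in for whatever performs the real upload (for example, one POST to the graph CRUD endpoint).

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]


class TripleBuffer:
    """Collects small groups of triples and hands them to `send` in large batches.

    `send` is a hypothetical callback; each call costs one round trip,
    so fewer, larger calls amortise the per-request overhead.
    """

    def __init__(self, send: Callable[[List[Triple]], None], batch_size: int = 100_000):
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self._send = send
        self._batch_size = batch_size
        self._pending: List[Triple] = []

    def add(self, triples: Iterable[Triple]) -> None:
        """Queue a small group; send full batches as soon as enough have accumulated."""
        self._pending.extend(triples)
        while len(self._pending) >= self._batch_size:
            batch = self._pending[:self._batch_size]
            self._pending = self._pending[self._batch_size:]
            self._send(batch)

    def flush(self) -> None:
        """Send whatever is left; call once after the last add()."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)
```

With `batch_size=100_000`, a stream of 300 to 800 triple groups would result in one request per 100,000 triples rather than one request per group, at the cost of holding the pending triples in memory.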
The performance of the Virtuoso RESTful HTTP ~/sparql-graph-crud-auth endpoint will never match or come close to that of the Virtuoso bulk loader, which is optimised in the database engine for loading data, and there are no specific configuration params for tuning the endpoint. You could enable Virtuoso query logging to log the database activities when performing such operations, and see where the time is being spent.
Note that we also have a document on Transactional Bulk Loading of RDF Data into Virtuoso DBMS via the Jena RDF Framework, which you might want to review to see how this can be optimally done in Jena.
@HughWilliams 15 seconds for under 1,000 triples sounds ridiculous though. We want to iterate over and update thousands of graphs. I haven't been able to test it yet due to /sparql-graph-crud-auth auth issues (see #1304), but if this is indeed the case, then the Graph Store Protocol is unusable.
@ebremer: "latest dev version of Virtuoso" doesn't communicate well, especially over time. Please always provide the full version string including the git_head value, as reported on the commandline by virtuoso -?, or as easily retrieved via SPARQL.
@namedgraph — It's not clear to me whether the "15 seconds for under a 1000 triples" assessment remains accurate, presuming @ebremer followed the previously linked advice in Transactional Bulk Loading of RDF Data into Virtuoso DBMS via the Jena RDF Framework. We look forward to hearing and discussing your results, upon resolution of #1304.
@HughWilliams Understood. I never imagined Virtuoso restful HTTP ~/sparql-graph-crud-auth endpoint performance would ever match or be close to the Virtuoso bulk loader. The bulk loader has faithfully loaded billions of triples into my triple store at a rapid rate. No complaints there!
I'll be revisiting the smaller loading via /sparql-graph-crud-auth soon, as I have updates to perform. @TallTed I wasn't intending to be non-informative/unhelpful by leaving out the exact version I was using. I merely thought there was some basic tuning I could look at first, as I wasn't trying to insinuate any bugs at that time. But I will include the string if it helps, for reference now or as a historical note.
@HughWilliams I'll take a look at the code you sent me before trying my update and report back what I find.
@namedgraph how many triples / named graphs are in your store? Mine holds about 20 billion triples, and another Virtuoso instance on a separate server holds 11 billion triples of bibliographic data.
I'm getting slow POST performance when using /sparql-graph-crud-auth (latest dev version of Virtuoso) to post small graphs (300-800 triples each). Each POST takes about 15 seconds or so. Is there any way to get better performance? Note, bulk loading works great (system currently holds 9 billion triples). It seems like something is choking it a bit.
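For anyone wanting to reproduce the measurement, a single small-graph POST can be sketched as below. This is only an illustration under stated assumptions: the `graph` query parameter is the one defined by the SPARQL 1.1 Graph Store HTTP Protocol, the host, port, and graph URI are placeholders, `build_graph_post` is a hypothetical helper, and the authentication the endpoint requires is deliberately left out.

```python
import urllib.parse
import urllib.request


def build_graph_post(endpoint: str, graph_uri: str, turtle: str) -> urllib.request.Request:
    """Build (but do not send) a Graph Store Protocol POST that adds `turtle` to `graph_uri`.

    Assumption: the endpoint accepts the standard `graph` query parameter.
    """
    query = urllib.parse.urlencode({"graph": graph_uri})
    return urllib.request.Request(
        url=f"{endpoint}?{query}",
        data=turtle.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/turtle"},
    )


# Placeholder values, not taken from the report above.
request = build_graph_post(
    "http://localhost:8890/sparql-graph-crud-auth",
    "http://example.org/g1",
    "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n",
)
```

Sending it would mean passing `request` to an opener configured with the server's credentials and wrapping the call in `time.perf_counter()` readings, which gives a per-request figure to compare against the 15 seconds reported here.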