openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/

Slow post performance on `/sparql-graph-crud-auth` #1246

Open ebremer opened 4 months ago

ebremer commented 4 months ago

I'm getting slow POST performance when using `/sparql-graph-crud-auth` (latest dev version of Virtuoso) to post small graphs (300-800 triples each). Each post takes about 15 seconds or so. Is there any way to get better performance? Note that bulk loading works great (the system currently holds 9 billion triples). It seems like something is choking it a bit.
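
For reference, a minimal sketch of this kind of per-graph POST, assuming Digest authentication and the `graph-uri` query parameter that appears in the Virtuoso documentation's curl examples; the host, credentials, graph name, and file name are placeholders, and it uses plain Apache HttpClient 4.x rather than anything Virtuoso-specific:

```java
import java.io.File;

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.FileEntity;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class GraphCrudPost {
    public static void main(String[] args) throws Exception {
        // Placeholders: adjust host, graph name, credentials, and file to your setup.
        String endpoint = "http://localhost:8890/sparql-graph-crud-auth"
                + "?graph-uri=urn:example:graph1";

        BasicCredentialsProvider creds = new BasicCredentialsProvider();
        creds.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("dba", "dba"));

        try (CloseableHttpClient client = HttpClients.custom()
                .setDefaultCredentialsProvider(creds) // answers the server's Digest challenge
                .build()) {
            HttpPost post = new HttpPost(endpoint);
            // One small graph, serialized as N-Triples on disk.
            post.setEntity(new FileEntity(new File("graph1.nt"),
                    ContentType.create("application/n-triples")));

            long start = System.currentTimeMillis();
            try (CloseableHttpResponse response = client.execute(post)) {
                System.out.println(response.getStatusLine()
                        + " in " + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }
}
```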

HughWilliams commented 4 months ago

Are you saying the ~/sparql-graph-crud-auth endpoint performance is slow compared to a previous Virtuoso open source build you have been using, and if so, what are the git IDs of both for comparison?

Also, if you have a test case for recreating the problem that would be ideal ...

ebremer commented 3 months ago

No, I'm not claiming any regression. To date, I've loaded all of my RDF via isql bulk loading. Now I need to delete small groups of triples and add multiple small groups of triples programmatically. This is actually the first time I'm using the Virtuoso graph management endpoint. When I add batches of triples, there seems to be a minimum per-request overhead no matter how little is sent, so the millions of small batches I needed to do would take too long. I buffered the smaller batches locally in a Jena memory model and pushed them in batches of 100,000 triples, which sped this up enough to work for me. Any way to improve speed for smaller batches?
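
A rough outline of that buffering approach, for anyone doing the same: statements accumulate in an in-memory Jena model and get flushed every `BATCH_SIZE` triples as a single N-Triples POST. `postNTriples` is a hypothetical stand-in for whatever authenticated request you already use against /sparql-graph-crud-auth (e.g. the HttpClient sketch earlier in the thread), and the batch size is illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

public class BatchedGraphWriter {
    private static final int BATCH_SIZE = 100_000;   // illustrative; tune as needed
    private final Model buffer = ModelFactory.createDefaultModel();

    /** Queue one statement; flush to the server once the buffer is large enough. */
    public void add(Statement stmt) {
        buffer.add(stmt);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    /** Serialize the buffered triples as N-Triples and send them in a single POST. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        RDFDataMgr.write(out, buffer, Lang.NTRIPLES);
        postNTriples(out.toString(StandardCharsets.UTF_8));
        buffer.removeAll();
    }

    private void postNTriples(String payload) {
        // Hypothetical helper: one Digest-authenticated POST of `payload` to
        // /sparql-graph-crud-auth, e.g. along the lines of the HttpClient sketch above.
    }
}
```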

HughWilliams commented 3 months ago

The Virtuoso RESTful HTTP ~/sparql-graph-crud-auth endpoint will never match, or come close to, the Virtuoso bulk loader, which is optimised inside the database engine for loading data, and there are no specific configuration params for the endpoint. You could enable Virtuoso query logging to log the database activity while performing such operations and see where the time is being spent.

Note that we also have a document on Transactional Bulk Loading of RDF Data into Virtuoso DBMS via the Jena RDF Framework, which you might want to review to see how this can be done optimally from Jena.

namedgraph commented 3 days ago

@HughWilliams 15 seconds for under 1000 triples sounds ridiculous though. We want to iterate over and update thousands of graphs. We haven't been able to test it yet due to /sparql-graph-crud-auth auth issues (see #1304), but if this is indeed the case then the Graph Store Protocol is unusable.

TallTed commented 3 days ago

@ebremer — "latest dev version of Virtuoso" doesn't communicate well, especially over time. Please always provide the full version string including the git_head value, as reported on the commandline by virtuoso -?, or as easily retrieved via SPARQL.

TallTed commented 3 days ago

@namedgraph — It's not clear to me whether the "15 seconds for under 1000 triples" assessment remains accurate, presuming @ebremer followed the previously linked advice in Transactional Bulk Loading of RDF Data into Virtuoso DBMS via the Jena RDF Framework. We look forward to hearing and discussing your results once #1304 is resolved.

ebremer commented 2 days ago

@HughWilliams Understood. I never imagined the Virtuoso RESTful HTTP ~/sparql-graph-crud-auth endpoint would ever match or come close to the Virtuoso bulk loader. The bulk loader has faithfully loaded billions of triples into my triple store at a rapid rate. No complaints there!

I'll be revisiting the smaller loads via /sparql-graph-crud-auth soon, as I have updates to perform. @TallTed I wasn't intending to be uninformative or unhelpful by leaving out the exact version I was using; I simply thought there was some basic tuning I could look at first, and I wasn't trying to insinuate any bugs at the time. But I will include the version string going forward, for current reference or as a historical note.

@HughWilliams I'll take a look at the code you sent me before trying my update and report back what I find.

@namedgraph how many triples / named graphs are in your store? Mine holds about 20 billion triples, and I have another Virtuoso instance with 11 billion triples of bibliographic data on a separate server.