Closed eltonfss closed 1 year ago
There are various methods of ingesting data into Virtuoso as detailed in this RDF Insert Methods in Virtuoso document. Via REST with the Virtuoso Sponger Middleware, RDF datasets (NQuad and other such) can be ingested directly from the SPARQL endpoint, Virtuoso Crawler, or RDF Sink Folders (Virtuoso or ODS-Briefcase).
We also have sample code showing how to Bulk Load RDF datasets (NQuad and other such) using the RDF4J and Jena frameworks, which you might also want to review.
@HughWilliams Thank you for the prompt response! I've looked into the links you shared and configured the Virtuoso Sponger Middleware in my local deployment.
Unfortunately, even after installing it I get an error message informing me that I cannot upload data in N-Quads format:
I've also tried to make the import through the /sparql-graph-crud-auth endpoint, but I get an equivalent error:
Packages installed:
SPARQL Account:
I also considered trying the RDF Sink Folders (Virtuoso or ODS-Briefcase), but I noticed that they require the data to be placed in a file on the server, which is not ideal for my use case (I ultimately need to send the NQuads payload in the HTTP request body). Is that interpretation correct?
Using RDF4J or Jena as a "frontend" for Virtuoso might also make sense, but since I'm trying to evaluate multiple triplestores (which are all deployed using docker) I haven't yet been able to invest time in that. A containerized version of this would be very helpful (I've come across this one https://github.com/asanchez75/docker-rdf4j-virtuoso/blob/master/Dockerfile but since it was made a long time ago it might not be very reliable). Are you aware of a better/faster way to deploy one of these combined solutions as a docker image?
We don't have a docker container image for the Virtuoso RDF4J HTTP Repository. If deployment via docker is required, you would have to either set it up manually or create a docker container image for the setup yourself.
@HughWilliams could you add N-Quads support to the Graph Store Protocol? It would be non-standard as per SPARQL 1.1, but Jena supports it, and probably others.
E.g. POST /sparql-graph-crud/ (without any graph param) would append quads to the dataset.
We committed patch https://github.com/openlink/virtuoso-opensource/commit/0d2c90ca90ea39666fa8dbf747c12498e1a434da to the develop/7 branch to add N-QUADS support to the Graph Store Protocol.
@pkleef so is it full CRUD support or only POST?
I'm trying the following and getting 406 Unacceptable
curl -i --digest -u dba:dba http://localhost:9030/sparql-graph-crud-auth -H "Accept: application/n-quads"
It should work like this (and it does in Jena):
GET returns the whole dataset as quads
POST appends quads to dataset
PUT replaces the whole dataset as quads
DELETE removes the whole dataset
The same issue is described in https://github.com/w3c/sparql-dev/issues/56
@namedgraph —
Jena is a triple store while Virtuoso is a Quad Store. What you desire isn't as trivial as presented in a Quad Store that also includes fine-grained named graph scoped ACLs for security and data governance, etc.
What's been implemented for this non-standard extension is:
$ curl -i --digest -u dba:dba http://localhost:8890/sparql-graph-crud-auth \
-X POST -H 'Content-Type: application/n-quads' --data-binary @test.nq
A default POST request can add triples to existing graphs specified in test.nq.
If you want to clean all the graphs referenced in your .nq file, you can use a PUT command, which will incorporate a with-delete operation similar to the existing bulk loader.
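To make the POST/PUT distinction above concrete, here is a minimal sketch rather than an official script. The file name sample.nq, the example.org IRIs, the nq_upload helper, and the DRY_RUN, VIRT_USER and VIRT_PASS variables are all made up for illustration; only the curl options and the endpoint path come from the commands shown in this thread.

```shell
#!/bin/sh
# Hypothetical sample data: three quads spread over two named graphs.
cat > sample.nq <<'EOF'
<http://example.org/s1> <http://example.org/p> "a" <http://example.org/g1> .
<http://example.org/s2> <http://example.org/p> "b" <http://example.org/g1> .
<http://example.org/s3> <http://example.org/p> "c" <http://example.org/g2> .
EOF

# Assumed endpoint; adjust host and port to your deployment.
ENDPOINT="${ENDPOINT:-http://localhost:8890/sparql-graph-crud-auth}"

# nq_upload METHOD FILE
#   POST appends the quads to the graphs named in FILE.
#   PUT  clears every graph named in FILE first, then loads FILE.
nq_upload() {
  method="$1"; file="$2"
  case "$method" in
    POST|PUT) ;;
    *) echo "unsupported method: $method" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command, so nothing is sent by accident.
    echo "curl -i --digest -u dba:*** -X $method -H 'Content-Type: application/n-quads' --data-binary @$file $ENDPOINT"
  else
    curl -i --digest -u "${VIRT_USER:-dba}:${VIRT_PASS:?set VIRT_PASS}" \
      -X "$method" -H 'Content-Type: application/n-quads' \
      --data-binary "@$file" "$ENDPOINT"
  fi
}
```

Calling `nq_upload POST sample.nq` prints the append command, and `DRY_RUN=0 VIRT_PASS=... nq_upload PUT sample.nq` would actually replace the contents of the two graphs named in the file. The dry-run default is a design choice of this sketch, since PUT deletes data.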
Feature restrictions
The .nq file cannot be split over 2 or more files.
That's what's on offer for now, due to the non-standard nature of these extensions, etc. If a special-need implementation is required, that can be pursued as potential customer-specific custom development rather than a bug fix.
Usage example with current implementation
curl --digest -u dba:*** -i -X POST --data-binary @nq1.nq -HContent-Type:application/n-quads http://localhost:8890/sparql-graph-crud-auth/
<g1> 5 triples
<g2> 5 triples
curl --digest -u dba:*** -i -X PUT --data-binary @nq2.nq -HContent-Type:application/n-quads http://localhost:8890/sparql-graph-crud-auth/
<g1> 4 triples
<g2> 3 triples
Verification:
sparql select ?g count(*) { graph ?g { ?s ?p ?o } filter (?g in (<g1>,<g2>))};
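To know what counts to expect from the verification query after a PUT, one option is to count the quads per graph in the local file first. The helper below is a rough sketch, not part of Virtuoso: count_quads_per_graph is a hypothetical name, and it assumes every line is a quad whose graph IRI is followed by a space and the final dot. Duplicate quads in the file would be counted twice here but stored only once.

```shell
#!/bin/sh
# count_quads_per_graph FILE
# Prints "<graph-iri> <count>" per graph, sorted by graph IRI.
# Assumes each statement ends with "<graph> ." (space before the dot),
# so the graph label is the second-to-last whitespace-separated field.
count_quads_per_graph() {
  awk 'NF >= 5 && $NF == "." { n[$(NF-1)]++ }
       END { for (g in n) print g, n[g] }' "$1" | sort
}
```

Running `count_quads_per_graph nq2.nq` and comparing its output with the query result is a quick sanity check that the PUT replaced the graphs rather than appending to them.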
Jena is a triple store while Virtuoso is a Quad Store
@kidehen that is incorrect and you know it. Jena is a quad store as well.
Clearly I didn't know that, hence my inaccurate comment.
Following your response, I've now looked up Bing+ChatGPT for the latest description.
Jena TDB is both a triple store and a quad store. It can store and query RDF data as triples or quads. A triple is a statement that consists of a subject, a predicate, and an object. A quad is a statement that also includes a graph name, which can be used to group triples into different named graphs. Jena TDB supports the full range of Jena APIs for working with triples and quads². You can use the Dataset API to access and manipulate named graphs in Jena TDB⁴. You can also use SPARQL queries to select, construct, or update triples or quads from different graphs⁵. Jena TDB is a native high performance triple store that does not require any extra tool other than Jena Framework².
Source: Conversation with Bing, 9/6/2023
(1) Apache Jena - TDB. https://jena.apache.org/documentation/tdb/
(2) Apache Jena - Home. https://jena.apache.org/
(3) rdf - Persisting data in Jena TDB triple store - Stack Overflow. https://stackoverflow.com/questions/30682246/persisting-data-in-jena-tdb-triple-store
(4) Apache Jena - TDB Architecture. https://jena.apache.org/documentation/tdb/architecture.html
(5) GitHub - srdc/triplestore: Unified Triple Store Interface working with .... https://github.com/srdc/triplestore
A clearer response: Virtuoso implements Quad Storage via its core DBMS engine. It provides ACID for CRUD operations, and uses named-graph-scoped ACLs for fine-grained attribute-based access control (ABAC).
The fundamentals above impact its behavior, with regard to Quad Management and what is acceptable via its SPARQL Graph Protocol implementation.
Is there any way to ingest a payload in NQuad format via REST in Virtuoso?
According to https://etl.linkedpipes.com/tutorials/how-to/load_data_to_virtuoso:
Nonetheless, I need to be able to send the Quads as a payload in a REST request. I haven't yet found an alternative in the official documentation and was wondering whether this feature is indeed not available in Virtuoso. If it is not, it might be of interest to include it in the development roadmap, since the same feature is available in other triplestores, such as Jena Fuseki, AllegroGraph, BlazeGraph, GraphDB and RDFox.