w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

Reveal effects of update operations (DELETE, INSERT) #63

Open mchlrch opened 5 years ago

mchlrch commented 5 years ago

DELETE/INSERT operations currently do not reveal any information about whether or how data has been modified by the query. For example, updating a non-existing resource has no effect and happens silently. This makes it hard to tell whether an update had any effect at all.

Why?

To confirm, within application logic, that running a query had the expected effect, and to be able to detect and signal errors.

Previous work

None that I am aware of

Proposed solution

Query response could reveal how many triples were added and/or removed during DELETE/INSERT operations.

1 triple added, 1 triple removed
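A minimal sketch of how such counts could be derived, assuming set semantics for graphs (as in RDF) and DELETE/INSERT templates that have already been instantiated into ground triples. The function name `apply_update` and the plain-string IRIs are illustrative assumptions, not part of any SPARQL API. The second call reproduces the case from the opening paragraph: deleting a triple that is not present is a silent no-op, which the counts make visible.

```python
# Sketch only: a graph is modelled as a Python set of (subject, predicate, object)
# tuples, mirroring RDF's set semantics (a triple is either present or not).
Triple = tuple[str, str, str]


def apply_update(graph: set[Triple], delete: set[Triple], insert: set[Triple]):
    """Apply one DELETE/INSERT step and report its *effective* changes.

    SPARQL 1.1 Update removes the instantiated DELETE triples first and then
    adds the INSERT triples. Deleting an absent triple or inserting a present
    one changes nothing, so the deltas are computed from the states before and
    after, not from the sizes of the templates.
    """
    new_graph = (graph - delete) | insert
    added = new_graph - graph    # triples that were not there before
    removed = graph - new_graph  # triples that are gone afterwards
    return new_graph, added, removed


g0 = {("ex:alice", "foaf:givenName", "Alice")}

# Rename: one triple out, one triple in.
g1, added1, removed1 = apply_update(
    g0,
    delete={("ex:alice", "foaf:givenName", "Alice")},
    insert={("ex:alice", "foaf:givenName", "Alicia")},
)
print(f"{len(added1)} triple(s) added, {len(removed1)} triple(s) removed")
# 1 triple(s) added, 1 triple(s) removed

# Updating a non-existing resource: the request succeeds but changes nothing.
g2, added2, removed2 = apply_update(
    g1,
    delete={("ex:bob", "foaf:givenName", "Bob")},
    insert=set(),
)
print(f"{len(added2)} triple(s) added, {len(removed2)} triple(s) removed")
# 0 triple(s) added, 0 triple(s) removed
```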

Considerations for backward compatibility

None that I am aware of

rubensworks commented 5 years ago

Query response could reveal how many triples were added and/or removed during DELETE/INSERT operations.

Or to go a step further, responses could (optionally) be an RDF graph, which could potentially even be used in a nested SELECT query. (following the extension from #33)

e.g.

SELECT *
FROM {
  DELETE {
    ?person foaf:givenName ?name.
  }
  WHERE {
    ?person foaf:givenName ?name.
    ?person foaf:knows ?knows.
  }
}
WHERE {
  ?person foaf:givenName ?name.
}
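One way to read this proposal is that the inner update yields the graph of triples it actually removed, and the outer `SELECT` is evaluated over that graph. The sketch below models that reading in plain Python under the same set-of-tuples assumption; `match` and `bgp` are hypothetical helpers that implement only basic graph patterns (no FILTER, OPTIONAL, or blank-node handling), and the nested-update syntax itself is the proposed extension, not standard SPARQL.

```python
# Sketch only: terms starting with "?" are variables; everything else must match exactly.
Triple = tuple[str, str, str]


def match(pattern: Triple, graph: set[Triple], binding: dict) -> list[dict]:
    """Extend `binding` with every way `pattern` matches a triple in `graph`."""
    results = []
    for triple in sorted(graph):  # sorted for deterministic output
        b = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if b.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            results.append(b)
    return results


def bgp(patterns: list[Triple], graph: set[Triple]) -> list[dict]:
    """Evaluate a basic graph pattern as a join of its triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [s2 for s in solutions for s2 in match(pattern, graph, s)]
    return solutions


before = {
    ("ex:alice", "foaf:givenName", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:givenName", "Bob"),
}

# Inner update: DELETE { ?person foaf:givenName ?name }
#               WHERE  { ?person foaf:givenName ?name . ?person foaf:knows ?knows . }
where = bgp([("?person", "foaf:givenName", "?name"),
             ("?person", "foaf:knows", "?knows")], before)
to_delete = {(s["?person"], "foaf:givenName", s["?name"]) for s in where}
after = before - to_delete
removed_graph = before - after  # the update's "result" under this reading

# Outer query: SELECT * FROM { <update> } WHERE { ?person foaf:givenName ?name . }
outer = bgp([("?person", "foaf:givenName", "?name")], removed_graph)
print(outer)  # [{'?person': 'ex:alice', '?name': 'Alice'}]
```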
tayloj commented 5 years ago

This is an interesting idea. I wonder what sort of results would be expected in the case of non-static graphs, e.g., graphs backed by inferencing query engines. I also wonder whether there are any engines that compute triples based on deltas from applied queries; that is, rather than actually deleting triples from some fixed store, the content might be computed from some base plus the operations that have occurred.

But definitely, when it's possible, this could be useful to have.
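To make the delta-based case concrete: if a store keeps an immutable base plus a log of applied operations, the number of triples an operation actually changed is only known relative to the materialized state at that point. A minimal sketch, with `DeltaStore` as a hypothetical class; replaying the whole log on every read is deliberately naive.

```python
Triple = tuple[str, str, str]


class DeltaStore:
    """Hypothetical store whose content is a fixed base plus a replayed operation log."""

    def __init__(self, base: set[Triple]):
        self.base = frozenset(base)
        self.log: list[tuple[frozenset, frozenset]] = []  # (delete, insert) pairs

    def contents(self) -> set[Triple]:
        # Replay every logged operation over the base (cost grows with the log).
        graph = set(self.base)
        for delete, insert in self.log:
            graph = (graph - delete) | insert
        return graph

    def apply(self, delete: set[Triple], insert: set[Triple]) -> tuple[int, int]:
        """Log an operation and return (triples added, triples removed)."""
        before = self.contents()
        self.log.append((frozenset(delete), frozenset(insert)))
        after = self.contents()
        return len(after - before), len(before - after)


t = ("ex:alice", "foaf:givenName", "Alice")
store = DeltaStore({t})
r1 = store.apply(delete={t}, insert=set())
r2 = store.apply(delete={t}, insert=set())
print(r1)               # (0, 1)
print(r2)               # (0, 0): logged, but no effect
print(len(store.log))   # 2
```

Under this design the log records every operation regardless of its effect, so an effect report has to be derived by comparing materialized states rather than read off the log.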

TallTed commented 5 years ago

Look to other protocols. For example, ODBC has SQL_SUCCESS_WITH_INFO, which may result from conditions that a user might consider a failure as well as a success. It differs from a plain SQL_SUCCESS (100% success, no additional info) and a plain SQL_ERROR (100% failure, with pre-defined methods to get more detailed error messages). The soft success of SQL_SUCCESS_WITH_INFO also has pre-defined methods to get more detailed messages about the success/failure, e.g., "you said update, but there was no existing record, so it was taken as an insert" or "you said update, but there was no existing record, so your update had no effect" or "this clever engine noticed that many of your URIs were based on exmple.com which never occurs in this data; perhaps you meant example.com", etc.

Neither silent successes nor silent failures make me happy, unless I've specifically asked for that silence. In most cases, I want clear confirmation of success or failure, usually with relevant details (such as how many triples were DELETEd or INSERTed).
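A SPARQL service could borrow this three-way outcome. The sketch below is hypothetical: `Status`, `UpdateOutcome` and `outcome_for` are illustrative names, not part of ODBC or any SPARQL specification; only the three-way split mirrors `SQL_SUCCESS`, `SQL_SUCCESS_WITH_INFO` and `SQL_ERROR`. An update that changed nothing is treated as a soft success that carries a diagnostic, rather than as a silent success.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    SUCCESS = "success"                      # applied, nothing further to report
    SUCCESS_WITH_INFO = "success-with-info"  # applied, but the caller should look
    ERROR = "error"                          # not applied


@dataclass
class UpdateOutcome:
    status: Status
    added: int = 0
    removed: int = 0
    diagnostics: list[str] = field(default_factory=list)


def outcome_for(added: int, removed: int, error: Optional[str] = None) -> UpdateOutcome:
    """Classify an update result in the spirit of ODBC return codes (sketch)."""
    if error is not None:
        return UpdateOutcome(Status.ERROR, diagnostics=[error])
    if added == 0 and removed == 0:
        # Not a failure, but probably not what the caller intended either.
        return UpdateOutcome(Status.SUCCESS_WITH_INFO,
                             diagnostics=["update succeeded but changed no triples"])
    return UpdateOutcome(Status.SUCCESS, added, removed)


print(outcome_for(1, 1).status)       # Status.SUCCESS
print(outcome_for(0, 0).diagnostics)  # ['update succeeded but changed no triples']
```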

cygri commented 5 years ago

Edit: Never mind this comment; I didn't think this through properly. What is needed is reporting on the triples actually affected. What I proposed here would only be able to report on the solutions of the query pattern.


I like the general direction of @rubensworks's proposal above: extend the update language so that user-defined query-like results can be returned from an update operation. A version of that could look like this:

REPORT (count(*) AS ?deletedTriples)
DELETE {
    ?person foaf:givenName ?name.
}
WHERE {
    ?person foaf:givenName ?name.
    ?person foaf:knows ?knows.
}

The REPORT line would work like the SELECT line of a select query. This could be used to return a list of the deleted IRIs, or the deleted people's names, or just their count.
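The limitation cygri notes in the edit at the top of this comment can be shown in a few lines: when a person `foaf:knows` two people, the `WHERE` clause yields two solutions, but both instantiate the same `foaf:givenName` triple, so a `count(*)` over solutions reports 2 while only one triple is removed. A sketch with a hand-written join and plain strings for IRIs (simplifying assumptions):

```python
graph = {
    ("ex:alice", "foaf:givenName", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
}

# WHERE { ?person foaf:givenName ?name . ?person foaf:knows ?knows . }
solutions = [
    {"person": s1, "name": o1, "knows": o2}
    for (s1, p1, o1) in graph if p1 == "foaf:givenName"
    for (s2, p2, o2) in graph if p2 == "foaf:knows" and s2 == s1
]

# DELETE { ?person foaf:givenName ?name }: instantiate the template per solution.
# Duplicate instantiations collapse because the result is a set of triples.
to_delete = {(s["person"], "foaf:givenName", s["name"]) for s in solutions}
after = graph - to_delete

report_count = len(solutions)       # what REPORT (count(*) AS ?deletedTriples) would see
deleted_count = len(graph - after)  # triples actually removed
print(report_count, deleted_count)  # 2 1
```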

michielbdejong commented 4 years ago

Solid does something similar in https://github.com/solid/solid-spec/pull/193/files

namedgraph commented 4 years ago

@cygri why not add this to the Protocol instead of changing the language?

timbl commented 4 years ago

Because modular software systems work by delivering a payload, like a query, to an endpoint, not by delivering a query plus a bunch of flags to set headers in particular ways. If the protocol spec forces you to do that, then you end up breaking the architecture of the software one way or another, I feel.

ericprud commented 4 years ago

@timbl , I agree in the general case, but I feel like the reporting pragma is more strongly bound to the invocation. As a test, consider how often you'll want to reuse an update verbatim in different pieces of code, vs. how often you'll want to reuse it and its reporting pragma. I feel like adding the reporting pragma to the query is like adding the result format to a query:

SELECT ?p
WHERE { ?s ?p ?o . ?o ?p ?s }
RESULT_FORMAT: application/json

Of course, it's easier to edit an embedded query string in some code than it is to add it to the surrounding protocol, but I don't think that should drive the architecture.

afs commented 4 years ago

The result format does not change whether there is an abstract result. It is just the appearance of the result being chosen. It is different for building modular systems, when the flags change whether there is a result, or what the information-content is. Presentation vs semantics.
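The presentation side of this distinction can be checked mechanically: the same solution sequence written as SPARQL JSON results and as CSV decodes back to the same bindings, so the choice of format changes only the appearance. A sketch under simplifying assumptions: plain string literals only (the SPARQL CSV format drops datatype and language information, so typed terms would not round-trip), and the helper names are illustrative.

```python
import csv
import io
import json

variables = ["name"]
solutions = [{"name": "Alice"}, {"name": "Bob"}]


def to_json(vars_, sols):
    # Shape of the SPARQL 1.1 Query Results JSON format (literals only).
    return json.dumps({
        "head": {"vars": vars_},
        "results": {"bindings": [
            {v: {"type": "literal", "value": s[v]} for v in vars_ if v in s}
            for s in sols
        ]},
    })


def from_json(text):
    doc = json.loads(text)
    return [{k: term["value"] for k, term in row.items()}
            for row in doc["results"]["bindings"]]


def to_csv(vars_, sols):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=vars_)
    writer.writeheader()
    writer.writerows(sols)
    return buf.getvalue()


def from_csv(text):
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Two presentations, one abstract result.
print(from_json(to_json(variables, solutions)) == from_csv(to_csv(variables, solutions)))
```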

ericprud commented 4 years ago

The result format does not change whether there is an abstract result. It is just the appearance of the result being chosen.

I think your argument is that for SPARQL, the result format transforms a result already implied by the semantics of the query while e.g.

REPORT (count(*) AS ?deletedTriples)

is changing (creating) the resulting report. (I know that cygri withdrew that proposal, but there aren't any others to use here.)

The SPARQL Protocol has these parameters:

| param | type | semantics |
|---|---|---|
| query | SPARQL query | boolean, result set, graph |
| default-graph-uri | IRI | pick graph for BGPs outside of a `GRAPH` |
| named-graph-uri | IRI | make IRI available for `GRAPH` clauses |
| update | SPARQL update | change contents of quad store |
| using-graph-uri | IRI | default graph IRIs |
| using-named-graph-uri | IRI | named graph IRIs |

The `*-graph-uri` directives can be specified both in the query/update and in the protocol, so I guess the precedent is to do it both ways, with priority to the protocol. That would allow folks to reuse the same updates in different contexts, with or without the overhead of accounting.
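Specifying the same setting in both places, with the protocol taking priority, amounts to a simple resolution rule. A sketch, assuming a hypothetical reporting pragma inside the update text (already parsed to a boolean) and a hypothetical protocol parameter value; neither exists in SPARQL 1.1.

```python
from typing import Optional


def reporting_enabled(pragma: Optional[bool], protocol_value: Optional[str]) -> bool:
    """Decide whether to report update effects.

    pragma: value of a hypothetical reporting pragma in the update text
            (None if the update does not mention it).
    protocol_value: value of a hypothetical protocol parameter
            (None if the request does not carry it).
    """
    if protocol_value is not None:  # the protocol request wins
        return protocol_value.strip().lower() in {"true", "1", "yes"}
    if pragma is not None:          # otherwise fall back to the update text
        return pragma
    return False                    # default: no accounting overhead


print(reporting_enabled(True, None))     # True  (same update text, no override)
print(reporting_enabled(True, "false"))  # False (protocol overrides the pragma)
print(reporting_enabled(None, None))     # False
```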

Updating the protocol would probably be the path of least resistance as people shove whatever they want into the protocol anyways. Protocol endpoints tend to ignore unknown query parameters while a parser will almost always barf on any SPARQL it can't parse.
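The asymmetry between tolerant endpoints and strict parsers can be sketched on the endpoint side: a form-encoded update request is split into the parameters the SPARQL 1.1 Protocol defines for updates and everything else, which is simply ignored. `read_update_form` is an illustrative helper, and `report-effects` is a made-up extension parameter used only for the example.

```python
from urllib.parse import parse_qs, urlencode

# Parameters the SPARQL 1.1 Protocol defines for the update operation (form-encoded POST).
KNOWN_UPDATE_PARAMS = {"update", "using-graph-uri", "using-named-graph-uri"}


def read_update_form(body: str) -> tuple[dict[str, list[str]], list[str]]:
    """Split a form-encoded body into recognised parameters and ignored extras."""
    params = parse_qs(body, keep_blank_values=True)
    known = {k: v for k, v in params.items() if k in KNOWN_UPDATE_PARAMS}
    ignored = sorted(k for k in params if k not in KNOWN_UPDATE_PARAMS)
    return known, ignored


# "report-effects" is hypothetical; an endpoint unaware of it just drops it.
body = urlencode({
    "update": "DELETE WHERE { ?s <http://xmlns.com/foaf/0.1/givenName> ?o }",
    "report-effects": "true",
})
known, ignored = read_update_form(body)
print(known["update"][0])
print(ignored)  # ['report-effects']
```

By contrast, an unknown keyword placed inside the update text would reach the SPARQL parser, which has no comparable way to skip it.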

michielbdejong commented 4 years ago

FWIW, I noticed that in Solid, deleting a triple using PATCH does require read permission on the triple being deleted, whereas deleting a resource using DELETE does not require read permission on the containment triple that links that resource to its container.

afs commented 4 years ago

@ericprud I don't think the `*-graph-uri` parameters are a good precedent, because I don't think they have shown much utility. The update ones especially don't look like a good idea any longer (hindsight!).

The only possible exception is default-graph-uri, but otherwise I haven't seen them used in the wild.

Any reported sightings?

jindrichmynarz commented 4 years ago

default-graph-uri is useful to decouple storage configuration from queries. When an RDF store doesn't support multiple RDF datasets, such as via separate repositories, it is convenient to separate datasets into named graphs. Such datasets are not originally structured as quads, so using named graphs for them can be considered storage configuration. Decoupling it from queries allows the same queries to be used both in RDF stores that support multiple datasets and in those that don't.
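A sketch of that decoupling from the client side: the query text stays identical, and only the protocol request differs between a store that keeps each dataset in its own repository and one that keeps datasets as named graphs. The endpoint URLs, the graph IRI and the `query_url` helper are hypothetical; `query` and `default-graph-uri` are the SPARQL 1.1 Protocol parameter names.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

QUERY = "SELECT ?person WHERE { ?person a <http://xmlns.com/foaf/0.1/Person> }"


def query_url(endpoint: str, query: str, default_graph: Optional[str] = None) -> str:
    """Build a GET request URL; the query text itself is never modified."""
    params = [("query", query)]
    if default_graph is not None:
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)


# Store A: one dataset per repository, so no dataset parameter is needed.
url_a = query_url("https://store-a.example/sparql", QUERY)
# Store B: datasets kept as named graphs; pick one via the protocol instead.
url_b = query_url("https://store-b.example/sparql", QUERY,
                  default_graph="https://example.org/graphs/dataset-1")

for url in (url_a, url_b):
    print(parse_qs(urlsplit(url).query))
```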

afs commented 4 years ago

I've forked the "-graph-uri" discussion onto #106.