Open namedgraph opened 3 years ago
I disagree that it is underspecified. This is HTTP behaviour. A server is free to reject any request as it chooses. This is normal HTTP.
Any resulting concurrency issues will be a matter for each implementation to consider according to its own architecture.
Why must each detail of the implementation behaviour be specified?
Implementations have choices. Implementations are not identical clones. They exist to serve a particular environment - lightweight, fast, small footprint, enterprise features, ...
If you want portable client, write to the spec and no more. Handle 409.
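For what it's worth, a portable client along those lines is small. A minimal sketch (the `send` callable is a stand-in for whatever HTTP layer the client uses, and the retry policy is an assumption on my part, not anything the spec mandates):

```python
import time

def post_with_retry(send, max_retries=3, backoff=0.5):
    """POST an update via `send()` and retry on 409 Conflict.

    `send` is any callable returning an HTTP status code; in a real
    client it would wrap e.g. urllib.request against the GSP endpoint.
    """
    for attempt in range(max_retries + 1):
        status = send()
        if status != 409:          # success or a non-retryable error
            return status
        if attempt < max_retries:  # conflict: back off and try again
            time.sleep(backoff * (2 ** attempt))
    return status

# Simulated server: rejects the first two attempts with 409, then accepts.
responses = iter([409, 409, 200])
print(post_with_retry(lambda: next(responses), backoff=0))  # prints 200
```

The point is that "handle 409" costs a dozen lines and works against any conforming store, whatever its concurrency architecture.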
"backward compatibility" - choosing one set of behaviour breaks every system that does anything else.
Well that was what we were doing - writing to the spec. But it does not look adequate for a multi-user management system.
We have 2 types of updates: graph-sized edits that are small and occasional, and data imports that are large and long-running but only appending to the dataset. Interleaving them turned out to be a problem (a different one from this, probably), but the protocol differences between Fuseki and Dydra meant we had to write multiple GSP client versions in order to support queuing. And then we would probably need to write one for Stardog, GraphDB, and any other store we'd like to support... which goes against the spirit of RDF protocols IMO.
Dydra is responding with 400 Bad Request on a failed update, which should probably be 409 Conflict then? @lisp
I was wondering if GSP POST (appending) can be a special case? I mean it cannot change existing data, so what would be the problem of letting them run concurrently? Phantom reads?
I was wondering if GSP POST (appending) can be a special case?
Not if mixed with general changes.
nothing guarantees that an implementation's storage model maintains the isolation.
Honestly I don't think you're going to get a lot of support from RDF database vendors for this. It's not unreasonable to specify various transaction isolation levels as a part of a future SPARQL Update spec so the client can expect some predictable behavior from a database system which implements a particular level (like presence/absence or particular data anomalies as a result of concurrent updates) [1]. But forcing implementations to maintain a particular level is probably a non-starter (I agree with @lisp on that).
[1] IIRC SQL'92 does that but it's not straightforward (esp. without delving into implementation details like locks) and it's been rightly criticized, see https://www.cs.umb.edu/~poneil/iso.pdf
Even with queues we end up with the situation that they can be full etc.
Suggesting a standard error code used to reject otherwise valid updates for any kind of resource exhaustion might be worthwhile. But looking at the common list I can't think of any that would not cause other issues e.g. 503 with a Retry-After comes closest but might cause issues with proxy/loadbalancer setups.
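On the 503 + Retry-After option: one extra wrinkle is that RFC 7231 allows the header to carry either delta-seconds or an HTTP-date, so a client has to handle both forms. A sketch:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value, now=None):
    """Parse a Retry-After header (RFC 7231 section 7.1.3): either
    delta-seconds or an HTTP-date. Returns a delay in seconds."""
    now = now or datetime.now(timezone.utc)
    if header_value.isdigit():                  # delta-seconds form
        return int(header_value)
    when = parsedate_to_datetime(header_value)  # HTTP-date form
    return max(0, int((when - now).total_seconds()))

print(retry_after_seconds("120"))  # prints 120
```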
HTTP status codes are about the HTTP protocol and don't leave much room for communicating higher level errors. "409 Conflict" is sort of in the space for a 4xx.
Neither 4xx nor 5xx is perfect. The server is operating normally so 5xx is a bit odd, yet the request is correct so 4xx is odd.
RFC7231 6.5.8. 409 Conflict suggests a response body:
The 409 (Conflict) status code indicates that the request could not be completed due to a conflict with the current state of the target resource. This code is used in situations where the user might be able to resolve the conflict and resubmit the request. The server SHOULD generate a payload that includes enough information for a user to recognize the source of the conflict.
It would be a new step for us but we could specify a JSON-LD response to carry more specific error codes.
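Purely as an illustration of the shape such a response might take (the `@context`, vocabulary, and error code below are invented for this sketch; nothing like this is specified anywhere):

```python
import json

# Hypothetical JSON-LD error body accompanying a 409 Conflict.
# The vocabulary URIs and code are placeholders, not a real spec.
error = {
    "@context": {
        "code": "http://example.org/gsp-errors#code",
        "message": "http://example.org/gsp-errors#message",
    },
    "@type": "http://example.org/gsp-errors#Conflict",
    "code": "CONCURRENT_UPDATE_REJECTED",
    "message": "Another update is in progress on this graph.",
}
body = json.dumps(error)
```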
The response should allow content negotiation.
XSLT defines error codes for each condition: https://www.w3.org/TR/xslt-30/#error-summary
Application level error would be good; HTTP status codes are about HTTP-level matters. But.
No content negotiation on error responses (that I know of). As one use case is HTML for people, and another is machine-processable content, this is tricky.
We have to agree the codes in detail or make them open-ended. The XSLT example covers things that can go wrong in XSLT processing, not environment factors.
Exact definitions are hard to agree on and are grounded in today's technology (SQL serialization levels example).
Error codes are not the only route. The example at the start of this issue could go in the service description to describe the policies in force. That makes it a 409-ish.
No content negotiation on error responses (that I know of).
application/problem+json (RFC 7807)?
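RFC 7807 defines `type`, `title`, `status`, `detail` and `instance` members served as `application/problem+json`, and the `type` URI makes the code space open-ended. A store rejecting a concurrent write could respond with something like this (the problem-type URI and paths are made up for illustration):

```python
import json

# Hypothetical RFC 7807 problem document for a rejected concurrent update.
problem = {
    "type": "https://example.org/problems/concurrent-update",
    "title": "Concurrent update rejected",
    "status": 409,
    "detail": "Another update transaction holds the write lock on this graph.",
    "instance": "/datasets/books/update/1234",
}
body = json.dumps(problem)
# Served as: HTTP/1.1 409 Conflict
#            Content-Type: application/problem+json
```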
Error responses that are themselves (or at least include) HTTP/S links inherently offer content negotiation, and would seem to be the best path forward, both for ConNeg and because they're open-ended -- new error conditions just need a new link, with appropriate machine- and human-focused content.
That said, challenges remain.
For instance, Virtuoso currently offers the "Anytime Query" feature, which provides partial results when full processing of a query goes overtime. But when the results are delivered as Turtle, for instance, there is no inline way to include an alert that this feature was triggered. The only way such an alert can then be provided is through the HTTP response headers, which are primarily machine-directed; but if human-friendlier tools are developed that watch for these headers, they can present both a human-friendly error message and the partial result set.
Similar tactics could be used for handling concurrency and other conditions that warrant error transmission but do not fully block a transaction — but this would require SPARQL-specific tools, not "simple" web browsers, unless some browser plugin/extension can handle the error-bearing response headers and/or browser vendors buy into the practice.
@namedgraph you might also include Amazon Neptune under previous work. My observation is that Neptune can handle multiple concurrent writes (the upper limit is the number of worker threads on the writer instance). So long as the writes do not conflict with each other, you can get decent concurrency. More details on that: https://docs.aws.amazon.com/neptune/latest/userguide/transactions.html
Why?
Concurrent writes are currently underspecified.
The only relevant passage in the spec seems to be 2.2 SPARQL 1.1 Update Services:
Previous work
Triplestores exhibit different behavior in the HTTP protocol, which makes writing general clients difficult:
I mostly looked at the Graph Store Protocol, but I think the same applies to SPARQL Update. Please add the behavior of other stores.
Proposed solution
Not sure, but here are some ideas:
Considerations for backward compatibility
None that I can think of.