Multi-request transaction support in the SPARQL protocol

afs commented 5 years ago

The SPARQL 1.1 protocol only provides for atomic single operations (in keeping with HTTP).

Sometimes, the client application wishes to mix updates and queries in a single transaction.

For example, complex updates may be best done with multiple SPARQL Updates, being created by navigating the changing data.

SPARQL Update only provides for multiple operations in the same HTTP request.

Previous work

RDF4J Transactions

Drafted SPARQL 1.1 Transaction Protocol (not implemented).

Considerations for backward compatibility

None.

Client sends POST with appropriate arguments to /transactions to start a transaction.
Server redirects client to /transactions/1234/sparql, where 1234 is the transaction ID. This URL is a SPARQL protocol endpoint.
Client sends SPARQL protocol requests to /transactions/1234/sparql. All interactions with this endpoint are in scope of the transaction.
Client sends DELETE to /transactions/1234/sparql to roll back, or POST with an appropriate argument to the same URL to commit.

The main point in this sketch is that starting a transaction creates a new temporary SPARQL endpoint under an endpoint URL specific to that transaction. Interactions with that endpoint URL are in scope of the transaction.
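From the client side, the four steps above might look roughly like the following. This is only a sketch that builds the HTTP requests without sending them. The `example.org` host and the `action=commit` form field are placeholders of mine, since the sketch leaves the "appropriate arguments" open.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

FORM = {"Content-Type": "application/x-www-form-urlencoded"}


def begin_request(server_base):
    # Step 1: POST to /transactions. The server would answer with a redirect
    # to the transaction-specific endpoint, e.g. /transactions/1234/sparql.
    return Request(urljoin(server_base, "/transactions"), data=b"", method="POST")


def update_request(tx_endpoint, update):
    # Step 3: an ordinary SPARQL 1.1 Protocol update (URL-encoded POST form),
    # simply sent to the transaction-specific endpoint.
    body = urlencode({"update": update}).encode("utf-8")
    return Request(tx_endpoint, data=body, headers=FORM, method="POST")


def commit_request(tx_endpoint):
    # Step 4a: POST with an argument to commit.
    # "action=commit" is a made-up placeholder for that argument.
    body = urlencode({"action": "commit"}).encode("utf-8")
    return Request(tx_endpoint, data=body, headers=FORM, method="POST")


def rollback_request(tx_endpoint):
    # Step 4b: DELETE on the same URL rolls the transaction back.
    return Request(tx_endpoint, method="DELETE")
```

Each returned `Request` could be passed to `urllib.request.urlopen` against a server that implements the sketch.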

Compared to the two protocols described in the previous work, this approach has the advantage that the SPARQL Protocol interactions are completely normal SPARQL 1.1 Protocol, with no need for special headers, special request parameters, or any other additions. So it becomes possible for a client to start a transaction, then pass the transaction-specific endpoint URL to a SPARQL 1.1 Protocol client, then have that client submit queries and updates, and finally the original client can commit or roll back. The SPARQL 1.1 Protocol client need not be aware that it is doing transactional requests.
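To illustrate the hand-off, here is a rough sketch of a client that knows nothing about transactions and only takes an endpoint URL. The class name, the injected `send` callable, and the transaction ID are all made up for illustration. The transport records requests instead of doing HTTP so that the sketch runs offline.

```python
from urllib.parse import urlencode


class PlainSparqlClient:
    """Knows only the SPARQL 1.1 Protocol and an endpoint URL (hypothetical class)."""

    def __init__(self, endpoint, send):
        self.endpoint = endpoint
        self.send = send  # callable(method, url, body), injected for the sketch

    def update(self, update_string):
        # Update via POST with URL-encoded parameters.
        return self.send("POST", self.endpoint, urlencode({"update": update_string}))

    def query(self, query_string):
        # Query via GET.
        url = self.endpoint + "?" + urlencode({"query": query_string})
        return self.send("GET", url, None)


sent = []  # stand-in transport: records (method, url) pairs


def record(method, url, body):
    sent.append((method, url))


# The coordinating code started the transaction and was redirected here.
# The ID 1234 is illustrative only.
tx_endpoint = "http://example.org/transactions/1234/sparql"

# The client is used exactly as it would be against a non-transactional endpoint.
client = PlainSparqlClient(tx_endpoint, record)
client.update("INSERT DATA { <urn:s> <urn:p> <urn:o> }")
client.query("ASK { <urn:s> <urn:p> <urn:o> }")
```

Nothing in `PlainSparqlClient` mentions transactions. Only the code that created `tx_endpoint` has to commit or roll back afterwards.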

lisp commented 5 years ago

this approach has the advantage that the SPARQL Protocol interactions are completely normal SPARQL 1.1 Protocol,...

except, they are not atomic.

cygri commented 5 years ago

except, they are not atomic.

except, they are.

lisp commented 5 years ago

please, explain.

dbooth-boston commented 5 years ago

except, they are not atomic.

except, they are.

@lisp and @cygri , can you please provide evidence for your claims, so that others can better follow?

lisp commented 5 years ago

i have no evidence. i just read the description of the proposed protocol variation and do not understand how the server can handle update requests to /transactions/1234/sparql as atomic when the delete/commit for the 1234 transaction is a consequence of some eventual subsequent request.

cygri commented 5 years ago

The proposal above is a reaction to the two protocols presented in the issue description under Previous Work. It improves on both of these proposals in that it allows SPARQL 1.1 clients to engage in multi-request transactions, by sticking the transaction ID into the endpoint URL. Apart from that, the proposal does not differ conceptually from these two protocols, and AFAICT there is nothing in the difference that would affect atomicity.

afs commented 5 years ago

When the transaction ID is a large number (UUID), the only overlap comes because the client passed it around or does multithreading.

One implementation would be an MRSW (multiple-reader, single-writer) lock inside the transaction request execution path to provide guarantees within a transaction, but that is an implementation issue.

Any HTTP request-response happens at an idealised "point in time" - they are atomic. The server has to make it happen. Nothing special about the SPARQL protocol here.
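For readers unfamiliar with the MRSW lock mentioned above, a minimal sketch could look like this. It is not taken from any particular server. The class name is invented, and this simple version can starve writers under a constant stream of readers.

```python
import threading
from contextlib import contextmanager


class MRSWLock:
    """Multiple-reader, single-writer lock (illustrative sketch only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of active readers
        self._writer = False   # True while a writer holds the lock

    @contextmanager
    def read(self):
        with self._cond:
            # Readers wait only while a writer is active.
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            # A writer waits until there is no other writer and no reader.
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()
```

A server could hold one such lock per transaction ID, taking `read()` for queries and `write()` for updates that arrive on the same transaction.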

afs commented 5 years ago

An alternative is an additional query string parameter, using the usual endpoint for query/update etc. Given that security may apply on some endpoints based on URL (e.g. different URLs for query and update), using the same endpoints for operations with transactions and single operations would be helpful. It is also helpful to client libraries as the sequence "begin"-any existing code-"commit" works. /transactions/<id> is for transaction control.
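As a rough illustration of that alternative, the transaction ID could be appended to the usual endpoint URL. The parameter name `transaction` is an assumption, since no name has been proposed here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_transaction(endpoint, tx_id, param="transaction"):
    # Append the (hypothetically named) transaction parameter while keeping
    # any query string parameters the endpoint URL already carries.
    parts = urlsplit(endpoint)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, tx_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Here `with_transaction("http://example.org/sparql", "1234")` gives `http://example.org/sparql?transaction=1234`, so the path, and any URL-based access rules on it, stay the same.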

lisp commented 5 years ago

i am afraid you have lost me. would it be possible to formulate this as an interaction diagram which includes the two client classes, the sparql processor and the persistent storage? the ones which i can imagine do not permit the sparql processor to interact with the store in a manner which conforms to sparql update 2.2. i must have misunderstood something rather basic in this proposal, so a diagram would help.

cygri commented 5 years ago

@afs

An alternative is an additional query string parameter, using the usual endpoint for query/update [...] It is also helpful to client libraries as the sequence "begin"-any existing code-"commit" works.

How so? Existing code would need to be modified to pass in and send the additional query string parameter.

The only thing that can be passed in to existing client libraries in an interoperable way is the endpoint URL, so if not changing existing client code is a priority, then the transaction ID needs to go into the endpoint URL.

Aside: Different endpoints for R and RW is fine. Different endpoints for R and W just makes life difficult for clients.

cygri commented 5 years ago

@lisp Why would a service endpoint provider of this to-be-defined transactional protocol be bound by SPARQL 1.1 Update § 2.2?

The processor would extract the transaction ID from the HTTP request (be it a part of the endpoint URL, a query parameter, or a request header), and apply the update operations in the request payload to the correct in-progress transaction. This requires that the store supports transactions in the first place, of course.
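A minimal sketch of that dispatch step, covering all three placements of the ID. The query parameter name `transaction` and the header name `X-Transaction-Id` are invented for illustration, and the list of pending updates is only a stand-in for a store that supports real transactions.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the URL layout from the sketch: /transactions/<id>/sparql
TX_PATH = re.compile(r"^/transactions/([^/]+)/sparql$")


def extract_transaction_id(url, headers):
    parts = urlsplit(url)
    match = TX_PATH.match(parts.path)
    if match:                                   # ID in the endpoint URL
        return match.group(1)
    values = parse_qs(parts.query).get("transaction")
    if values:                                  # ID as a query parameter (assumed name)
        return values[0]
    return headers.get("X-Transaction-Id")      # ID as a header (assumed name)


class Processor:
    def __init__(self):
        # transaction ID -> updates applied so far (stand-in for store state)
        self.in_progress = {}

    def begin(self, tx_id):
        self.in_progress[tx_id] = []

    def handle_update(self, url, headers, update):
        tx_id = extract_transaction_id(url, headers)
        if tx_id is None:
            raise ValueError("no transaction ID in request")
        if tx_id not in self.in_progress:
            raise KeyError("unknown or finished transaction: " + tx_id)
        # A real store would apply the update inside its own transaction here.
        self.in_progress[tx_id].append(update)
        return tx_id
```

A request without any transaction ID raises here only to keep the sketch short. A real endpoint would presumably run it as an ordinary single atomic operation.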

lisp commented 5 years ago

oh.

w3c / sparql-dev

Multi-request transaction support in the SPARQL protocol #83
