no-reply closed this 9 years ago
especially if there is already support for sparql processing, the argument for ldpatch remains to be made. when considering how to accommodate the cases for which sparql is not well suited, an alternative is to use its extension facility to improve upon it, rather than inventing a new language, protocol, run-time, etc.
i have drafted some notes to that effect and posted a preliminary version for your deliberations: http://dev.dydra.com:9292/2015/08/14/sparql-patches
It would be a good idea to implement SPARQL Update PATCH on top of this branch, which has LDPatch support: https://github.com/ruby-rdf/rdf-ldp/compare/feature/ld-patch
Adding SPARQL Update should be a fairly minimal project.
we will be releasing support in the imminent future for the sparql extensions sufficient to match ldpatch functionality, but a protocol issue remains unclear: what is the appropriate content type for sparql content in a patch method? the wiki note indicates “text/sparqlpatch”, but i would tend to rephrase that as “application/sparql-update”, as follows from the later sparql recommendations, and perhaps to permit sparql-update (or even sparql-query) silently as alternatives.
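for concreteness, the body of such a patch request, sent as application/sparql-update, could be nothing more than standard sparql update (a sketch; the resource and values are illustrative):

```sparql
# hypothetical PATCH body, Content-Type: application/sparql-update;
# the resource <http://example.org/card> and the name values are illustrative
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
DELETE DATA { <http://example.org/card> foaf:name "Jim Anderson" } ;
INSERT DATA { <http://example.org/card> foaf:name "James Anderson" }
```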
I think text/sparqlpatch is appropriate if you're able to make use of the purported benefits attached to using the SparqlPatch subset of Update. Otherwise I would treat the two as interchangeable.
Using application/sparql-update seems appropriate here, since I can't see why we would want the subset.
we will be releasing support in the imminent future for the sparql extensions sufficient to match ldpatch functionality
I'm curious about how you are handling the problem that path expressions are meant to solve in LDPatch.
Also, I'm curious whether by "releasing support" you mean for Ruby SPARQL, or somewhere else?
my interest is to explore the respective benefits of "sparql patch", "sparql update", and "ldpatch" in order to understand what a linked-data service, in general, needs to offer. the one i can best steer is dydra's. thus the blog post (http://dev.dydra.com:9292/2015/08/14/sparql-patches), which describes how to extend sparql update such that it suffices for the use cases asserted by ldpatch - at least, so far as i can comprehend bertails' essay and the ldpatch note.
the support-to-be-released is for the list operators as sparql extension functions, which address the deficiency in sparql with respect to binding intermediate nodes on paths. this was the only issue which i recognized from the ldpatch essays. with those extensions, it was possible to implement all the described cases as sparql update. if there are examples in addition to the ones which i excerpted from those sources or if i omitted a significant one, please advise.
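to illustrate the intent, a schematic use of such an extension function in an update (the function name and namespace here are placeholders, not the released names):

```sparql
# schematic only: ex:nthcdr is a placeholder for a list extension
# function which binds the nth cons cell of a list directly,
# avoiding an explicit rdf:rest chain in the bgp
PREFIX ex:   <http://example.org/list#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
DELETE { ?cell rdf:first ?old }
INSERT { ?cell rdf:first <arto2> }
WHERE {
  <gregg> foaf:knows ?list .
  BIND(ex:nthcdr(?list, 1) AS ?cell)  # the second cell of the list
  ?cell rdf:first ?old .
}
```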
Hi James, I was thinking along similar lines, but I think we also need an index feature added to property paths to get the full effect, likely using their existing syntax. Note that adding constraints within a property path would be a fairly large change, but could be accomplished by breaking the path up and inserting restrictions directly upon the intermediate node.
Regarding your list functions, they are certainly a start towards providing useful functions to deal with lists. However, in your examples, list modifications do not seem to remove the nodes which are removed, leaving them free-floating in the graph. IIRC, the LD Path spec calls for them to be removed using the same logic as for Cut, requiring something like a Concise Bounded Description match.
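For illustration, the cleanup could look something like the following sketch (a simple garbage-collection pass over cells with no remaining incoming references; this is my paraphrase, not what the spec literally mandates):

```sparql
# sketch: delete cons cells which nothing references any longer;
# repeat until the update matches no further solutions
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DELETE { ?cell rdf:first ?v ; rdf:rest ?rest }
WHERE {
  ?cell rdf:first ?v ; rdf:rest ?rest .
  FILTER NOT EXISTS { ?s ?p ?cell }
}
```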
Within RDF.rb (develop branch), list modifications are done by using the RDF::List class with #[] and #[]= methods. Basically, this just turns the list into a Ruby Array, and recreates the list after the array operations are performed (with cleanup handled elsewhere in LD::Patch::Algebra::UpdateList). There are a few special cases for preserving the list head, but that was the simplest place to handle it. For a Lisp implementation, of course, list operations are probably simplest.
I agree that some notion of stable BNode scope will be useful, but it might be too difficult for clients to do different things depending on whether or not there is a stable BNode scope; it would simplify things a lot, though.
I agree that some notion of stable BNode scope will be useful, but it might be too difficult for clients to do different things depending on whether or not there is a stable BNode scope; it would simplify things a lot, though.
:+1:
For solving this on the server side, the simplest approach has got to be just to guarantee node ids in some specific scope. I don't think that will work here, short of skolemization/de-skolemization at the RDFSource layer, for the same reason it won't work for general clients: we have to be able to support various implementations which may not make that guarantee.
Similarly, it's not a satisfactory approach at the specification level.
I'm tempted by the idea that path expressions could be introduced as a SPARQL extension. Offering a cheaper way to bind variables to blank nodes than BGP would resolve the problem.
On the other hand, the next problem from where I sit is: given two Graphs G1 & G2 with shared bnode scope, what is the algorithm for generating an LDPatch (with path expressions for the relevant nodes) that generates G2 from G1?
Regarding your list functions, they are certainly a start towards providing useful functions to deal with lists. However, in your examples, list modifications do not seem to remove the nodes which are removed, leaving them free-floating in the graph.
the list functions themselves are just that: they effect no modifications. the examples were all intended to include the requisite delete clauses in the update expressions.
the schematic description of the functional operators involved no intermediate data model. they were intended to indicate how simple it is to implement them directly against a store, based on the basic pattern match facility present for any bgp processing. it would be reasonable to define the equivalent of setf expanders for them, to be used in insert clauses with defined deletion side effects. they would be similarly straightforward, but that would be a topic of its own.
... we also need an index feature added to property paths to get the full effect, likely using their existing syntax. Note that adding constraints within a property path would be a fairly large change, but could be accomplished by breaking the path up and inserting restrictions directly upon the intermediate node.
if you need to bind the intermediate nodes, that would be a significant syntactic change for paths.
what sorts of indexed and constrained access would go beyond 'nth', 'nthcdr', 'member' - especially if there is something like member-if which integrates exists-like evaluation?
I'm tempted by the idea that path expressions could be introduced as a SPARQL extension. Offering a cheaper way to bind variables to blank nodes than BGP would resolve the problem.
this is where the ldpatch argument sorely fails to convince. if there is somewhere a description of how its alternative will necessarily perform fewer matches and less iteration, it was not apparent relative to the schematic re-implementations.
this is where the ldpatch argument sorely fails to convince. if there is somewhere a description of how its alternative will necessarily perform fewer matches and less iteration, it was not apparent relative to the schematic re-implementations.
It's not about a more efficient query, as it all comes down to BGP ultimately, but syntactic sugar. For example, the path <#> / foaf:knows / 2 / foaf:name might bind to the name of a foaf:Person contained within a list, such as the following:
<gregg> foaf:knows (<tom> <arto> <james>) .
<james> foaf:name "James Anderson" .
Ultimately, this is going to use a SPARQL path such as <gregg> foaf:knows/rdf:rest/rdf:rest/rdf:first/foaf:name ?name, but it is easier to state using an integer list offset, as in LD-Patch, and I think this would be a nice extension to SPARQL property paths.
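In other words, something like the following (the integer-offset path syntax is hypothetical; the expansion is standard SPARQL 1.1):

```sparql
# hypothetical extension: an integer step in a property path
#   SELECT ?name WHERE { <gregg> foaf:knows/2/foaf:name ?name }
# would be shorthand for the standard SPARQL 1.1 path below
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?name WHERE {
  <gregg> foaf:knows/rdf:rest/rdf:rest/rdf:first/foaf:name ?name
}
```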
It's not about a more efficient query, as it all comes down to BGP ultimately, but syntactic sugar.
if that is the true argument, the w3c note's claim of "simplicity, ease of implementation, and run-time performance" should be reformulated to better emphasize the actual intended benefits. my tolerance for sugar diminishes with time. perhaps it is a deficiency which comes with age. it could be that i have seen too many of those frosted layer-cakes which, when you fall for the decoration and have a slice delivered to your table, nicely dressed with whipped cream, reveal themselves as just the most tasteless substance one can imagine - and you hope that they at least used real vanilla in the whipped cream, so that something has some taste at all...
years and years ago, as a component of a lisp-based xml processor (https://github.com/lisp/de.setf.xml), i implemented a nominally complete version of the then-current xpath language. i tried to use it for a project, but eventually rejected using the xpath formulations themselves in favour of using the operators in the base runtime implementation directly. this was not only because of the known "language impedance" issues, but because the base run-time, that is lisp, offered its well-known facilities for functional composition and abstraction.
yes, the sparql authors demonstrated an uncanny lack of insight with regards to abstraction and composition - given the current state of language development, to the point of negligence. despite their oversight - where 'cut' is algebraically a composition of 'describe' and 'delete', and side-effects on slices are equivalent to some combination of list operators - i continue to expect more diligence from a group which claims an imperative. the semantic data community would likely be better served over the long term by attending to the lack of abstraction and functional composition in its core tooling - neither of which ldpatch offers - than by finding a new way to express old operations.
there are certainly significant operations which cannot be expressed as matching and projection. kleene star was one, and it led to paths. it is more important to find out what those are than to find new ways to implement something which one already knows how to do. an ability to bind intermediate nodes is another such capability. if you will permit me to sugar something which would be readily expressed with list operators and bindings as a bgp,
<#> / foaf:knows / 2 / foaf:name
is equivalent to
(nthcdr 2 { <gregg> foaf:knows ?acquaintances }) foaf:name ?name .
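which, expanded to a plain bgp without any sugar at all, amounts to the following (the ?cell variable names are incidental):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?name WHERE {
  <gregg> foaf:knows ?list .    # the list head
  ?list  rdf:rest  ?cell1 .     # advance two cells: (nthcdr 2 ...)
  ?cell1 rdf:rest  ?cell2 .
  ?cell2 rdf:first ?member .    # the third member
  ?member foaf:name ?name .
}
```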
It's not about a more efficient query, as it all comes down to BGP ultimately, but syntactic sugar.
if that is the true argument, the w3c note's claim of "simplicity, ease of implementation, and run-time performance" should be reformulated to better emphasize the actual intended benefits.
This is not at all the argument I read in the Bertails post.
What I understand him to be arguing is that an LD Patch path expression has a simple tree semantics. We have a node set, starting with a single node, and each "step" replaces it by mapping the set through a property function. The only other operation is the filter, which shares those semantics internally. This isn't the case for SPARQL, where we have to support more complex graph patterns, and therefore merge joins on potentially large sets.
Though it would be easy enough to write BGPs for any given LD Patch path, it seems obvious enough that the semantics of the BGPs are more complex than those of the paths.
You could reduce the BGPs to the paths and optimize the queries (do SPARQL implementations do this in practice?), but even still, as a server host I may not want to support atomic PATCH requests that could reasonably be expected to run for many minutes just to decide that there's one triple to update.
Paired with the added clarity on the client-side (SPARQL Update will succeed on minor node selection errors, resulting in partial updates, where LD Patch fails nicely), this seems like a fairly compelling case to me.
That said, I've seen the attitude expressed by @lisp from many experts on this issue, including folks on the LDP WG, so I'm really interested to see where I'm going wrong. Help me out?
Closed by #15
That said, I've seen the attitude expressed by @lisp from many experts on this issue, including folks on the LDP WG, so I'm really interested to see where I'm going wrong. Help me out?
I think the gist of what @lisp expressed to me earlier was that any such posited benefits would entirely depend on your actual underlying storage model. The abstract semantics (which could be expressed in terms of SPARQL property paths) are one thing, the concrete execution of those semantics quite another. In a typical triple store, you'd not gain any benefit; in a native graph store such as Neo4j you might.
It's all well and good to say "does not operate on triples" in the abstract; which is too bad when the underlying store disagrees. The conflation of triple stores with graph databases is a pretty common error, since we do on the surface level use rather similar abstract graph data models. How the data is actually organized and indexed on disk tends to be significantly different, though, reflecting exactly the differing use cases and access patterns the respective products cater to.
if that is the true argument, the w3c note's claim of "simplicity, ease of implementation, and run-time performance" should be reformulated to better emphasize the actual intended benefits.
This is not at all the argument I read in the Bertails post. What I understand him to be arguing is that an LD Patch path expression has a simple tree semantics. We have a node set, starting with a single node, and each "step" replaces it with a simple property function. The only other operation is the filter, which shares the semantics. This isn't the case for SPARQL, where we have to support more complex graph patterns, and therefore merge joins on potentially large sets. I've seen the attitude expressed by @lisp from many experts on this issue, including folks on the LDP WG, so I'm really interested to see where I'm going wrong. Help me out?
i will try.
your restatement of the argument is at least concise and gathers itself into a form which permits a response.
You could reduce the BGPs to the paths and optimize the queries (do SPARQL implementations do this in practice?), but even still, as a server host I may not want to support atomic PATCH requests that could reasonably be expected to run for many minutes just to decide that there's one triple to update.
[as an aside, where did the node designator come from in the first place? if there was some multi-minute query run yesterday, why should it matter, when?]
the public ldpatch discussions - and, i gather from your description above, your understanding as well, attach some primacy to a “node” beyond that which inheres in rdf’s notion of “resource”. the concept appears in use, as if such entities exist in some concrete form apart from the abstract entities designated by resource identifiers and exhibit special characteristics which permit more efficient computation of the other nodes which satisfy a path expression involving an originating “node” and a “predicate”.
bertails’ post echoes a similar sentiment, where it suggests that the abstract semantics for sparql evaluation necessarily entail the implementation, and distinguishes ldpatch with respect to the effect of the ldpatch bind form:
The runtime complexity for matching nodes in a graph is known to be extremely bad in some cases. While SparqlPatch is better than SPARQL Update in that regard, there are still some issues, which become apparent only when you start implementing and thinking about the runtime semantics. The main data structure in the SPARQL semantics is the Solution Mapping, which keeps track of which concrete nodes from the graph can be mapped to which variables, applying to each clause in the WHERE statement. So the semantics of the Basic Graph Pattern (ie. all the clauses in the SparqlPatch’s WHERE) involves a lot of costly cartesian products. Also, it would be nice to change the evaluation semantics of the Basic Graph Pattern such that the evaluation order of the clauses is exactly the one from the query. It makes a lot of sense to let the client have some control over the evaluation order in the context of a PATCH. Unlike SparqlPatch, the Bind statement does not operate on triples. Instead, an LD Path expression (/ ^schema:url) is evaluated against a concrete starting node (http://conferences.ted.com/TED2009/). The result node gets bound to a variable (?ted) which can then be used in the following statements. That is the main difference when compared to SparqlPatch semantics.
nothing makes clear why this should, in itself, necessarily change the evaluation complexity. if it were true that the ldpatch language, in combination with the protocol, necessarily changed something in the manner in which path expressions are to be computed, that would be possible, but it remains to be demonstrated. to express the question in another way: is there some qualitative consequence which follows from cartesian products of low cardinality, and if so, under what circumstances?
i went looking for what i recall to have been a paper which included both a complexity analysis and performance results for a path evaluation algorithm which implemented the regular path semantics adopted for sparql in the aftermath of the arenas/conca/perez “yottabyte” paper[1]. i have not located it, which i regret, as, by my recollection, it was the most concrete of the available essays. as an alternative, i did rediscover the losemann/martens paper[2], which appeared at roughly the same time as perez’ and includes a bit more discussion than its more well-known counterpart about the consequences of alternative path semantics. in particular, its section 3.1 describes the algorithm and arrives at a complexity result of O(|r| x m log Rmax), where "|r|" is the path expression size and "m log Rmax" reflects the graph size in terms of the number of solutions for isolated path segment matches, with the conclusion that the corresponding algorithms will run in polynomial time. this sets a lower bound on the complexity of path evaluation for an abstract representation which involves triples and operates on the basis of arrays or fields of term designators. the alternative, as yet unlocated, paper described how to implement this in concrete terms for the most complex case - that of unbounded paths - in terms of paired intermediate fields in opposite sort orders.
given this result, to continue the argument in favor of ldpatch above, the case could be made that something about ldpatch reduces path evaluation complexity - perhaps in terms of path length, in terms of intermediate path set cardinality, or in terms of the factors which led to the martens & co "m log Rmax" bound. that is certainly possible.
from the discussion, i appreciate two ways in which this could be true: match time of a small constant, or singleton match result sets.
in addition, there are contingent issues, which the ldpath discussion never addresses.
perhaps, this helped...
[1] http://www.dcc.uchile.cl/~jperez/papers/www2012.pdf
[2] http://www.theoinf.uni-bayreuth.de/download/tods13.pdf
[3] https://github.com/pchampin/ld-patch-py/blob/master/ldpatch/processor.py
[4] https://cs.uwaterloo.ca/~gweddell/cs848/papers/RDF-3X.pdf
[5] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.140.8776&rep=rep1&type=pdf
[6] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.383.2534&rep=rep1&type=pdf
I have a bunch of reading to do to catch up to where you are on this. For now, a few observations from an initial read:
Your points about the relationship between semantic complexity benefits and runtime complexity are well taken. Pushing ldpatch boosters to make their runtime claims concrete seems important.
You say:
[as an aside, where did the node designator come from in the first place? if there was some multi-minute query run yesterday, why should it matter, when?]
agreed. To me this is the most significant question. If all we're doing is putting the burden on the client to (a) know the entire server state of the relevant bnodes; and (b) compute a unique path to each... we're making the problem worse.
the public ldpatch discussions - and, i gather from your description above, your understanding as well, attach some primacy to a “node” beyond that which inheres in rdf’s notion of “resource”.
I think we're talking simply about "node"s in the sense used by 1.1 Concepts. If there's some other sense required by the weight of the statements involved, that should be called out.
...the concept appears in use, as if such entities exist in some concrete form apart from the abstract entities designated by resource identifiers...
It seems clear to me that the Resources are distinct from, though closely identified with, the actual nodes in the graph. To wit: nodes are members of the set of subjects and objects of triples in the graph; Resources are things in the graph's universe.
Similarly, an IRI, Blank Node, or Literal is not a "node" outside of the context of some statement (and some graph). This seems evident, and is ontological baggage required for SPARQL as much as LDPatch, yes?
re: “an LD Path expression is evaluated against a concrete starting node”
Bertails appears to be talking about the entry point, not the result set here. I.e. you must begin a path expression with either a specific URI or a variable already bound to a single node (URI or Blank Node). The result set may have any cardinality; though BIND will result in an error if the expression has cardinality other than 1.
The rest of this, I'll have to return to when I've had time to do my assigned reading. :)
I think we're talking simply about "node"s in the sense used by 1.1 Concepts. If there's some other sense required by the weight of the statements involved, that should be called out.
while this is the only use which the domain concepts sanction, somehow additional properties are ascribed to the initial constant node, which render the ldpatch navigation process inherently more efficient than the equivalent in terms of joined triples.
Bertails appears to be talking about the entry point, not the result set here. I.e. you must begin a path expression with either a specific URI or a variable already bound to a single node (URI or Blank Node). The result set may have any cardinality; though BIND will result in an error if the expression has cardinality other than 1.
whereby, as soon as the intermediate cardinality increases (cf. perez' yottabyte), the most effective implementation strategy may be some form of evil "bgp-like" processing, though perhaps not as evil as literal "solution mapping".
LDP recommends support of HTTP PATCH for Resources, especially for updating Containers.
At https://github.com/ruby-rdf/rdf-ldp/issues/10, we agreed to an initial implementation supporting SPARQL 1.1 Update format.