w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

Whole-query-call scope for BNODE(...) with parameter #36

Open miguel76 opened 5 years ago

miguel76 commented 5 years ago

Current status

In SPARQL 1.1, the function BNODE(...), when used with an argument (a simple literal or an xsd:string), creates or reuses a blank node associated with that literal in the scope of a single solution mapping. Given this choice of scope, the form with an argument adds no expressiveness to the language, since the same behavior can be obtained by binding the expression BNODE() to a variable and then reusing that variable.
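To make the scoping argument concrete, here is a toy Python model. It is not a SPARQL engine: blank nodes are just strings "_:bN", solution mappings are dicts, and the function names are hypothetical. It evaluates BNODE(label) with a fresh label map per solution mapping, next to the rewrite that binds BNODE() once per solution and reuses it.

```python
import itertools

# Toy model, not a real SPARQL engine: a blank node is a string "_:bN"
# minted from a module-wide counter, so every minted label is unique.
_ids = itertools.count()


def fresh_bnode():
    """Models BNODE() with no argument: always a new blank node."""
    return f"_:b{next(_ids)}"


def bind_bnode_with_label(solutions, exprs):
    """Models BNODE(label) with SPARQL 1.1 scope: one label map per solution.

    `solutions` is a list of dicts (solution mappings); `exprs` maps each
    output variable to the simple literal passed to BNODE(...).
    """
    out = []
    for mu in solutions:
        label_map = {}  # discarded once this solution mapping is done
        row = dict(mu)
        for var, label in exprs.items():
            if label not in label_map:
                label_map[label] = fresh_bnode()
            row[var] = label_map[label]
        out.append(row)
    return out


def bind_bnode_once(solutions, vars_):
    """The equivalent rewrite: bind BNODE() once per solution, then reuse it."""
    out = []
    for mu in solutions:
        b = fresh_bnode()
        out.append({**mu, **{v: b for v in vars_}})
    return out


solutions = [{"z": "abc"}, {"z": "def"}]
rows = bind_bnode_with_label(solutions, {"a": "x", "b": "x"})
same = bind_bnode_once(solutions, ["a", "b"])
```

Both functions give results of the same shape: ?a and ?b coincide within each solution and differ across solutions, which is why, under this scope, the argument form adds nothing.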

Missing expressiveness

In CONSTRUCT queries, there is often a need to generate new resources that are referenced across multiple solution mappings. This can currently be done by generating appropriate IRIs and using the function IRI(...). There is no way to generate blank nodes that play the same role in the output graph.

Proposal

I propose to:

Implementation cost

For the implementations I know of, the SPARQL 1.1 semantics require more work than the ones proposed here: to check whether an existing blank node has to be reused, a separate blank node map must be maintained for each query call and each solution mapping; under this proposal, a single map per query call is enough.
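A minimal sketch of the proposed whole-query-call scope, under the same toy assumptions (blank nodes are "_:bN" strings; the class QueryCall and its bnode method are hypothetical names, not any processor's API). One label map lives for the duration of a query call and is shared by every solution mapping.

```python
import itertools

# Hypothetical model of the proposal: one label -> blank node map per
# query call instead of one per solution mapping. Blank nodes are strings
# "_:bN" drawn from a module-wide counter, so separate query calls never
# share a blank node.
_ids = itertools.count()


class QueryCall:
    """State for a single execution of a query."""

    def __init__(self):
        self.label_map = {}  # the single map the proposal requires

    def bnode(self, label=None):
        if label is None:
            return f"_:b{next(_ids)}"  # BNODE(): always fresh
        if label not in self.label_map:
            self.label_map[label] = f"_:b{next(_ids)}"
        return self.label_map[label]


call = QueryCall()
# Two solution mappings in the same call reuse the node for "x".
rows = [{"z": z, "a": call.bnode("x")} for z in ("abc", "def")]
other = QueryCall().bnode("x")  # a separate query call gets its own node
```

The map is allocated once per call rather than once per solution mapping, which is the saving the cost argument refers to.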

Backward compatibility

This proposal, as described so far, would not be backwards compatible (it changes the semantics of an existing function), but:

cygri commented 5 years ago

In CONSTRUCT queries, there is often the need to generate new resources that are referenced across multiple solution mappings. There is no way to generate blank nodes having [this role].

Is that so? In the majority of implementations I've tried, it seems that this is possible by placing a no-argument bNode() call at the right point in the query. For example, a BIND (bNode() AS ?root) clause right at the start of the query pattern will produce a single unique blank node that is shared across all solutions.
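The BIND-at-the-start workaround can be modeled the same way. This is a sketch assuming the processor evaluates the leading BIND once against the single empty solution mapping and then joins the result with the rest of the group; compatible and join are simplified versions of the SPARQL 1.1 definitions, and the triple-pattern solutions are made-up stand-in data.

```python
import itertools

# Toy model: blank nodes are strings "_:bN" from a module-wide counter.
_ids = itertools.count()


def fresh_bnode():
    return f"_:b{next(_ids)}"


def compatible(m1, m2):
    """Two solution mappings are compatible if shared variables agree."""
    return all(m2[v] == t for v, t in m1.items() if v in m2)


def join(omega1, omega2):
    """Join: merge every compatible pair of solution mappings."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


# BIND (bNode() AS ?root) over the empty group: exactly one solution.
root = [{"root": fresh_bnode()}]
# Hypothetical solutions of the remaining triple patterns.
data = [{"s": "ex:alice"}, {"s": "ex:bob"}, {"s": "ex:carol"}]
result = join(root, data)
```

Every row of result carries the same ?root, because the blank node was minted once, before the join multiplied the solutions.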

That being said, from reading the spec I don't understand what the bNode(xxx) form with an argument is actually supposed to return. The spec says:

If the form with a simple literal is used, every call results in distinct blank nodes for different simple literals, and the same blank node for calls with the same simple literal within expressions for one solution mapping.

But that doesn't seem to be what's generally implemented.

SELECT ?a ?b {
    { BIND (bNode("x") AS ?a) }
    { BIND (bNode("x") AS ?b) }
}

The solution mapping is the same for both calls: the empty solution. Yet ?a and ?b are different in the majority of processors I tried.

Then we have:

SELECT ?a ?b {
    BIND (bNode("x") AS ?a)
    BIND (bNode("x") AS ?b)
}

The solution mapping differs between the two calls: the empty solution for the first, and a solution mapping binding ?a to a blank node for the second. Yet ?a and ?b are the same in the majority of processors I tried.

So at the very least, some clarification of the spec text might be needed.
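The mismatch can be reproduced with a toy model of the literal reading, in which a solution mapping is identified only by its bindings (a frozenset of variable/term pairs, so all empty mappings are equal). The helpers are hypothetical and only model the two queries above; they are not taken from any processor.

```python
import itertools

# Hypothetical model of the literal reading of the spec sentence: the
# label -> blank node map is keyed by the *content* of the solution
# mapping, so two empty mappings are the same mapping.
_ids = itertools.count()


def bnode_for(mu, label, maps):
    key = frozenset(mu.items())  # identity of a mapping = its bindings
    table = maps.setdefault(key, {})
    if label not in table:
        table[label] = f"_:b{next(_ids)}"
    return table[label]


def extend(omega, var, label, maps):
    """Models BIND (bNode(label) AS ?var) applied to each solution."""
    return [{**mu, var: bnode_for(mu, label, maps)} for mu in omega]


def join(omega1, omega2):
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2
            if all(m2[v] == t for v, t in m1.items() if v in m2)]


def first_query():
    # { BIND (bNode("x") AS ?a) } { BIND (bNode("x") AS ?b) }
    maps = {}
    return join(extend([{}], "a", "x", maps), extend([{}], "b", "x", maps))


def second_query():
    # BIND (bNode("x") AS ?a)  BIND (bNode("x") AS ?b)
    maps = {}
    return extend(extend([{}], "a", "x", maps), "b", "x", maps)


q1, q2 = first_query()[0], second_query()[0]
```

Under this reading the first query yields one blank node for both variables and the second yields two, the opposite of what the processors described above return.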

dydra commented 5 years ago

after "... within expressions for one solution mapping," the next sentence says

This functionality is compatible with the treatment of blank nodes in SPARQL CONSTRUCT templates.

according to which, a better test case would be

SELECT ?a {
  VALUES ?z { 'abc' 'def' }
  { BIND (bNode('x') AS ?a) }
}

that said, while a CONSTRUCT form provides a context within which to distinguish solution mappings, it is not clear how a processor is to distinguish them in general.

afs commented 5 years ago
SELECT ?a ?b {
    { BIND (bNode("x") AS ?a) }
    { BIND (bNode("x") AS ?b) }
}

There are two solution mappings, one inside each {}, which get joined (cross product) to form a third solution mapping that is the overall result. So the blank nodes will be different.

New solution mappings get made when join and other operations happen. In join, the "merge(μ1, μ2)" is a new solution mapping.
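For contrast, a sketch of the reading described above, assuming every solution mapping carries an identity of its own (a serial number here, which is a hypothetical modeling device) and that merge creates a new mapping. Keying the label map by that identity gives the first query two distinct blank nodes even though both groups start from an empty mapping.

```python
import itertools

# Hypothetical model: every solution mapping is an object with its own
# identity, and merge(mu1, mu2) in join creates a new one. The
# label -> blank node map is keyed by that identity, not by the bindings.
_ids = itertools.count()
_serials = itertools.count()


class Solution:
    def __init__(self, bindings):
        self.bindings = dict(bindings)
        self.serial = next(_serials)  # distinct even for equal bindings


def bnode_for(mu, label, maps):
    table = maps.setdefault(mu.serial, {})
    if label not in table:
        table[label] = f"_:b{next(_ids)}"
    return table[label]


def extend(omega, var, label, maps):
    return [Solution({**mu.bindings, var: bnode_for(mu, label, maps)})
            for mu in omega]


def merge(m1, m2):
    return Solution({**m1.bindings, **m2.bindings})  # a new solution mapping


def join(omega1, omega2):
    return [merge(m1, m2) for m1 in omega1 for m2 in omega2
            if all(m2.bindings[v] == t for v, t in m1.bindings.items()
                   if v in m2.bindings)]


maps = {}
# Each { } group starts from its own empty solution mapping.
left = extend([Solution({})], "a", "x", maps)
right = extend([Solution({})], "b", "x", maps)
row = join(left, right)[0].bindings
```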

dydra commented 5 years ago

does this mean that the intent is that one (or more) of these is true?

in other words, what is intended by

SELECT ?a ?c 
WHERE {
  {   SELECT ?a 
      WHERE {
         BIND (bNode('x') AS ?a) 
      }
  }
  BIND (bNode('x') AS ?c)
}

or by

SELECT ?x ?c 
WHERE {
  {   SELECT ?x
      WHERE {
         BIND (bNode('x') AS ?a) 
         BIND (bNode('x') AS ?b) 
         BIND (isBlank(?b) AS ?x) 
      }
  }
  BIND (bNode('x') AS ?c)
}

cygri commented 5 years ago

@afs That is not what the spec says though. A solution mapping is defined as a function from variables to RDF terms. So it's a set of bindings, which are pairs of a variable and an RDF term. The empty solution mapping is the empty set. You cannot say “it's a different empty set in that other graph pattern.” The identity of sets is defined by their members.

I understand the intent that you are describing, but either the formalism in the spec doesn't reflect this intent, or else the formalism is not based on standard maths.