Whole-query-call scope for BNODE(...) with parameter

miguel76 commented 5 years ago

Current status

In SPARQL 1.1, the function BNODE(...), when used with an argument (a simple literal or an xsd:string), creates or reuses a blank node associated with that literal within the scope of a single solution mapping. Given this choice of scope, the form with an argument adds no expressiveness to the language, because the same behavior can be obtained by binding the expression BNODE() to a variable and then reusing that variable.

Missing expressiveness

In CONSTRUCT queries, there is often the need to generate new resources that are referenced across multiple solution mappings. This can currently be done by generating appropriate URIs and using the function IRI(...). There is no way to generate blank nodes having the same role in the output graph.

Proposal

I propose to:

extend the argument accepted by BNODE(...) to any RDF term;
consider the whole query call as scope for the association with the given RDF term (i.e., every invocation of BNODE(...) with the same argument inside the same query call will return the same blank node).
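
The proposed scoping can be sketched in Python rather than SPARQL, since no engine is assumed to implement it. Everything here is hypothetical and only for illustration: the `QueryCall` class, the `_:bN` labels, and the sample data. RDF terms are modeled as plain Python values.

```python
import itertools


class QueryCall:
    """Hypothetical model of one query call with a query-wide BNODE map."""

    def __init__(self):
        self._counter = itertools.count()
        self._nodes = {}  # RDF term (modeled as a Python value) -> blank node label

    def bnode(self, term):
        # Same argument within the same query call -> same blank node.
        if term not in self._nodes:
            self._nodes[term] = f"_:b{next(self._counter)}"
        return self._nodes[term]


# Three solution mappings; two of them share the same value for "city".
solutions = [
    {"person": "alice", "city": "Rome"},
    {"person": "bob", "city": "Rome"},
    {"person": "carol", "city": "Oslo"},
]

call = QueryCall()
# Analogue of a CONSTRUCT template instantiated once per solution mapping.
triples = [(s["person"], "livesIn", call.bnode(s["city"])) for s in solutions]
```

In this model the first two triples point at the same blank node even though they come from different solution mappings, which is the role that IRI(...) currently has to fill.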

Implementation cost

For the implementations I know of, the SPARQL 1.1 semantics require more work than the semantics proposed here. To check whether an existing blank node has to be reused, a separate blank node map has to be maintained for each solution mapping of each query call; under this proposal, a single map per query call is enough.
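
The difference in bookkeeping can be illustrated with a rough Python model. This is a sketch under assumptions, not a description of any particular engine: the function names and the `_:bN` labels are hypothetical, and each solution is reduced to a placeholder because only the lifetime of the map matters here.

```python
import itertools

_fresh = itertools.count()


def new_bnode():
    # Hypothetical generator of fresh blank node labels.
    return f"_:b{next(_fresh)}"


def bnode(node_map, key):
    # Reuse the node for this key if the map in scope already has one.
    if key not in node_map:
        node_map[key] = new_bnode()
    return node_map[key]


def per_solution_scope(solutions, key):
    """SPARQL 1.1 reading: a fresh map for every solution mapping."""
    results = []
    for _ in solutions:
        node_map = {}  # discarded once this solution has been processed
        results.append((bnode(node_map, key), bnode(node_map, key)))
    return results


def per_query_scope(solutions, key):
    """Proposed reading: one map for the whole query call."""
    node_map = {}
    return [(bnode(node_map, key), bnode(node_map, key)) for _ in solutions]


rows = [{}, {}, {}]  # three placeholder solution mappings
old = per_solution_scope(rows, "x")
new = per_query_scope(rows, "x")
```

In both variants the two calls inside one solution agree with each other. The per-solution variant yields three distinct nodes for the three rows, while the per-query variant yields a single node and never has to create or discard a map per row.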

Backward compatibility

This proposal, as described so far, would not be backwards compatible (it changes the semantics of an existing function), but:

it is quite possible that this would not be a problem in practice, if (as I guess) the version of BNODE(...) with argument is not currently much used;
to avoid the problem altogether, the function with the new semantics could be given a new name (e.g., BNODE_UNIQUE(...)) while the function BNODE(...) could keep its previous semantics.

cygri commented 5 years ago

In CONSTRUCT queries, there is often the need to generate new resources that are referenced across multiple solution mappings. There is no way to generate blank nodes having [this role].

Is that so? In the majority of implementations I've tried, it seems that this is possible by placing a no-argument bNode() call at the right point in the query. For example, a BIND (bNode() AS ?root) clause right at the start of the query pattern will produce a single unique blank node that is shared across all solutions.

That being said, from reading the spec I don't understand what the bNode(xxx) form with an argument is actually supposed to return. The spec says:

If the form with a simple literal is used, every call results in distinct blank nodes for different simple literals, and the same blank node for calls with the same simple literal within expressions for one solution mapping.

But that doesn't seem to be what's generally implemented.

SELECT ?a ?b {
    { BIND (bNode("x") AS ?a) }
    { BIND (bNode("x") AS ?b) }
}

The solution mapping for both calls is the same—the empty solution. Yet ?a and ?b are different in the majority of processors I tried.

Then we have:

SELECT ?a ?b {
    BIND (bNode("x") AS ?a)
    BIND (bNode("x") AS ?b)
}

The solution mapping is different for both calls—the empty solution for the first, and a solution mapping ?a to a blank node for the second. Yet ?a and ?b are the same in the majority of processors I tried.

So at the very least, some clarification of the spec text might be needed.

dydra commented 5 years ago

after "... within expressions for one solution mapping," the next sentence says

This functionality is compatible with the treatment of blank nodes in SPARQL CONSTRUCT templates.

according to which, a better test case would be

SELECT ?a {
    VALUES ?z { 'abc' 'def' }
    { BIND (bNode('x') AS ?a) }
}

that said, while a CONSTRUCT form provides a context within which to distinguish solution mappings, it is not clear how a processor is to distinguish them in general.

afs commented 5 years ago

SELECT ?a ?b {
    { BIND (bNode("x") AS ?a) }
    { BIND (bNode("x") AS ?b) }
}

There are two solution mappings, one inside each {}, which get joined (cross product) to form the third solution mapping that is the overall result. So there will be different blank nodes.

New solution mappings get made when join and other operations happen. In join, the "merge(μ1, μ2)" is a new solution mapping.
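
As a rough illustration of that reading, the join step can be modeled in Python. The names `compatible`, `merge` and `join` follow the terms of the SPARQL algebra, but modeling solution mappings as Python dicts and blank nodes as `_:bN` strings is an assumption made only for this sketch.

```python
def compatible(mu1, mu2):
    # Two mappings are compatible if they agree on every shared variable.
    return all(mu2[var] == term for var, term in mu1.items() if var in mu2)


def merge(mu1, mu2):
    # The merge is a new mapping containing the bindings of both inputs.
    return {**mu1, **mu2}


def join(omega1, omega2):
    # Join keeps merge(mu1, mu2) for every compatible pair.
    return [merge(m1, m2) for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


# One solution from each group; each BIND produced its own blank node.
left = [{"a": "_:b0"}]
right = [{"b": "_:b1"}]
result = join(left, right)
```

Here `result` holds one merged mapping with both bindings, and it is a separate object from either input, which matches the description of three solution mappings rather than one.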

dydra commented 5 years ago

does this mean that the intent is that one (or more) of these is true?

if a bNode(string) appears in a form which extends a given solution mapping that already contains a binding to a blank node generated with the same stem, then it should produce the same term as the one already present?
if a bNode(string) appears in a form which extends a given solution mapping that already contains a binding which is not to a blank node, but which somehow depends on a binding to a blank node generated with the same stem, then it should produce the same term as the one which was at some point present?
if multiple bNode(string) calls appear in a single bgp, then they should all produce the same term?

in other words, what is intended by

SELECT ?a ?c 
WHERE {
  {   SELECT ?a 
      WHERE {
         BIND (bNode('x') AS ?a) 
      }
  }
  BIND (bNode('x') AS ?c)
}

or by

SELECT ?x ?c 
WHERE {
  {   SELECT ?x
      WHERE {
         BIND (bNode('x') AS ?a) 
         BIND (bNode('x') AS ?b) 
         BIND (isBlank(?b) AS ?x) 
      }
  }
  BIND (bNode('x') AS ?c)
}

cygri commented 5 years ago

@afs That is not what the spec says though. A solution mapping is defined as a function from variables to RDF terms. So it's a set of bindings, which are pairs of a variable and an RDF term. The empty solution mapping is the empty set. You cannot say “it's a different empty set in that other graph pattern.” The identity of sets is defined by their members.

I understand the intent that you are describing, but either the formalism in the spec doesn't reflect this intent, or else the formalism is not based on standard maths.
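
The point about set identity can be made concrete with a small Python sketch. It assumes, purely for illustration, that a solution mapping is modeled as a frozenset of (variable, term) pairs and that a hypothetical blank node map is keyed by the pair (solution mapping, argument).

```python
# The empty solution mapping as seen from two different group patterns.
empty_in_first_group = frozenset()
empty_in_second_group = frozenset()

# Hypothetical blank node map keyed by (solution mapping, BNODE argument).
node_map = {}
node_map[(empty_in_first_group, "x")] = "_:b0"

# A lookup from the second group finds the entry made by the first group,
# because equal sets are indistinguishable as keys.
lookup = node_map.get((empty_in_second_group, "x"))
```

Under the set-based definition, the two empty mappings are the same key, so a map keyed this way would return the same blank node in both groups. Getting different nodes requires some notion of identity beyond the set of bindings.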

w3c / sparql-dev