Open VladimirAlexiev opened 8 months ago
The last one ("Referential Transparency") is because filter clauses in the RIGHT part of LEFT OPTIONAL { RIGHT } also have access to the variables bound by LEFT (see the "If A is of the form Filter(F, A2)" special case of the graph pattern translation).
So, in the first query ?price is bound because it is in the FILTER, whereas in the second query ?price is unbound in the BIND because it's not a FILTER and so the usual bottom-up evaluation semantics is used. The placement of FILTER is a nasty part of the SPARQL spec.
I believe I can explain the "Jena: one result :x :y but why is y bound?" thing. The LHS and RHS of the outer join (the OPTIONAL) are single solutions: ?x = :x on the left, and ?y = :y on the right. Since the right does NOT bind ?x, the solutions are compatible (as per 18.3). Thus the join solution is ?x = :x, ?y = :y. That solution binds ?x, thus the FILTER's expression passes and thus the join condition succeeds (that is, the FILTER is a part of the join condition). Therefore, the joined solution is returned, as per the OPTIONAL semantics, not the LHS solution.
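To make the join-condition reading concrete, here is a minimal Python sketch of LeftJoin as I understand it from the algebra, assuming solutions are modelled as plain dicts. The names `compatible`, `merge` and `left_join` are illustrative only and do not come from any SPARQL engine.

```python
def compatible(mu1, mu2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def merge(mu1, mu2):
    """Union of two compatible solutions."""
    return {**mu1, **mu2}


def left_join(left, right, condition):
    """LeftJoin(left, right, condition): the condition sees the MERGED solution."""
    results = []
    for mu1 in left:
        joined = [
            merge(mu1, mu2)
            for mu2 in right
            if compatible(mu1, mu2) and condition(merge(mu1, mu2))
        ]
        # If nothing on the right joins successfully, keep the left solution.
        results.extend(joined if joined else [mu1])
    return results


lhs = [{"x": ":x"}]   # VALUES ?x { :x }
rhs = [{"y": ":y"}]   # RHS evaluated on its own: ?x is unbound, so ?z stays unbound
result = left_join(lhs, rhs, lambda mu: "x" in mu)   # FILTER( BOUND(?x) )
print(result)
```

Because the condition is checked against the merged solution, BOUND(?x) holds and the joined solution with ?y bound is what comes back.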
Others look a bit more straightforward to me.
As @klinovp says.
OPTIONAL would have been better with a syntax like OPTIONAL(left join filter expression) { pattern }.
Hindsight.
@VladimirAlexiev, you would have a stronger case if you were to provide some valuable use case which the current language definition precludes.
as the issue is expressed, the likely responses will be along the lines of that from @Tpt, which are not likely to lead to what might be a useful discussion about valuable changes to the language.
@klinovp
single solution: ?y = :y on the right
But there's FILTER( BOUND(?x) ) so why is that solution not discarded?
@lisp I should have written "Most SPARQL users haven't heard about SPARQL algebra" (and should not have to!). I consider myself a competent SPARQL user (eg see https://gist.github.com/VladimirAlexiev/cf2de89b692bbc2ae70917aae021ec07) but I don't care to learn or try to understand these peculiarities.
If what @Tpt wrote is true, then ?price is visible in FILTER( ?price... but invisible in BIND( ?price...: I think this defies logic or explanation.
@VladimirAlexiev, given the range of your contributions to the issues in this community group, your assessment, that the sparql optional filter semantics in relation to variable scope "defies logic or explanation", seems out of place. if you do not care to "learn or try to understand" its algebra, how do you propose to articulate an alternative sparql semantics which realizes your variant variable scoping rules?
A bold proposal (probably more for SPARQL 2.0 rather than 1.x):
If we state that it is a syntax error to use in an expression a variable that is not in-scope, i.e. prevent the use in expressions of variables that will always be unbound, I believe we have a way to keep users from falling into the listed "traps".
?x not in-scope in the BIND( ?x as ?z )
?x not in-scope in the BIND( ?x as ?z )
?x not in-scope in the BIND( ?x as ?z )
?price not in-scope in the BIND( ?price * (1 - ?discount) AS ?effectivePrice )
Note that we already have in-scope constraints in the SPARQL grammar (see note 12).
Such a change would not restrict SPARQL expressivity (an expression that uses a not in-scope variable can always be simplified).
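A toy Python sketch of what such a static check could look like, under heavy assumptions: a group is a list of tuples in a made-up mini-AST (`("values", var)`, `("bind", used_vars, target)`, `("optional", elements)`), and FILTERs are deliberately left out because they see the joined solution. None of this is SPARQL grammar or any engine's API.

```python
def check_group(elements):
    """Return (visible_vars, problems) for one group in the toy AST."""
    visible = set()
    problems = []
    for el in elements:
        kind = el[0]
        if kind == "values":            # ("values", "x")
            visible.add(el[1])
        elif kind == "bind":            # ("bind", {"x"}, "z")
            _, used, target = el
            for v in sorted(used - visible):
                problems.append(f"?{v} not in-scope in BIND(... AS ?{target})")
            visible.add(target)
        elif kind == "optional":        # ("optional", [elements])
            # The inner group starts with an empty scope (bottom-up evaluation).
            inner_visible, inner_problems = check_group(el[1])
            problems.extend(inner_problems)
            visible |= inner_visible
    return visible, problems


# VALUES ?x { :x } OPTIONAL { BIND( :y as ?y ) BIND( ?x as ?z ) }
query = [
    ("values", "x"),
    ("optional", [("bind", set(), "y"), ("bind", {"x"}, "z")]),
]
visible, problems = check_group(query)
print(problems)
```

The check only needs the syntax tree, which is the point of the proposal: the offending BIND can be reported before any data is touched.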
are there any cases in sparql, where the scope of a variable is not statically apparent? if not, then it would not change any result to use that determination to classify queries syntactically. in which case, is it necessary to change the major version number?
@VladimirAlexiev
But there's FILTER( BOUND(?x) ) so why is that solution not discarded?
Have a look at the SPARQL algebra. The FILTER is a part of the join condition, it's not just a filter sitting on top of the RHS only. It's evaluated over the join solution, not over the RHS solution.
Your query:
PREFIX : <http://example.org/>
SELECT * WHERE {
  VALUES ?x { :x }
  OPTIONAL {
    FILTER( BOUND(?x) )
    BIND( :y as ?y )
    BIND( ?x as ?z )
  }
}
is NOT the same as this query:
PREFIX : <http://example.org/>
SELECT * WHERE {
  VALUES ?x { :x }
  OPTIONAL {
    {
      FILTER( BOUND(?x) )
      BIND( :y as ?y )
      BIND( ?x as ?z )
    }
  }
}
The latter would return ?x = :x, as you expect. Again, the algebra should make the difference fairly obvious.
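A rough Python sketch of the second (nested-group) query, assuming dict-based solutions and an illustrative `left_join` helper that is not any engine's API. Here the FILTER belongs to the inner group, so it is applied to the RHS solutions alone, before the join.

```python
def left_join(left, right, condition):
    """LeftJoin over dict solutions; condition is applied to merged solutions."""
    out = []
    for mu1 in left:
        joined = [
            {**mu1, **mu2}
            for mu2 in right
            if all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())
            and condition({**mu1, **mu2})
        ]
        out.extend(joined if joined else [mu1])
    return out


lhs = [{"x": ":x"}]                    # VALUES ?x { :x }

# Inner group evaluated on its own: BIND( :y as ?y ) gives ?y, ?x is unbound.
inner_before_filter = [{"y": ":y"}]
# FILTER( BOUND(?x) ) now runs over the inner solutions only and removes them all.
inner = [mu for mu in inner_before_filter if "x" in mu]

# The OPTIONAL itself has no filter left, so its join condition is simply true.
nested_result = left_join(lhs, inner, lambda mu: True)
print(nested_result)
```

With an empty right-hand side there is nothing to join, so only the left solution survives.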
If what @Tpt wrote is true, then ?price is visible in FILTER( ?price... but invisible in BIND( ?price...: I think this defies logic or explanation.
@Tpt is correct and the semantics makes perfect sense. What is confusing here is the syntax. As Andy noted above, a better syntax would make it obvious that the FILTER is a part of the join, not a post-processor of the OPTIONAL scope. It'd look like this:
SELECT * {
  VALUES ?price { 10 }
  OPTIONAL ( ?price * (1 - ?discount) < 10 ) {
    VALUES ?discount { 0.10 }
  }
}
or
SELECT * {
  VALUES ?price { 10 }
  OPTIONAL ( ?effectivePrice < 10 ) { # <-- is evaluated over joined solutions
    VALUES ?discount { 0.10 }
    BIND( ?price * (1 - ?discount) AS ?effectivePrice ) # <-- is evaluated over RHS solutions (and thus raises errors)
  }
}
In the current SPARQL syntax the FILTER and the BIND are syntactically close to each other which obscures the fact that they are positioned in very different places in the algebra and process different binding sets.
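A small Python sketch of the two evaluation points, assuming dict-based solutions and `Decimal` for the numbers. The `extend` helper is an illustrative stand-in for the algebra's Extend, with a missing dict key playing the role of an unbound variable.

```python
from decimal import Decimal


def extend(solution, var, expr):
    """Extend: add var if expr evaluates; on error leave the solution unchanged."""
    try:
        return {**solution, var: expr(solution)}
    except KeyError:            # an unbound variable in the expression
        return dict(solution)


def effective(mu):
    """?price * (1 - ?discount)"""
    return mu["price"] * (1 - mu["discount"])


# BIND runs over RHS solutions only: ?price is not there, so the BIND errors.
rhs = [{"discount": Decimal("0.10")}]
rhs_after_bind = [extend(mu, "effectivePrice", effective) for mu in rhs]

# FILTER runs over the joined solution: ?price from the LHS is visible.
lhs = {"price": Decimal(10)}
joined = {**lhs, **rhs_after_bind[0]}
filter_passes = effective(joined) < 10

print(rhs_after_bind, filter_passes)
```

The same expression fails in the BIND and succeeds in the FILTER purely because of which binding set it is handed.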
are there any cases in sparql, where the scope of a variable is not statically apparent?
@lisp No, the scope is defined from the syntax tree by the spec.
if not, then it would not change any result to use that determination to classify queries syntactically.
Yes! Exactly!
in which case, is it necessary to change the major version number?
This change makes invalid some queries that were valid and well defined according to SPARQL 1.0/1.1. So it looks kind of breaking to me. But it's only my personal opinion.
under the premise, that "the scope is [completely and correctly] defined from the syntax tree by the spec", if that definition is used to identify invalid queries,
under the premise, that "the scope is [completely and correctly] defined from the syntax tree by the spec", what class of queries is "valid and well defined" which comprises expressions which include variable references outside of the scope of some definition?
All queries that contain variables not in-scope in expressions, like the 4 queries I listed in this answer. They are all valid SPARQL queries.
@tpt, how can it be true that,
[...] queries that contain variables not in-scope in expressions like the 4 queries I listed https://github.com/w3c/sparql-dev/issues/195#issuecomment-2002585506 [...] are all valid SPARQL queries.
is it not correct, that the expression in a bind form must include only variables in some scope in order for the containing query to be valid? this, independent of whether the variables happen to be bound in a given solution.
@Tpt, how can it be true that,
[...] queries that contain variables not in-scope in expressions like the 4 queries I listed https://github.com/w3c/sparql-dev/issues/195#issuecomment-2002585506 [...] are all valid SPARQL queries.
is it not correct, that the expression in a bind form must produce a value in order for the containing query to be valid?
The definition of extend (the algebra operation behind BIND) is defined even if the expression returns an error. And the SPARQL grammar only states that "The variable assigned in a BIND clause must not be already in-use within the immediately preceding TriplesBlock within a GroupGraphPattern." but does not add any restriction to the expression.
To my knowledge, there is no syntactic way in SPARQL to ensure that an arbitrary expression never fails. For example 1 + ?x can error if ?x is an IRI...
To my knowledge, there is no syntactic way in SPARQL to ensure that an arbitrary expression never fails. For example 1 + ?x can error if ?x is an IRI...
Actually, there is: bind(coalesce(expr, "error") as ?x) will never fail. If expr raises an error, ?x will be bound to "error". Query engines can use this fact to reason about (lack of) NULLs.
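A minimal Python sketch of the COALESCE idea, assuming expression errors are modelled as an `ExprError` exception and arguments are passed as zero-argument callables so they can be evaluated lazily. `ExprError`, `coalesce` and `add_one` are made-up names for illustration.

```python
class ExprError(Exception):
    """Stand-in for a SPARQL expression evaluation error."""


def coalesce(*thunks):
    """Return the value of the first argument that evaluates without error."""
    for thunk in thunks:
        try:
            return thunk()
        except ExprError:
            continue
    raise ExprError("all arguments raised errors")


def add_one(x):
    """1 + ?x, which errors when ?x is not numeric (e.g. an IRI)."""
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        raise ExprError(f"cannot add 1 to {x!r}")
    return 1 + x


# bind(coalesce(1 + ?x, "error") as ?z) with ?x an IRI, then with ?x = 2
from_iri = coalesce(lambda: add_one("<http://example.org/a>"), lambda: "error")
from_number = coalesce(lambda: add_one(2), lambda: "error")
print(from_iri, from_number)
```

Since the last argument is a constant, the overall expression always yields a value, which is what lets an engine treat the bound variable as never unbound.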
Otherwise, I agree. A query can have a BIND which refers to variables out of scope and be perfectly valid. Moreover, that BIND, which refers to variables out of scope, may not even raise errors at runtime.
from the point of view of interoperability, are there defined results for these queries which include forms which reduce to a reference to an undefined (not just unbound) variable?
I don't know what you mean by "from the point of view of interoperability" but yes, the spec does define results, incl. for queries that use BIND referring to out-of-scope variables. It's the spec's job to define results for each syntactically valid query given the data.
I don't know what you mean by "from the point of view of interoperability"
the point is that, while extend is defined such that a static analysis of variable definitions would not change the class of a query expression from undefined to invalid based on the constitution of the respective expression, it is not clear to this recommendation reader that this is the case for all expressions which include references to undefined variables.
this relates to the matter, whether to apply the results of such analysis would require a 2.* revision.
from this perspective, as reclassifying those expressions described in @Tpt's https://github.com/w3c/sparql-dev/issues/195#issuecomment-2002585506 would require changing the definition for extend, the suggestion would require at least a 2.0 jump, while for any other expressions which would change class from an undefined result to an invalid, a 1.* revision should suffice.
Nit: my proposal would not change the Extend operator definition but add a syntactic restriction to the SPARQL grammar, just like the existing one that prevents ?x in BIND(... AS ?x) from being already in-scope.
@lisp
to provide some valuable use case which the current language definition precludes.
I think I have one: fetching multi-valued fields of a subject, each of which needs to be in its own UNION clause to avoid Cartesian Product. If the bindings before/outside are not available inside the UNION, then you need to refetch that subject in each clause.
Eg see https://vocab.getty.edu/doc/queries/#All_Data_For_Subject and imagine that:
BIND (ulan:500115493 as ?s) is instead a heavy subquery (that's why I think that finding something using a complex search, and then returning its data that has a complex shape, makes for a difficult query)
?s ^iso:superOrdinate ?ar FILTER NOT EXISTS {?ar xl:prefLabel ?t1}: do I need to repeat it in each further sub-clause?
Can you rewrite this query?
@Tpt
how do you propose to articulate an alternative sparql semantics
I'm not competent enough to articulate an alternative. I'm just shocked at these "features" of SPARQL.
@klinovp
the semantics makes perfect sense. What is confusing here is the syntax.
Ok, this clarification is important for this forum, but it will be lost on any SPARQL user.
If the effective use of SPARQL requires learning the intricacies of an Algebra then that's a bad thing. Note that different repositories give different answers to (at least some of) the puzzles above. Hopefully these are borderline cases that users won't encounter often...
A clarification: I have the utmost respect for the members of this group (and all other creators of SPARQL), and similar for XQuery and XSPARQL... Devising a good query language is a difficult task, and passing it through the W3C standardization process is more difficult yet. And I hope I haven't embarrassed myself too badly :-)
If the effective use of SPARQL requires learning the intricacies of an Algebra then that's a bad thing.
you may not be willing to articulate it, but you imply that your domain would benefit from ways to manipulate its datasets which more directly correspond to its concepts than sparql does - or even, likely should.
Note that different repositories give different answers to (at least some of) the puzzles above.
from the discourse in this thread it appears clear that the recommendation is not ambiguous and that any differences are a consequence of implementation "variations". one could endeavour to increase the interoperability. this argues to include appropriate tests in a 1.2 test suite, but that would not, itself, bring you closer to your goal.
@frensjan @afs @TallTed @lisp @JervenBolleman (I don't even know how to define this issue: feel free to edit the title!)
@frensjan in https://github.com/w3c/sparql-dev/issues/100#issuecomment-1911693306 started a discussion on which bindings are passed between which SPARQL clauses and formulated some nice queries to exercise these questions.
I posted similar things in https://github.com/w3c/sparql-dev/issues/103 (but they are not yet reflected below).
Different SPARQL processors return different results on such basic queries :-(
&signal_unconnected=on) on dbpedia.org endpoint
I don't know SPARQL algebra very well, but I guess it all comes from the bottom-up execution semantics of SPARQL.
Now: I have no illusions that the group will change SPARQL semantics to fit my intuitions. But maybe some option/flag/"mode" can be added to change the treatment of bindings. At the least, this issue will serve as a big warning for the unwary.
Brackets
:x :y :z
Optional
LHS optional {RHS}, LHS bindings should not be passed to RHS
:x
:x :y but why is y bound?
:x :y :z (reported by @frensjan as https://github.com/eclipse-rdf4j/rdf4j/issues/4882)
Union
:x
:x and :x :y :z
Referential Transparency
?effectivePrice) has different semantics from:
10 0.1 and 10
10 and 10
10 0.1 and 10 0.1 9.0