w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

allow UNDEF in more places in the grammar #62

Open VladimirAlexiev opened 5 years ago

VladimirAlexiev commented 5 years ago

Why?

Sometimes you need to check a condition and "unbind" a variable. (I know that SPARQL vars are "assign once", please read on).

Say you're generating an ontology from a spreadsheet, and a leading # marks a commented-out cell. Assuming you've set an appropriate base and that the ?prop var (column) holds a local name, you could write:

bind(uri(if(strstarts(?prop,"#"),UNDEF,?prop)) as ?propUri)

Except UNDEF is not allowed there; it's only allowed in VALUES. So you can try:

VALUES ?UNDEF {UNDEF}
bind(uri(if(strstarts(?prop,"#"),?UNDEF,?prop)) as ?propUri)

But this breaks streaming in tarql, so it causes problems on big files: https://github.com/tarql/tarql/issues/51. It took me a while to figure out that you can do this instead:

bind(uri(if(strstarts(?prop,"#"),1+"",?prop)) as ?propUri)

In fact, in some large-scale data integration work we used CPP (the C preprocessor) to preprocess SPARQL, like this:

#define __UNDEF__ 1+""

bind(uri(if(strstarts(?prop,"#"),__UNDEF__,?prop)) as ?propUri)
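For anyone who wants to try the error-expression workaround in isolation, here is a minimal sketch. The base IRI http://example.org/onto/ and the inline VALUES rows are illustrative assumptions standing in for the sheet data; only the BIND line comes from the discussion above.

```sparql
# Hypothetical base; substitute the base of your own ontology.
BASE <http://example.org/onto/>

SELECT ?prop ?propUri
WHERE {
  # Assumed stand-in for the rows that would be read from the sheet.
  VALUES ?prop { "name" "#skipped" }

  # In standard SPARQL, 1+"" raises a type error. IF propagates the
  # error of the branch it selects, uri() propagates it again, and a
  # BIND whose expression errors leaves its variable unbound.
  BIND(uri(IF(strstarts(?prop, "#"), 1+"", ?prop)) AS ?propUri)
}
```

Under the standard evaluation rules, the "name" row should get ?propUri resolved against the base, and the "#skipped" row should come back with ?propUri unbound. Engines that overload + beyond numeric operands may behave differently, so it is worth checking on the engine you actually use.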

Proposed solution

Allow UNDEF wherever a constant can be used. I see no reason why it's allowed only in VALUES.

Considerations for backward compatibility

None

VladimirAlexiev commented 5 years ago

Someone asked (I think on the mailing list; I can't find it as an issue here) to allow if without else, i.e. if(cond,value). Such a 2-arg if would completely solve the example given above, and in a better way.

But maybe there are some other cases where one would find UNDEF useful.

cygri commented 5 years ago

AFAIR, I've only ever seen two types of use of a “dummy unbound variable”:

  1. As the third argument to the if function
  2. In examples that explain SPARQL expression evaluation, like “coalesce(?unbound, 5) is 5”, or “?unbound = ?unbound is not true”

Making the third argument to if optional would solve the first case. The second case is not very strong, although I can still see an argument for allowing UNDEF in expressions, just for consistency and ease of explanation. It's easy to say “this evaluates to UNDEF” or “BIND (UNDEF AS ?x) does nothing”. It's harder to explain the same thing if one has to explain unbound variables and expressions that produce an error when evaluated. (Evaluating UNDEF would still be an expression evaluation error, just like evaluating an unbound variable is an expression evaluation error. So no change to expression semantics, just a clearer syntax.)
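The explanatory examples mentioned above can be written in today's syntax with a variable that is simply never bound. This is a sketch only; the variable names ?unbound and ?a to ?d are arbitrary choices for illustration.

```sparql
SELECT ?a ?b ?c ?d
WHERE {
  # ?unbound is never bound anywhere in this query.
  BIND(COALESCE(?unbound, 5) AS ?a)   # COALESCE skips the error, so ?a is 5
  BIND(?unbound = ?unbound AS ?b)     # the comparison raises an error, ?b stays unbound
  BIND(BOUND(?unbound) AS ?c)         # BOUND itself does not error, ?c is false
  BIND(?unbound AS ?d)                # evaluation error, ?d stays unbound
}
```

The last line is the present-day spelling of what BIND (UNDEF AS ?d) would presumably mean if UNDEF were allowed in expressions: the query returns one row with ?a and ?c bound and ?b and ?d unbound.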

afs commented 5 years ago

UNDEF in VALUES is there to align a list: when defining (?x ?y ?z), the query can give settings for ?x and ?z but not ?y, as can happen with graph patterns.

Some controlled use of UNDEF might work out; if it is for if-without-else, then we can define the else clause as evaluating to an error.

cygri commented 5 years ago

FWIW, Virtuoso allows UNDEF anywhere in expressions.

VladimirAlexiev commented 1 year ago

For the record: none of these are necessary:

 bind(1+"undef" as ?undef)
 values ?undef {UNDEF}

Just use the var ?undef without binding it: it has the value UNDEF. Of course, it can be used only in an expression, but not in a triple pattern (which would bind it).

mgberg commented 2 weeks ago

Bumping this, as I have also run into a similar need to conditionally bind a variable. I generally hit this when writing complex CONSTRUCT queries (or sometimes INSERT queries) where some result rows should produce certain triples from the CONSTRUCT template while other rows should not generate those triples at all.

A hack that I have been using to do this (which doesn't always work everywhere) is using an expression that always results in an error as the third argument to IF, such as IF(false, 1, 1+"a").
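As a sketch of how that hack plays out in a CONSTRUCT query: the ex: prefix, the property names, and the inline data below are all made up for illustration. It relies on the standard rule that a template triple containing an unbound variable is simply left out of the result.

```sparql
PREFIX ex: <http://example.org/>

CONSTRUCT {
  ?s ex:name ?name .
  ?s ex:nickname ?nick .    # skipped for rows where ?nick is unbound
}
WHERE {
  # Hypothetical input rows; the empty string means "no nickname".
  VALUES (?s ?name ?rawNick) {
    (ex:a "Alice" "Al")
    (ex:b "Bob"   "")
  }

  # 1+"a" always raises an error, so ?nick is left unbound
  # whenever ?rawNick is empty.
  BIND(IF(?rawNick != "", ?rawNick, 1+"a") AS ?nick)
}
```

With these rows, ex:a should get both a name and a nickname triple, while ex:b should get only the name triple. The same caveat applies as for the hack itself: it depends on the engine treating 1+"a" as an error.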

I agree that making the third argument to IF optional would easily solve this issue such that, e.g., BIND(IF(false, 1) AS ?x) would result in no binding for ?x. Alternatively, adding a new BINDIF function (or similar) specifically for this purpose might result in an easier to understand operation, as BIND(IF(false, 1) AS ?x) would not actually always bind a value.

I understand that unbound variables are not a concept present in SQL, and that an unbound variable is not at all the same as binding NULL. Still, it is interesting that NULL is very common in SQL, with many functions that handle NULL in specific ways, yet there really isn't a concept of NULL in SPARQL (except UNDEF in VALUES).

VladimirAlexiev commented 2 weeks ago

I agree with @cygri and @afs that allowing some form of "if without else" resolves this issue.

BIND (UNDEF AS ?x) does nothing

I cannot overemphasize this. If you then use ?x in a triple pattern, it is still unbound and will match too many triples. So you need a guard, something like Optional {bound(?x) ?x ?p ?o}
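To illustrate the pitfall, here is a sketch in which ?x is bound for one row and left unbound for the other. The ex: prefix, the ?id and ?flag columns, and the helper variable ?s are assumptions made for this example. The guard shown is one possible spelling, not necessarily the one intended above: it matches on a separate variable and compares it to ?x inside the OPTIONAL, so that an unbound ?x makes the filter raise an error and the optional part contributes nothing.

```sparql
PREFIX ex: <http://example.org/>

SELECT ?id ?x ?p ?o
WHERE {
  # Hypothetical rows: only row 1 should get a binding for ?x.
  VALUES (?id ?flag) { (1 true) (2 false) }
  BIND(IF(?flag, ex:a, 1+"") AS ?x)

  # Writing { ?x ?p ?o } directly here would match every triple for
  # row 2, because ?x is unbound and acts as a wildcard.
  # Guarded form: the filter errors when ?x is unbound, so row 2 is
  # kept but not extended.
  OPTIONAL { ?s ?p ?o FILTER(sameTerm(?s, ?x)) }
}
```

Row 1 should be extended with the triples whose subject is ex:a, if the data has any, and row 2 should come back once with ?x, ?p and ?o all unbound.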