w3c / sparql-query

https://w3c.github.io/sparql-query/

Addressing SPARQL EXISTS errata #156

Open afs opened 1 week ago

afs commented 1 week ago

Recap

TPAC 2023 presentation

Issues: sparql-query/issues for EXISTS

After TPAC 2023, an email was sent to interested parties.

Proposals

1. Improved substitution: SEP-0007/Substitution

2. SemiJoin/AntiJoin: https://w3c.github.io/sparql-exists/docs/sparql-exists.html#proposal-a

Proposal 1

Proposal 1 is based on errata for the "Substitution" operation.

Full details including relationship to errata: SEP-0007/Substitution.

Proposal 2

Proposal 2 is SemiJoin/AntiJoin.

SPARQL already has MINUS, which is an antijoin with a special condition for the case of disjoint domains (a decision of the SPARQL 1.1 working group).
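A minimal Python sketch of that difference, assuming solution mappings are modeled as plain dicts and that a "pure" antijoin drops a left-hand mapping whenever any right-hand mapping is compatible with it. The function names are hypothetical and chosen only for illustration; `minus` follows the SPARQL 1.1 rule that a mapping is removed only when the compatible right-hand mapping also shares at least one variable with it.

```python
from typing import Dict, List

# A solution mapping: variable name -> RDF term (plain strings, for illustration only).
Solution = Dict[str, str]


def compatible(a: Solution, b: Solution) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def antijoin(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Assumed 'pure' antijoin: keep m only if no right-hand mapping is compatible."""
    return [m for m in left if not any(compatible(m, n) for n in right)]


def minus(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """SPARQL 1.1 MINUS: a compatible mapping with a disjoint domain removes nothing."""
    return [
        m
        for m in left
        if not any(compatible(m, n) and bool(m.keys() & n.keys()) for n in right)
    ]


left = [{"x": "a"}, {"x": "b"}]
shared = [{"x": "a"}]    # shares the variable x with the left side
disjoint = [{"y": "c"}]  # shares no variable with the left side

same_when_shared = antijoin(left, shared) == minus(left, shared)
antijoin_disjoint = antijoin(left, disjoint)
minus_disjoint = minus(left, disjoint)
```

With shared variables the two operators agree. With disjoint domains every pair of mappings is trivially compatible, so the assumed pure antijoin removes everything while MINUS keeps the left side unchanged.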

A way forward.

A compromise way forward:

  1. Replace "Substitute" with the errata-derived fix SEP-0007/Substitution

  2. Plan for adding LATERAL, SEMIJOIN and ANTIJOIN (both pure forms) to the SPARQL language. Because of timing, these may have to be added in the "new features" phase.

pchampin commented 4 days ago

This was discussed during the rdf-star meeting on 26 September 2024.

Addressing SPARQL EXISTS errata

ora: Are there people fine with the current syntax?

ora: In any case, chairs will discuss this, let's move on

AndyS: [about SPARQL EXISTS] There are two proposals

AndyS: 1. substitution based on various existing errata

AndyS: 2. another one based on ANTIJOIN. We already have MINUS, which is ANTIJOIN except for its behavior with disjoint domains

AndyS: On another note, there are other things that might go into SPARQL, like LATERAL, which can be based on substitution, and the pure forms of anti join and semi join

AndyS: It's a possibility to move these additions (LATERAL, anti join...) to sparql-dev

pchampin: we would add more subtle differences between operators, like FILTER NOT EXISTS vs MINUS

pchampin: Your point of having multiple ways might create problems

ora: SPARQL spec spends a bit of time presenting this difference

AndyS: It was quite contentious in SPARQL 1.1

<pchampin> I'm more than happy to let the editors decide on that

AndyS: I am not aware of any outgoing opinion, I think it comes down to a choice of which way to go

tl: is it related to triple terms in any way, or is it a SPARQL erratum?

AndyS: it has nothing to do with triple terms

tl: what are the criteria for which SPARQL errata to discuss now?

tl: it's a central issue, is that the argument?

pfps: There are a bunch of problems with SPARQL, the ones with EXISTS are the biggies

pfps: They end up splitting the SPARQL implementation space

pfps: The decision that has to be made is whether to move SPARQL EXISTS toward a more database-like implementation or keep it more consistent with the existing

AndyS: The current implementation is present in SQL with correlated subqueries

pfps: if you use the semi/anti join interpretation of EXISTS you change SPARQL more than the other option

pfps: In the end people who will see and understand the differences are very few

ora: I would like to know preferences

AndyS: My preference is for substitution and applying errata (option 1)

pfps: I don't have much of a horse in this race

pfps: Ideally I would love to get more SPARQL developers on board

ora: we could talk outside of the group

ktk: I reached out to Stardog but have not received any input

gtw: I am not sure there is much value in reaching out to more developers. sparql-dev has been open for a long time

<pchampin> Tpt: I have a significant preference for option 1; option 2 is basically equivalent to MINUS

pfps: One way to check the issue would be to pull some tests

<pfps> which PR?

<gkellogg> w3c/rdf-tests#42

<gb> Issue 42 tests to document current definition of EXISTS in SPARQL (by pfps) [SPARQL]

<gkellogg> w3c/rdf-tests#43

<gb> CLOSED Pull Request 43 Add tests to document current definition of EXISTS (by pfps)

ora: Whatever solution we pick, someone will ask why we picked it

AndyS: picking substitution breaks the fewest queries

ora: That seems to me as good a reason as any, let's make a decision

tl: I would like to ask James about it

ora: Let's vote on it next Thursday

ora: Let's do it


klinovp commented 4 days ago

Here's my 2c as a query engine tech lead at Stardog: I prefer fixing the substitution semantics, making it a part of the spec, and keeping EXISTS as a form of correlated semi-join based on substitution. Two main reasons:

Eventually, I would really like to see LATERAL as a part of a future SPARQL standard and that also will require substitution (Stardog implements it using the SERVICE syntax). I think it's important that the next release of the spec actually uses the term "correlated" in the text when it articulates the differences between EXISTS and MINUS, so that introduction of general correlated subqueries through LATERAL then looks like a natural next step in the process.

VladimirAlexiev commented 4 days ago

@domel sent us this "questionnaire" by email. I'll paste it below, because I find it a useful roadmap to this topic.

The RDF-star Working Group is currently addressing issues related to updating the semantics of SPARQL EXISTS, specifically regarding:

Would you be able to provide your insights on these matters?

afs commented 3 days ago

@domel sent us this "questionnaire" by email. I'll paste it below, because I find it a useful roadmap to this topic.

The RDF-star Working Group is currently addressing issues related to updating the semantics of SPARQL EXISTS, specifically regarding:

  • Certain uses of EXISTS being undefined during evaluation,
  • Substitution occurring where definitions apply only to variables,
  • Blank nodes being substituted into BGPs and acting as variables,
  • Substitution potentially flipping MINUS into its disjoint-domain case,
  • Substitution impacting disconnected variables.

Would you be able to provide your insights on these matters?

This section of SEP-0007 expands on these points: https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0007/sep-0007.md#identified-issues
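The fourth point in the list (substitution flipping MINUS into its disjoint-domain case) can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not the text of SEP-0007: the query shape `FILTER EXISTS { ?x :p :c MINUS { ?x :p :c } }`, the one-triple dataset, and the helper names `match`, `substitute` and `minus` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Solution = Dict[str, str]
Triple = Tuple[str, str, str]


def match(pattern: Triple, data: List[Triple]) -> List[Solution]:
    """Match one triple pattern against the data; terms starting with '?' are variables."""
    results = []
    for triple in data:
        mu: Solution = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if mu.get(term, value) != value:
                    ok = False
                    break
                mu[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(mu)
    return results


def substitute(pattern: Triple, mu: Solution) -> Triple:
    """Naive substitution: replace each bound variable by its value."""
    return tuple(mu.get(term, term) for term in pattern)


def minus(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """SPARQL 1.1 MINUS: remove m only if a compatible mapping shares a variable with it."""
    def removes(m: Solution, n: Solution) -> bool:
        shared = m.keys() & n.keys()
        return bool(shared) and all(m[v] == n[v] for v in shared)
    return [m for m in left if not any(removes(m, n) for n in right)]


data = [(":d", ":p", ":c")]
pattern = ("?x", ":p", ":c")
outer = {"?x": ":d"}  # the current solution being filtered

# Without substitution both sides bind ?x, so MINUS removes the only solution.
without_substitution = minus(match(pattern, data), match(pattern, data))

# After substitution both sides are ground patterns yielding the empty mapping:
# the domains are disjoint, MINUS removes nothing, and EXISTS becomes true.
ground = substitute(pattern, outer)
with_substitution = minus(match(ground, data), match(ground, data))
```

In this model the pattern has no solutions before substitution but one (the empty mapping) afterwards, so the outcome of the EXISTS test changes purely because substitution removed the shared variable.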