w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

Named solution sets #44

Open dbooth-boston opened 5 years ago

dbooth-boston commented 5 years ago

"another gap in SPARQL that I have felt, and that Bryan Thompson aptly suggested a few years ago, is that SPARQL does not provide any mechanism for naming or saving solution sets, even though they are a fundamental concept in SPARQL. On a number of occasions I have wished that I could save an intermediate result set and then refer to it later, in producing final results." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0300.html

VladimirAlexiev commented 5 years ago

BlazeGraph has WITH/INCLUDE (named subqueries). E.g., see this puppy from my current work (querying Wikidata):

prefix p:    <http://www.wikidata.org/prop/>
prefix ps:   <http://www.wikidata.org/prop/statement/>
prefix pq:   <http://www.wikidata.org/prop/qualifier/>
prefix wd:   <http://www.wikidata.org/entity/>
prefix wdt:  <http://www.wikidata.org/prop/direct/>
prefix bd:   <http://www.bigdata.com/rdf#>
prefix wikibase: <http://wikiba.se/ontology#>

select ?orgId ?GRID ?orgLabel ?officialName ?orgDescription ?countryLabel ?locationLabel ?year ?officialWebsite ?orgURL ?identifierWD ?identifierGRID ?sourceURL ?linkWD ?linkGRID
with {select distinct ?award {
  ?award wdt:P31/wdt:P279* wd:Q11448906. # science award
  ?award wdt:P444 []. # review score
}} as %AWARD
with {select distinct ?person {
  include %AWARD
  ?person wdt:P166 ?award.
}} as %PERSON
with {select distinct ?org {
  include %PERSON
  ?person
    wdt:P108      | # employer
    wdt:P463      | # member of (learned society)
    wdt:P69       | # educated at
    p:P512/pq:P69 | # academic degree / educated at
    p:P166/pq:P1416 # won award / affiliation. This may not be a notable award, but I can't write the correct union with include %AWARD
  ?org.
  filter not exists {?org wdt:P31/wdt:P279* wd:Q170584} # not a project
}} as %ORG {
  include %ORG
  optional {?org wdt:P1448 ?officialName}
  optional {?org wdt:P17 ?country}
  optional {?org wdt:P131 ?location} # located in administrative territorial entity
  optional {?org wdt:P580|wdt:P571 ?date bind(year(?date) as ?year)} # start time|inception
  optional {?org wdt:P856 ?officialWebsite}
  optional {?org wdt:P2427 ?GRID}
  bind(strafter(str(?org),str(wd:)) as ?orgId)
  bind(uri(concat("organization/Wikidata/",          ?orgId)) as ?orgURL)
  bind(uri(concat("source/Wikidata/",                ?orgId)) as ?sourceURL)
  bind(uri(concat("identifier/Wikidata/",            ?orgId)) as ?identifierWD)
  bind(uri(concat("identifier/GRID/",                ?GRID))  as ?identifierGRID)
  bind(uri(concat("https://www.wikidata.org/wiki/",  ?orgId)) as ?linkWD)
  bind(uri(concat("https://www.grid.ac/institutes/", ?GRID))  as ?linkGRID)
  service wikibase:label {bd:serviceParam wikibase:language "en,fr,it,de,nl"}
}

dydra commented 5 years ago

VladimirAlexiev commented 5 years ago

Blazegraph WITH/INCLUDE is local to a single query. WITH defines a named subquery with a fixed execution order. The subquery is re-executed every time you run the main query, so the result set is not saved between queries.
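For readers who have not used the feature, here is a minimal sketch of the WITH ... AS / INCLUDE pattern, cut down from the larger query above. It is a Blazegraph extension, not standard SPARQL 1.1, so it is only expected to parse on Blazegraph-based endpoints. The Wikidata identifiers are the ones used in the query above; the overall shape is an illustration, not a tested query.

```sparql
prefix wd:  <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>

select ?person ?award
with {
  # named subquery: its solutions are bound to the name %AWARD
  select distinct ?award {
    ?award wdt:P31/wdt:P279* wd:Q11448906 .   # science award
  }
} as %AWARD
where {
  include %AWARD             # join the named solution set into this group
  ?person wdt:P166 ?award .  # award received
}
```

The name %AWARD exists only inside this one query. Running the query again evaluates the subquery again.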

I think #44 and #41 are the same. We don't need 2 variants of something we don't have :-)

dydra commented 5 years ago

the scope of the WITH/AS is not obvious. what happens when an INCLUDE with the same name appears in more than one place in the query?

cygri commented 5 years ago

A number of related issues:

So this is all about computing some sort of intermediate result, but varies along several dimensions:

  1. Is the intermediate result a graph or a solution sequence?
  2. Is the intermediate result usable only within the same query that defined it, or is it made available under some name/IRI for use in later queries?
  3. Is the intermediate result virtual or materialised, that is, does it change when the underlying data changes?

The only combination of these dimensions that is currently possible is “store a materialised graph for use in subsequent queries”, by using a named graph and INSERT { ... } WHERE { ... }.
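A rough sketch of that one available combination, assuming a store that supports SPARQL 1.1 Update. Every IRI under http://example.org/ (the graph name and the properties) is made up for illustration. The first request materialises the intermediate result as a named graph:

```sparql
PREFIX ex: <http://example.org/>

# SPARQL Update request: copy the intermediate result into a named graph.
# ex:intermediate, ex:award, ex:ScienceAward and ex:wonScienceAward are hypothetical.
INSERT { GRAPH ex:intermediate { ?person ex:wonScienceAward ?award } }
WHERE  {
  ?person ex:award ?award .
  ?award a ex:ScienceAward .
}
```

A later, separate query can then read it back:

```sparql
PREFIX ex: <http://example.org/>

SELECT ?person ?award
WHERE {
  GRAPH ex:intermediate { ?person ex:wonScienceAward ?award }
}
```

Note that what is stored is a graph (triples), not a solution sequence, and that it does not follow later changes to the underlying data unless the update is run again.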

dbooth-boston commented 5 years ago

To clarify, while I agree that this issue is related to others, this one is specifically about solution sets -- not graphs. Solution sets are a very central concept to SPARQL, but at present (in standard SPARQL) there is no way to name, save or refer to them, and this makes it hard to subdivide the work of a complex query into a series of simpler steps. I have not looked at the BlazeGraph WITH / INCLUDE feature, but it was Bryan Thompson of BlazeGraph (formerly BigData) who first suggested this idea a few years ago.
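To illustrate the gap in standard SPARQL 1.1: a subquery can compute an intermediate solution set, but it cannot be given a name, so using it in two places means writing it out twice. The sketch below uses made-up ex: terms and is meant only to show the shape of the problem.

```sparql
PREFIX ex: <http://example.org/>

SELECT ?person ?org
WHERE {
  {
    # intermediate result: people who won a science award
    { SELECT DISTINCT ?person { ?person ex:award ?award . ?award a ex:ScienceAward } }
    ?person ex:employer ?org .
  }
  UNION
  {
    # the same subquery has to be repeated verbatim, since it has no name
    { SELECT DISTINCT ?person { ?person ex:award ?award . ?award a ex:ScienceAward } }
    ?person ex:educatedAt ?org .
  }
}
```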

cygri commented 5 years ago

@dbooth-boston So, questions for you:

  1. Is the intermediate result usable only within the same query that defined it, or is it made available under some name/IRI for use in later queries?
  2. Is the intermediate result virtual or materialised, that is, does it change when the underlying data changes?

dbooth-boston commented 5 years ago

@cygri, my understanding was that it would be materialized and only available within the same query. I assume that would be easiest to implement. However, other options are worth considering, and at this point I do not have a strong opinion either way.

cygri commented 5 years ago

@dbooth-boston Got it, thanks. So that's indeed what BlazeGraph WITH/INCLUDE provides.

afs commented 5 years ago

Removing "SPARQL: " on transferred issue.

bergos commented 1 year ago

I wrote a blog post about this topic. My proposal uses a slightly different syntax. You can try it on the POC Web application. See also the comments.