Aklakan commented 3 years ago

(Apologies for re-opening #127 as a fresh issue. In my attempts to clarify the initial proposal I got lost in technical details and corner cases, and my feeling is that by now I have turned it into an incomprehensible mess from which it is no longer possible to judge whether the core idea is of interest to the community.)

Why?

Consider this analogy: OWL as a description logic language is entity-centric: a class expression intensionally describes a set of entities satisfying a given set of constraints. This is akin to a SPARQL SELECT query with a single variable. However, in both cases it is not possible to specify in the same query a corresponding RDF graph fragment for these entities.

In contrast, SPARQL CONSTRUCT queries are triple-centric. While it is possible to specify the RDF graph to create from the retrieved solutions, it is not possible to specify in a standard way to which entities the triples belong.

As a consequence, so far it is not possible to have a SPARQL query that semantically describes a set of 'objects' - i.e. a 'thing with an id' together with an RDF graph fragment that describes it.

SPARQL is not a graph traversal language and this proposal is not about making it one, but having a standard way to designate entities together with their graph fragment would foremost provide a direct connection point for other path-based languages / approaches such as LDPath or LDFlex.

As an example, consider this use case: "From the SPARQL endpoint of scholarlydata retrieve the first 100 publications together with all authors ordered by the name of the first author".

At present, to the best of my knowledge, the query would have to look like this:

PREFIX  eg:   <http://www.example.org/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  dct: <http://purl.org/dc/terms/>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  conf: <https://w3id.org/scholarlydata/ontology/conference-ontology.owl#>
PREFIX  bibo: <http://purl.org/ontology/bibo/>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT {
  ?pub
    rdfs:label ?label ;
    dct:creator ?content ;
    eg:sortKey ?firstAuthorName .

  ?content foaf:name ?name .
} {
SELECT DISTINCT  ?pub ?label ?list ?firstAuthorName ?content ?name
WHERE
  { { SELECT  ?pub (MIN(str(?firstAuthorName)) AS ?sortKey_1)
      WHERE
        { ?pub  rdf:type         conf:InProceedings ;
                rdfs:label       ?label ;
                bibo:authorList  ?list .
          ?list (conf:hasFirstItem/conf:hasContent)/foaf:name ?firstAuthorName .
          ?list conf:hasItem/conf:hasContent ?content .
          ?content  foaf:name  ?name
        }
      GROUP BY ?pub
      ORDER BY ASC(MIN(str(?firstAuthorName)))
      OFFSET  50
      LIMIT   100
    }
    ?pub  rdf:type         conf:InProceedings ;
          rdfs:label       ?label ;
          bibo:authorList  ?list .
    ?list (conf:hasFirstItem/conf:hasContent)/foaf:name ?firstAuthorName .
    ?list conf:hasItem/conf:hasContent ?content .
    ?content  foaf:name  ?name
  }
ORDER BY ASC(?sortKey_1) ?pub
}

with the response

ns1:iswc-2019-demo-550  ns4:sortKey "Ahmad Sakor" .
ns1:iswc-2019-doctoral-419  rdfs:label  "Fine-grained Entity Type Inference in RDF Knowledge Graphs" ;
    ns2:creator ns3:a-b-m-moniruzzaman ;
    ns4:sortKey "A B M Moniruzzaman" .
ns1:iswc-2019-poster-479    rdfs:label  "An Overview of the TBFY Knowledge Graph for Public Procurement" ;
    ns2:creator ns3:philip-turk ,
        ns3:oscar-corcho ,
        ns3:dumitru-roman ,
        ns3:elena-simperl ,
        ns3:ahmet-soylu ,
        ns3:chris-taggart ,
        ns3:ian-makgill ,
        ns3:till-c-lech ;
    ns4:sortKey "Ahmet Soylu" .
ns1:iswc-2019-research-208  rdfs:label  "Incorporating Literals into Knowledge Graph Embeddings" ;
    ns2:creator ns3:mohammad-asif-khan ,

The response is now just a bunch of triples. Suppose an application should render the HTML template <span>{{resource.label}}</span>: which resource should it match? The query response does not tell it where to start. Of course the application could pick any resource with a label, but what if authors and publications both have one? So we would add a special type to the CONSTRUCT template so that the application can pick it up. Now an application that should merely display the title of whatever is passed to it needs to be aware of ontology metadata. (And of course, the application of another consortium member uses the same approach with a different class.)

Previous work

Custom solutions involving query transformations, use of vocabularies to annotate resources in the construct template, and post processing of SPARQL query responses.

Proposed solution

The proposal comprises three things:

a succinct shorthand for partitioning the result of a CONSTRUCT query that retains order. This is based on a transformation to the query above with the nested SELECT
a corresponding quad-based result format
a feature that allows designating the set of relevant entities within a partition in the response

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX conf: <https://w3id.org/scholarlydata/ontology/conference-ontology.owl#>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX eg: <http://www.example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

ENTITY ?pub
CONSTRUCT {
  ?pub
    rdfs:label ?label ;
    dct:creator ?content ;
    eg:sortKey ?firstAuthorName .

  ?content foaf:name ?name .
}
WHERE {
  ?pub
    a conf:InProceedings ;
    rdfs:label ?label ;
    bibo:authorList ?list .

  ?list
    conf:hasFirstItem/conf:hasContent/foaf:name ?firstAuthorName ;
    conf:hasItem/conf:hasContent ?content .

  ?content foaf:name ?name .
}
PARTITION BY ?pub
ORDER PARTITIONS BY ASC(MIN(?firstAuthorName))
LIMIT 100
OFFSET 50

The response to this query is a sequence of partitions in the order specified in the query. The partitions could be represented as named graphs whose IRIs are randomly generated at query execution and are thus most unlikely to clash with any data in the payload. A to-be-standardized property attached to the named graph IRI could state which entities within that graph were declared to act as starting points. The specification could guarantee that partitions are always exposed as consecutive quads, so that a change in the named graph IRI marks the end of a partition. As a result, path-based approaches could directly 'connect' to the designated entities and traverse the data in the partition's graph fragment.

<urn:sparql-partition:58ryRBAb4Wh92LLn-4TLwyTTVaNvMaCBzA4aLUvLlk4=-0> {
    <urn:sparql-partition:58ryRBAb4Wh92LLn-4TLwyTTVaNvMaCBzA4aLUvLlk4=-0>
            <http://NEEDS_STANDARDIZATION/hasEntity>  <https://w3id.org/scholarlydata/inproceedings/eswc2009/paper/181> .
    <https://w3id.org/scholarlydata/person/philippe-cudre-mauroux>
            <http://xmlns.com/foaf/0.1/name>  "Philippe Cudre Mauroux" ;
            <http://xmlns.com/foaf/0.1/name>  "Philippe Cudré-Mauroux" ;
            <http://xmlns.com/foaf/0.1/name>  "Philippe Cudre-Mauroux" .
    <https://w3id.org/scholarlydata/person/sebastian-michel>
            <http://xmlns.com/foaf/0.1/name>  "Sebastian Michel" .
    <https://w3id.org/scholarlydata/inproceedings/eswc2009/paper/181>
            <http://purl.org/dc/terms/creator>  <https://w3id.org/scholarlydata/person/sebastian-michel> ;
            <http://purl.org/dc/terms/creator>  <https://w3id.org/scholarlydata/person/adriana-budura> ;
            <http://purl.org/dc/terms/creator>  <https://w3id.org/scholarlydata/person/philippe-cudre-mauroux> ;
            <http://www.example.org/sortKey>  "Adriana Budura" ;
            <http://purl.org/dc/terms/creator>  <https://w3id.org/scholarlydata/person/karl-aberer> ;
            <http://www.w3.org/2000/01/rdf-schema#label>  "Neighborhood - based Tag Prediction" .
    <https://w3id.org/scholarlydata/person/adriana-budura>
            <http://xmlns.com/foaf/0.1/name>  "Adriana Budura" .
    <https://w3id.org/scholarlydata/person/karl-aberer>
            <http://xmlns.com/foaf/0.1/name>  "Karl Aberer" .
}

...

<urn:sparql-partition:58ryRBAb4Wh92LLn-4TLwyTTVaNvMaCBzA4aLUvLlk4=-99> {
    <urn:sparql-partition:58ryRBAb4Wh92LLn-4TLwyTTVaNvMaCBzA4aLUvLlk4=-99>
            <http://NEEDS_STANDARDIZATION/hasEntity>  <https://w3id.org/scholarlydata/inproceedings/iswc2002/proceedings/paper-28> .
    <https://w3id.org/scholarlydata/person/gerhard-friedrich>
            <http://xmlns.com/foaf/0.1/name>  "Gerhard Friedrich" .
    <https://w3id.org/scholarlydata/inproceedings/iswc2002/proceedings/paper-28>
            <http://purl.org/dc/terms/creator>  <https://w3id.org/scholarlydata/person/markus-zanker> ;
            <http://purl.org/dc/terms/creator>  <https://w3id.org/scholarlydata/person/alexander-felfernig> ;
            <http://purl.org/dc/terms/creator>  <https://w3id.org/scholarlydata/person/gerhard-friedrich> ;
            <http://www.example.org/sortKey>  "Alexander Felfernig" ;
            <http://purl.org/dc/terms/creator>  <https://w3id.org/scholarlydata/person/dietmar-jannach> ;
            <http://www.w3.org/2000/01/rdf-schema#label>  "Semantic Configuration Web Services in the CAWICOMS Project" .
    <https://w3id.org/scholarlydata/person/alexander-felfernig>
            <http://xmlns.com/foaf/0.1/name>  "Alexander Felfernig" .
    <https://w3id.org/scholarlydata/person/markus-zanker>
            <http://xmlns.com/foaf/0.1/name>  "Markus Zanker" .
    <https://w3id.org/scholarlydata/person/dietmar-jannach>
            <http://xmlns.com/foaf/0.1/name>  "Dietmar Jannach" .
}
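
To make the consumer side concrete, here is a minimal Python sketch of how a client might read such a response. It assumes the quads have already been parsed into (graph, subject, predicate, object) tuples and that the quads of each partition arrive consecutively, as suggested above. The `HAS_ENTITY` IRI is the placeholder from the example, the abbreviated IRIs are for brevity only, and `split_partitions` is a hypothetical helper, not part of any library.

```python
from itertools import groupby

# Placeholder property from the example above; not a standardized IRI.
HAS_ENTITY = "http://NEEDS_STANDARDIZATION/hasEntity"

def split_partitions(quads):
    """Yield (entities, triples) per partition, in response order.

    `quads` is an iterable of (graph, subject, predicate, object) tuples in
    which all quads of one partition are consecutive, so a change in the
    graph name marks the start of the next partition.
    """
    for graph, group in groupby(quads, key=lambda q: q[0]):
        entities, triples = [], []
        for _, s, p, o in group:
            if s == graph and p == HAS_ENTITY:
                entities.append(o)  # designated starting point
            else:
                triples.append((s, p, o))
        yield entities, triples

# Tiny hand-written response with two partitions (abbreviated IRIs).
quads = [
    ("urn:p-0", "urn:p-0", HAS_ENTITY, "ex:paper181"),
    ("urn:p-0", "ex:paper181", "rdfs:label", "Neighborhood - based Tag Prediction"),
    ("urn:p-0", "ex:karl-aberer", "foaf:name", "Karl Aberer"),
    ("urn:p-1", "urn:p-1", HAS_ENTITY, "ex:paper28"),
    ("urn:p-1", "ex:paper28", "rdfs:label",
     "Semantic Configuration Web Services in the CAWICOMS Project"),
]

partitions = list(split_partitions(quads))
for entities, triples in partitions:
    for entity in entities:
        labels = [o for s, p, o in triples if s == entity and p == "rdfs:label"]
        print(entity, labels)
```

With this shape, a 'dumb' view only needs the designated entities and the triples of the same partition; it does not need to know which class or property identifies a publication.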

Considerations for backward compatibility

None

edgardmarx commented 3 years ago

I would go a bit further and simplify this by removing the partitions and ENTITY, leaving only a "GRAPH" keyword after CONSTRUCT. E.g.

CONSTRUCT GRAPH {
  ?pub
    rdfs:label ?label ;
    dct:creator ?content ;
    eg:sortKey ?firstAuthorName .

  ?content foaf:name ?name .
}
WHERE {
  ?pub
    a conf:InProceedings ;
    rdfs:label ?label ;
    bibo:authorList ?list .

  ?list
    conf:hasFirstItem/conf:hasContent/foaf:name ?firstAuthorName ;
    conf:hasItem/conf:hasContent ?content .

  ?content foaf:name ?name .
}
ORDER BY ASC(MIN(?firstAuthorName))
LIMIT 100
OFFSET 50

Notice that this allows you to build any graph in the construct, not just entities. Further, you are specifying that the construct is a graph instead of triple centric. Finally, you do not need to specify the "start" of the graph (partition) as you can extract the same subgraphs starting by any of the variables in the construct.

Aklakan commented 3 years ago

"Finally, you do not need to specify the "start" of the graph (partition) as you can extract the same subgraphs starting by any of the variables in the construct."

Hi @edgardmarx,

I am afraid your suggestion misses the point. (Maybe your example is incomplete?)

Your query has an aggregate function, but it does not specify what the group key is or what set of values is being aggregated. In the scholarly data, authors actually have multiple names, so when you want to sort the partitions by the name of the first author, you need to aggregate the names into a single sort key (e.g. using MIN, i.e. the lexicographically smallest name).
Your LIMIT and OFFSET clauses seem to just pick 100 bindings, but there is no indication of how that relates to 100 entities.

My proposal contains two aspects:

Specifying on which variables to partition the bindings of a result set in order to enable construction of a graph for each partition. This can be seen as GROUP BY with an aggregation function that builds triples
A way to designate the entities matched by the construct template

Conceptually, these aspects are independent, but they could be conflated to make things simpler. Instead of specifying by which variables to partition, the partition variables could implicitly be set to the single entity variable:

ENTITY ?y
CONSTRUCT { ?y a :Publication }
WHERE {
  ?y a :BibliographicResource ;
     :firstAuthorName ?fn
}
ORDER ENTITIES BY ASC(MIN(?fn)) 
OFFSET 50
LIMIT  100

This would translate to

CONSTRUCT { ?entity  a :Publication }
{
  { SELECT DISTINCT  ?entity {

    # Inner select to have slicing (i.e. limit/offset) work on the level of the entity keys
    { SELECT  ?entity (MIN(?fn) AS ?sortKey_1) {
        ?entity a :BibliographicResource ; :firstAuthorName ?fn
      } GROUP BY ?entity ORDER BY ASC(MIN(?fn)) OFFSET  50 LIMIT  100 }

    # Outer select to match the attributes
    ?entity a :BibliographicResource ; :firstAuthorName ?fn
  } ORDER BY ASC(?sortKey_1) ?entity }
}

If there were no slicing and ordering, the inner select could be omitted, and in this example the query would become

SELECT DISTINCT ?entity {
  ?entity a :BibliographicResource ; :firstAuthorName ?fn
} ORDER BY ?entity
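
The intended semantics of this translation can be sketched in plain Python over a list of solution bindings. This is only an illustration under assumptions: bindings are modelled as dicts with the hypothetical keys `entity` and `fn`, the helper name `entity_slice` is made up, and the tie-break on the entity key mirrors the `ORDER BY ASC(?sortKey_1) ?entity` above.

```python
from collections import defaultdict

def entity_slice(bindings, offset, limit):
    """Group bindings by entity, order the groups by MIN(fn), then slice.

    Mirrors the inner SELECT: OFFSET/LIMIT count entities, not bindings.
    """
    groups = defaultdict(list)
    for b in bindings:
        groups[b["entity"]].append(b)
    # One sort key per entity: the lexicographically smallest name.
    ordered = sorted(groups, key=lambda e: (min(b["fn"] for b in groups[e]), e))
    selected = ordered[offset:offset + limit]
    # Each selected entity keeps all of its bindings, which is what the
    # outer pattern re-matches in the translated query.
    return [(e, groups[e]) for e in selected]

# Invented sample bindings; ex:pub1 has two name variants for its first author.
bindings = [
    {"entity": "ex:pub1", "fn": "Philippe Cudre-Mauroux"},
    {"entity": "ex:pub1", "fn": "Philippe Cudre Mauroux"},
    {"entity": "ex:pub2", "fn": "Adriana Budura"},
    {"entity": "ex:pub3", "fn": "Karl Aberer"},
]

result = entity_slice(bindings, offset=1, limit=2)
for entity, rows in result:
    print(entity, len(rows))
```

Slicing a flat list of bindings instead would cut through the middle of an entity whenever that entity contributes more than one binding, which is the reason for the nested SELECT in the translation.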

namedgraph commented 3 years ago

@Aklakan I don't understand what's wrong with your current-SPARQL example with sub-SELECT? I've used this pattern many times and it worked fine.

What you say about <span>{{resource.label}}</span> is not an RDF problem, it's a presentation problem. The software layer has to be RDF-aware; RDF does not have to be tailored to allow "entities" -- if it's not triples, it's not RDF. We have an RDF-aware presentation layer that is implemented in XSLT and works perfectly fine: https://github.com/AtomGraph/LinkedDataHub/tree/master/src/main/webapp/static/com/atomgraph/linkeddatahub/xsl

resource.label is a resource-level expression, and for some reason you want to apply it to a whole graph. If you first use an expression to select the resource, the problem goes away. E.g. something like:


graph.resources.filter(r.label).forEach(function() {
    ...
    <span>{{resource.label}}</span>
    ...
}, this);

edgardmarx commented 3 years ago

Dear @namedgraph, thanks for your interest. The problem that @Aklakan is trying to overcome is a bit more complex. See my explanation in https://github.com/w3c/sparql-12/issues/127#issuecomment-715479847 . Further, there is no guarantee that a query working in one triple store will work in another because that's totally dependent on the order that the triples were indexed. The problem is not related to the serialization format.

@Aklakan, you are totally right, my example was missing the GROUP BY variable, for which I would suggest using the existing SPARQL syntax. Unless I am missing something, I think your example could be written as follows in my suggested syntax:

CONSTRUCT GRAPH {
  ?pub
    rdfs:label ?label ;
    dct:creator ?content ;
    eg:sortKey ?firstAuthorName .

  ?content foaf:name ?name .
}
WHERE {
  ?pub
    a conf:InProceedings ;
    rdfs:label ?label ;
    bibo:authorList ?list .

  ?list
    conf:hasFirstItem/conf:hasContent/foaf:name ?firstAuthorName ;
    conf:hasItem/conf:hasContent ?content .

  ?content foaf:name ?name .
}
GROUP BY ?pub
ORDER BY ASC(MIN(?firstAuthorName))
LIMIT 100
OFFSET 50

namedgraph commented 3 years ago

Sub-SELECT specifies the ordering? It needs ORDER BY to provide a stable ordering. I'm not sure why @Aklakan needed nested SELECTs though. The pattern that works for us is roughly:

CONSTRUCT
{
    ?resource ?property ?value .
}
{
    {
        SELECT *
        {
            ?resource rdfs:label ?label # get labelled resources
        }
        ORDER BY ?label
        OFFSET 0
        LIMIT 20
    }
    ?resource ?property ?value . # get the rest of their triples
}

If you turn it into a DESCRIBE, it becomes even shorter.

There's no way getting around the fact the RDF graph result you get is an unordered set of triples. So your presentation layer has to do a secondary sort regardless of the ORDER BY.
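
As a rough illustration of that secondary sort, the following Python sketch regroups an unordered set of triples by subject and re-sorts the labelled resources on the client. The triples are made-up sample data, `LABEL` stands in for the full rdfs:label IRI, and `resources_by_label` is a hypothetical helper.

```python
from collections import defaultdict

LABEL = "rdfs:label"  # stands in for the full rdfs:label IRI

def resources_by_label(triples):
    """Group triples by subject and return labelled subjects sorted by label."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    labelled = []
    for s, props in by_subject.items():
        labels = [o for p, o in props if p == LABEL]
        if labels:
            labelled.append((min(labels), s, props))  # smallest label as key
    labelled.sort(key=lambda item: (item[0], item[1]))
    return [(s, props) for _, s, props in labelled]

# A set has no order, just like the triples of a CONSTRUCT result.
triples = {
    ("ex:b", LABEL, "Beta"),
    ("ex:a", LABEL, "Alpha"),
    ("ex:a", "ex:seeAlso", "ex:b"),
    ("ex:c", "ex:seeAlso", "ex:a"),  # no label, so it is skipped
}

ordered = resources_by_label(triples)
print([s for s, _ in ordered])
```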

edgardmarx commented 3 years ago

Hey @namedgraph,

Thanks again for engaging in the discussion, you got the idea.

"There's no way getting around the fact the RDF graph result you get is an unordered set of triples"

Your example is already complicated with one single triple pattern, imagine building one with two or three linked by different variables.

That's the issue @Aklakan thinks should be simplified in the SPARQL CONSTRUCT syntax, and I totally agree.

namedgraph commented 3 years ago

My view is that SPARQL 1.2 should prioritize cases/features that are currently not even possible. This is not one of them.

Aklakan commented 3 years ago

"There's no way getting around the fact the RDF graph result you get is an unordered set of triples."

Hi all, the fundamental question is whether SPARQL should make it easier to work on the entity level. I'd clearly like to see quality of life improvements in this regard. You said yourself that you used the pattern as well - which means you also wrote these query transformations in your client (because that's what everyone needs to do before they can do graph.resources.filter(r.label)...). [edit:] In my example in the initial post, the named graphs that correspond to the partitions have an ordering, but the triples within the named graphs are unordered.

"My view is that SPARQL 1.2 should prioritize cases/features that are currently not even possible. This is not one of them."

Your comment just popped up - so yes, there is a fundamental difference in the view - for me a minor version increase does not necessarily have to provide great new features but could provide some quality of life improvements.

edgardmarx commented 3 years ago

Dear @Aklakan and @namedgraph, I see this issue as similar to #39 below.

https://github.com/w3c/sparql-12/issues/39

In short: Could you do it using simple SPARQL? Yes. Does it improve the syntax? Definitely.

Aklakan commented 3 years ago

@edgardmarx Hm, I'd say Point 3 in #39 recognizes the problem:

RDF has just triples; how to delineate "circumscribe" a business object is non-trivial.

Not sure what you mean by

'Could you do it using simple SPARQL? Yes'

What is 'it'?:

#39 is about outsourcing the problem of building entities to other means via a `DESCRIBE ?x AS some:procedure`.

The core idea of my proposal is to have a native mechanism in SPARQL to build entities - based on aggregating solution bindings into what are conceptually (RDF term, RDF graph) pairs, exposed as named graphs in a specified order.

So it might be possible to transform a SHACL or ShEx shape into an entity-based SPARQL query that yields the set of resources and corresponding triples that match the shape specification.

edgardmarx commented 3 years ago

@Aklakan I just meant to say that you could overcome the problem of the issue https://github.com/w3c/sparql-12/issues/39 with a Select query.

Aklakan commented 3 years ago

"What you say about {{resource.label}} is not an RDF problem, it's a presentation problem. [..] graph.resources.filter(r.label).forEach(function() { [..]"

It's not a presentation problem but a set theoretic problem:

graph.resources.filter(r.label) is nothing else than SELECT DISTINCT ?s WHERE { ?s rdfs:label [] } on the RDF graph that was supplied to your application by perhaps another SPARQL query. Instead of having a single sparql query on the supplier side that specifies the set of (RDF term, graph fragment) pairs for the consumer to operate on, right now we need a second specification of the set of resources on the consumer side that has to be in sync with the supplier. If in the example data the authors and publications both had labels then the consumer needs to repeat the pattern for selecting specifically the publications - something that the supplier knew all along - but the supplier cannot communicate that in a standard way. That's the core of the problem.

[edit: I am assuming an architecture where the view is 'dumb' - it just works on the resources supplied to it; e.g. validation of the involved data is an orthogonal concern]
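
The mismatch can be sketched in a few lines of Python. The data is invented for illustration, and `labelled` is a hypothetical stand-in for `graph.resources.filter(r.label)`: once authors carry labels too, the consumer-side filter returns more than the publications the supplier's query was about.

```python
LABEL = "rdfs:label"     # stands in for the full IRIs
CREATOR = "dct:creator"

# Invented sample graph: a publication and its author both have labels.
graph = [
    ("ex:pub1", LABEL, "Incorporating Literals into Knowledge Graph Embeddings"),
    ("ex:pub1", CREATOR, "ex:author1"),
    ("ex:author1", LABEL, "Mohammad Asif Khan"),
]

def labelled(triples):
    """Consumer-side selection: every subject that has a label."""
    return sorted({s for s, p, _ in triples if p == LABEL})

def publications(triples):
    """What the consumer must re-specify to stay in sync with the supplier."""
    return sorted({s for s, p, _ in triples if p == CREATOR})

print(labelled(graph))      # includes the author as well
print(publications(graph))  # only the publication
```

The second function is exactly the duplicated knowledge: the supplier's query already knew which resources were the publications, but a plain CONSTRUCT response has no standard place to say so.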

namedgraph commented 3 years ago

So why don't you build your UI on the SELECT result table?

You can get a graph (CONSTRUCT/DESCRIBE result) from a projection (SELECT result), but not the other way around. I'm sure you know this.

It seems that you want to have your cake (graph) and eat it (treat it as a table) too. Since you cannot do that, you need a secondary projection in the UI layer.

VladimirAlexiev commented 3 years ago

also related to #48

afs commented 3 years ago

Related: #86 -- CONSTRUCT DISTINCT and REDUCED Related: #31 -- CONSTRUCT GRAPH

TallTed commented 3 years ago

Related: #33 -- SELECT ... FROM CONSTRUCT ...

Aklakan commented 3 years ago

"Related: #33 -- SELECT ... FROM CONSTRUCT ..."

#33 is about rewriting SPARQL SELECT queries over views, similar to SPARQL-to-SQL rewriting. The difference is that in #33 a set of CONSTRUCT queries takes the role of the view definitions (for which there is R2RML in the SQL world).

This is an orthogonal feature to those related to somehow shaping data objects. A data object is ID + state. In RDF this translates to a resource plus a graph fragment. General objects have behavior in addition.

justin2004 commented 3 years ago

"a corresponding quad-based result format"

http://www.scholarlydata.org/sparql/ might not support TriG and N-Quads, but if it did you could construct quads:

CONSTRUCT {
graph ?pub {
  ?pub
    rdfs:label ?label ;
    dct:creator ?content ;
    eg:sortKey ?firstAuthorName .

  ?content foaf:name ?name .
  }
}
...

You can do that with Jena.

@Aklakan does that help?

afs commented 3 years ago

w3c / sparql-dev

Entity-based Construct Queries #128
