w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

SPARQL-friendly lists #46

Open dbooth-boston opened 5 years ago

dbooth-boston commented 5 years ago

It is very hard[7] to query RDF lists, using standard SPARQL, while returning item ordering. This inability to conveniently handle such a basic data construct seems brain-dead to developers who have grown to take lists for granted.

"On my wish list are . . . generic structures like nested lists as first class citizens" https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0170.html

IDEA: Jena's list:index property

Apache Jena offers one potential (though non-standard) way to ease this pain, by defining a list:index property: https://jena.apache.org/documentation/query/rdf_lists.html
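
For illustration, a minimal sketch of how this property function might be used. It assumes the `list:` namespace from the Jena documentation linked above and a hypothetical `ex:tagList` property that points at the head of an RDF list. It only works in Jena ARQ, not in standard SPARQL.

```sparql
PREFIX list: <http://jena.apache.org/ARQ/list#>
PREFIX ex:   <http://example.org/ns#>   # hypothetical vocabulary

# Jena ARQ only: list:index is evaluated as a property function,
# it is not a triple stored in the graph.
SELECT ?article ?idx ?tag
WHERE {
  ?article ex:tagList ?tags .
  ?tags list:index (?idx ?tag) .
}
ORDER BY ?article ?idx
```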

IDEA: Add lists as a fundamental concept in RDF

As proposed by David Wood and James Leigh prior to the RDF 1.1 work.[8] https://www.w3.org/2009/12/rdf-ws/papers/ws14

william-vw commented 5 years ago

+1M. See also here (issue 3): http://manu.sporny.org/2014/json-ld-origins-2/ ...

Note that it could be straightforward to add extra semantics, i.e., on top of a triple-based representation, to implement these kinds of list predicates.

VladimirAlexiev commented 5 years ago

+1. cc @azaroth42

RickMoynihan commented 5 years ago

SHACL also makes use of lists to express paths, so improving list support might make SHACL processing easier too.

jaw111 commented 5 years ago

Would the VALUES OF syntax proposed by @cygri in #6 be appropriate here?

Example:

VALUES (?item ?idx) OF splitList(("travel" "iceland" "winter"))

The returned results are equivalent to:

VALUES (?item ?idx) {
    ("travel" 1)
    ("iceland" 2)
    ("winter" 3)
}

cygri commented 5 years ago

@jaw111 I don't quite understand the syntax you're using here. The proposal for VALUES OF only allows normal SPARQL expressions as arguments of the multi-value function, so a list wouldn't be allowed there.

Is the intention to use it like splitList(?x) where ?x would have been earlier bound to the first blank node of a list in the active graph? So, data:

<articles/1234> ex:tagList ("travel" "iceland" "winter").

And query:

SELECT ?tag ?idx {
    <articles/1234> ex:tagList ?tags
    VALUES (?tag ?idx) OF listMembers(?tags)
}

With the result you gave. This would cover the functionality provided by Jena's list:member and list:index property functions.
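
For comparison, standard SPARQL 1.1 can already reach the members (without their positions) through a property path. A minimal sketch, assuming the data above with a hypothetical base IRI and `ex:` namespace:

```sparql
BASE   <http://example.org/>                                  # hypothetical base IRI
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/ns#>                          # hypothetical vocabulary

# Walk zero or more rdf:rest links from the list head,
# then read the rdf:first value of each node reached.
SELECT ?tag
WHERE {
  <articles/1234> ex:tagList ?tags .
  ?tags rdf:rest*/rdf:first ?tag .
}
```

The result order is not guaranteed and no index is returned, which is the gap a function like listMembers would fill.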

tayloj commented 5 years ago

Some discussion on the mailing list about length-bounded property paths seems relevant too, since a path like ?list rdf:rest{n}/rdf:first ?item returns the nth element of a list (with zero-based indexing).
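
Without such a quantifier, a fixed position can be spelled out in SPARQL 1.1 by repeating rdf:rest in a sequence path. A sketch for n = 2 (the third element), again assuming a hypothetical `ex:tagList` property:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/ns#>   # hypothetical vocabulary

# Equivalent of ?list rdf:rest{2}/rdf:first ?item, written out by hand.
SELECT ?article ?item
WHERE {
  ?article ex:tagList ?list .
  ?list rdf:rest/rdf:rest/rdf:first ?item .
}
```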

jaw111 commented 5 years ago

@cygri you are correct, using a list there does not make much sense. Must have missed a trick earlier.

@tayloj expanding on your suggestion, how about using a variable instead of an integer for the path length? So a path like ?list rdf:rest{?n}/rdf:first ?item returns a set of solutions where the ?n variable is bound to the index.
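
A rough approximation of what rdf:rest{?n} would return is possible in SPARQL 1.1 by counting the nodes between the list head and each member. This is a sketch, not an efficient implementation: it assumes well-formed, non-cyclic lists and a hypothetical `ex:tagList` property, and its cost grows quadratically with list length.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/ns#>   # hypothetical vocabulary

# For each list node ?node, ?mid ranges over every node from the head
# up to and including ?node, so COUNT(?mid) - 1 is the zero-based index.
SELECT ?article ?item (COUNT(?mid) - 1 AS ?n)
WHERE {
  ?article ex:tagList ?list .
  ?list rdf:rest* ?mid .
  ?mid  rdf:rest* ?node .
  ?node rdf:first ?item .
}
GROUP BY ?article ?list ?node ?item
ORDER BY ?article ?n
```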

tayloj commented 5 years ago

@jaw111 That'd certainly be useful, but I have no idea how feasible it is. I think there were already difficulties in implementing {n,m} quantifiers efficiently even with fixed values. Moving to a variable is probably even more complicated. But I'd definitely use it if it were available.

TallTed commented 5 years ago

> I think there were already difficulties in implementing {n,m} quantifiers efficiently even with fixed values.

FWIW, Virtuoso still supports the {n,m} property path quantifiers. (This is not a comment on the rdf:rest{?n} suggestion from @jaw111.)

kasei commented 5 years ago

@TallTed does Virtuoso use the bag semantics of expanding that to a BGP/union equivalent, or the set semantics of just limiting the length of a + path?

jaw111 commented 5 years ago

I know @dydra also supports that syntax, including {n,} and {,m} cases.

TallTed commented 5 years ago

> does Virtuoso use the bag semantics of expanding [the {n,m} property path quantifiers] to a BGP/union equivalent, or the set semantics of just limiting the length of a + path?

@kasei - Good question, to which I don't immediately have the answer. @IvanMikhailov or @kidehen may be able to shed some light.

kidehen commented 5 years ago

@kasei ,

Are we talking about what's exemplified by the following query?

SELECT DISTINCT  * 
WHERE { 
        ?s a <http://dbpedia.org/ontology/AcademicJournal> ; 
        rdf:type{1,3} ?o 
       } 

LIMIT 50

Live Results Link.

/cc @TallTed

kasei commented 5 years ago

@kidehen Yes, except for the DISTINCT which will mask the difference. It seems that it's using the bag semantics of BGP/union expansion, which can have some challenges with cardinality for larger values of the path quantifiers (and as I recall was one of the big issues that prevented this from being included in SPARQL 1.1).
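
To make the two readings concrete, here is a sketch of the union expansion for a small quantifier, using a hypothetical property `ex:p` and the range {1,2}:

```sparql
PREFIX ex: <http://example.org/ns#>   # hypothetical vocabulary

# Bag reading of ex:p{1,2}: a union of fixed-length paths.
# A pair (?s, ?o) is returned once per matching route, so the same
# pair can appear several times, and the count grows with the range.
SELECT ?s ?o
WHERE {
  { ?s ex:p ?o }
  UNION
  { ?s ex:p/ex:p ?o }
}

# Set reading: roughly the same pattern under SELECT DISTINCT ?s ?o,
# where each reachable pair is reported once.
```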

kidehen commented 5 years ago

Okay, here's the query solution link without DISTINCT :)

ktk commented 4 years ago

We use sh:in for validation of data cubes in RDF. Unfortunately it is pretty much impossible to generate such a list in SPARQL, at least I could not figure out how.

The list functions in Jena seem to only read lists, not create or manipulate them. Is there any prior work on what creating and manipulating lists could look like?

I don't have much know-how in designing such things, but what I tried (and failed at) was:

CONSTRUCT {
  <something> sh:in ( ?listMembers ) .
}

So pretty much the Turtle collection syntax. ?listMembers could be a normal set; with a SELECT subquery it could also be ordered before being used in the CONSTRUCT. I would also imagine that I could add more variables, just as I can add more entries in Turtle syntax.

Am I completely missing something here that prevents this approach from working?
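
As far as SPARQL 1.1 goes, the template above appears to be instantiated once per solution, so each binding of ?listMembers yields its own one-element list rather than one combined list. A workaround sketch follows. It ranks the members by string order and mints IRIs for the list nodes, because blank nodes cannot be shared across solutions. All `ex:` names are hypothetical, and it assumes the members have distinct string values.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sh:  <http://www.w3.org/ns/shacl#>
PREFIX ex:  <http://example.org/ns#>   # hypothetical vocabulary

CONSTRUCT {
  ex:shape sh:in ?head .
  ?node rdf:first ?member ;
        rdf:rest  ?rest .
}
WHERE {
  {
    # Rank of a member = number of members sorting at or before it.
    SELECT ?member (COUNT(?other) AS ?rank)
    WHERE {
      ex:codelist ex:member ?member , ?other .
      FILTER (STR(?other) <= STR(?member))
    }
    GROUP BY ?member
  }
  {
    # Total number of members, used to detect the last node.
    SELECT (COUNT(DISTINCT ?m) AS ?size)
    WHERE { ex:codelist ex:member ?m }
  }
  # Mint one IRI per list position; the last node points to rdf:nil.
  BIND (IRI(CONCAT(STR(ex:shape), "/in/", STR(?rank))) AS ?node)
  BIND (IF(?rank = ?size, rdf:nil,
           IRI(CONCAT(STR(ex:shape), "/in/", STR(?rank + 1)))) AS ?rest)
  BIND (IRI(CONCAT(STR(ex:shape), "/in/1")) AS ?head)
}
```

The amount of machinery needed for what Turtle writes as ( a b c ) is arguably the point of this issue.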

There is obviously more missing, like removing an entry or adding a new one, but I'm not sure how much of that is realistic in a language like SPARQL.

By the way why is this called collection in Turtle and not list?

ktk commented 4 years ago

Bob DuCharme had a blog post that showed some standard manipulations. It works to some extent but is not really nice from a syntactic sugar point of view: http://www.snee.com/bobdc.blog/2014/04/rdf-lists-and-sparql.html

ktk commented 4 years ago

Some tries by @jaw111 https://gist.github.com/jaw111/1b149fd1111f774a3613f10955686617 via Twitter

afs commented 4 years ago

From some time ago: https://afs.github.io/rdf-lists-sparql . Lesson - it's painful.

One avenue is to add to the basic SPARQL data model - lists and sets (and paths) - beyond RDF terms. This is a large change, including result set formats, but I think it is worth exploring.

ericprud commented 4 years ago

In SWObjects, I extended triple pattern matching with some generators. One of those was "MEMBERS(?var)" (example use) which joined the current binding with the argument (?var above) bound to each member of the list.

I mentioned it to Lee F during the SPARQL 1.1 WG and he said the syntax gave him hives. I used this a lot, especially from the command line to e.g. sequentially walk test manifest entries, with no skin conditions that couldn't be explained by prolonged puberty.

TallTed commented 4 years ago

@ktk

> By the way why is this called collection in Turtle and not list?

List is most commonly understood to mean ordered list, while collection is most commonly understood to mean unordered list. (Yes, both list and collection may have both ordered and unordered variants, but the most common intuitive default is as I said.) Unordered membership is far easier to handle due to various other aspects of RDF and DBMS, and for many reasons (not least being WG time constraints) that ease was important in the development of these specs.

JervenBolleman commented 4 years ago

Stardog talked about a possible extension: at least a list-valued equivalent of group_concat, which would affect the result formats more than anything else.
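
For reference, this is roughly what standard SPARQL 1.1 offers today: GROUP_CONCAT flattens the members into a single string, so datatypes are lost and the specification does not guarantee the order of the concatenated values. A sketch, assuming a hypothetical `ex:tagList` property:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/ns#>   # hypothetical vocabulary

# One row per article, with all list members joined into one string.
# The order inside ?tags is implementation-dependent.
SELECT ?article (GROUP_CONCAT(?tag; SEPARATOR=",") AS ?tags)
WHERE {
  ?article ex:tagList ?list .
  ?list rdf:rest*/rdf:first ?tag .
}
GROUP BY ?article
```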

ktk commented 4 years ago

@JervenBolleman that is an interesting one, thanks. We have a workaround where we create lists in a coded step after concatenating them with GROUP_CONCAT in SPARQL so that feels very natural to me. Some questions based on that proposal:

albertmeronyo commented 4 years ago

As per @ktk 's suggestion I'm linking here the slides I used today at ISWC to talk about our work on RDF Lists: https://www.slideshare.net/albertmeronyo/modelling-and-querying-lists-in-rdf-a-pragmatic-study

I went into the presentation unaware of this thread :-) So I just subscribed cc/ @enridaga

ktk commented 2 years ago

Just noticed that Stardog provides nice basic member functions for lists, I like what I see https://docs.stardog.com/query-stardog/#rdf-list-functions

ericprud commented 2 years ago

> Just noticed that Stardog provides nice basic member functions for lists, I like what I see https://docs.stardog.com/query-stardog/#rdf-list-functions

It seems to me that if you have the freedom to extend SPARQL, there are good reasons to write these as operators in the query language rather than as magic predicates embedded in triple patterns:

  1. leverage syntax and function composition, e.g. BIND (LENGTH(:literalList) AS ?length) instead of :literalList stardog:list:length ?length. The former can be combined with any other function available in SPARQL 1.2.
  2. separate SPARQL operations from asserted triples. The magic triple representation is shorter, but it can be easily missed when nestled in with a bunch of triple constraints which correspond to asserted triples. In addition to aiding human recognition, it will be easier to verify completeness of query re-writers (e.g. SPARQL to SQL) if these operations have their own syntactic constructs.
  3. reject unsupported queries. A SPARQL 1.1 engine will reject a query with a LENGTH operator while it would silently fail to match a query with a stardog:list:length predicate.

One advantage to magic predicates is that such a query can pass seamlessly through a naive SPARQL pipeline processor (e.g. a tool which parses the query for bound variables, issues it verbatim, and renders the results in a nice HTML table). Unless SPARQL 1.2 were committed to being syntactically compatible with SPARQL 1.1, I don't think syntactic compatibility of list features compensates for the advantages of SPARQL list operators.
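
The contrast can be sketched as follows. The example swaps in Jena's list:length property function for the Stardog predicate, since only the Jena namespace is given in this thread's links; `ex:tagList` is hypothetical, and the operator form is shown only as a comment because it is not valid SPARQL 1.1.

```sparql
PREFIX list: <http://jena.apache.org/ARQ/list#>
PREFIX ex:   <http://example.org/ns#>   # hypothetical vocabulary

# Magic-predicate form: Jena ARQ computes ?length, while an engine
# without this extension treats the line as an ordinary triple
# pattern and silently returns no rows.
SELECT ?article ?length
WHERE {
  ?article ex:tagList ?tags .
  ?tags list:length ?length .
}

# Operator form (hypothetical syntax):
#   BIND (LENGTH(?tags) AS ?length)
# A SPARQL 1.1 parser would reject this outright instead of
# returning an empty result.
```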

namedgraph commented 2 years ago

Pat Hayes on first-class list semantics (or the lack of it):

https://lists.w3.org/Archives/Public/semantic-web/2022Sep/0001.html

VladimirAlexiev commented 1 year ago

Use case: convert SHACL prop attachments to domain/range.

Very easy to do for schema:domainIncludes, schema:rangeIncludes because these are polymorphic (multivalued):

insert {
  ?prop schema:domainIncludes ?domain; schema:rangeIncludes ?range
} where {
  {[a sh:NodeShape; sh:property/sh:path ?prop; sh:targetClass ?domain]} union
  {[a sh:PropertyShape; sh:path ?prop; sh:class|sh:datatype ?range]} 
}

Much harder to do for RDFS+OWL because one needs to construct lists, eg

:propP   rdfs:domain         [a owl:Class; owl:unionOf (:classX :classY :classZ)].

@jaw111's example https://gist.github.com/jaw111/1b149fd1111f774a3613f10955686617 shows how to do a similar thing (but produces SHACL as final result, and I think it's a bit erroneous).