Open kasei opened 5 years ago
I call attention to the "values bound to ?string", which are the sort key for the order operation in the example and which demonstrate that, "in the general case, [] given equal values, changes to the ORDER BY operator will not resolve the problem at issue."
@lisp - Again, YES, variables in the SELECT which are not in the ORDER BY will not be ordered and will not affect the order of the solution set. This is known, and clear, and I believe this to be a different concern than this issue.
FWIW, here's how GraphDB (and I presume rdf4j) do ordering.
Here are some details about how scalars are ordered. These details come from the GraphDB SPARQL query processor, and you can check them with a query like this (also try changing the direction to DESC()):
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix my: <http://example.org/>
base <http://example.org/>
select * {
values ?x {
"0001-01-01T00:00:00"^^xsd:dateTime "0001-01-01"^^xsd:date
"z" "1" "2"
"z"@en "z"@en-GB "z"@fr "1"@en "1"@en-GB "1"@fr
undef
002 2 002.000 2.0 "002.000"^^xsd:float "2.0"^^xsd:float
001 1 001.000 1.0 "001.000"^^xsd:float "1.0"^^xsd:float
"1"^^my:foo "1"^^my:bar "2"^^my:baz
<foo> <http://example.org/bar> my:baz <mailto:foo@example.org> <geo:42.68,23.21> <urn:uuid:1234> <urn:isbn:4567>
}
} order by ASC(?x)
Values are grouped by kind. The ordering of these groups is as follows in the ASC (ascending) direction:
Null (undef): when the ordering field is null for some objects
IRIs, ordered alphabetically (prefixed and relative IRIs are expanded), eg:
<geo:42.68,23.21> http://example.org/bar http://example.org/baz
http://example.org/foo mailto:foo@example.org
Numeric values, ordered numerically. (Note: the shortcut literals 002 and 002.000 mean "002"^^xsd:integer and "002.000"^^xsd:decimal respectively):
"001"^^xsd:integer "1"^^xsd:integer "001.000"^^xsd:decimal "1.0"^^xsd:decimal "001.000"^^xsd:float "1.0"^^xsd:float
"002"^^xsd:integer "2"^^xsd:integer "002.000"^^xsd:decimal "2.0"^^xsd:decimal "002.000"^^xsd:float "2.0"^^xsd:float
Dates, ordered chronologically:
"000001-01-01"^^xsd:date "0001-01-01"^^xsd:date "0001-01-02"^^xsd:date
Datetimes, ordered chronologically (please note these are not comparable to dates):
"0001-01-01T00:00:00"^^xsd:dateTime "000001-01-01T00:00:00"^^xsd:dateTime
Datatyped literals other than numbers, dates and datetimes, ordered first by datatype then by value:
"1"^^my:bar "2"^^my:baz "1"^^my:foo
langStrings, ordered first by language then by value:
"1"@en "z"@en "1"@en-GB "z"@en-GB "1"@fr "z"@fr
Plain strings:
"1"^^xsd:string "2"^^xsd:string "z"^^xsd:string
The order is stable, i.e. equal values of the same kind are emitted in the same order as encountered.
If you use DESC (descending) the order of the groups and of distinct values is reversed, so e.g. nulls come last. But stability is preserved, so equal values still appear in the order encountered, which means that ASC and DESC are not complete inverses of each other.
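The grouping described above can be modelled as a composite sort key. The following is a minimal Python sketch, not GraphDB's actual implementation: the `Term` tuple, the kind names, and the helper functions are all hypothetical, and the date handling assumes plain `YYYY-MM-DD[Thh:mm:ss]` lexical forms without time zones or negative years.

```python
from typing import NamedTuple


class Term(NamedTuple):
    """Hypothetical, simplified RDF term used only for this illustration."""
    kind: str          # one of KINDS below
    lexical: str = ""  # lexical form, or the IRI string
    tag: str = ""      # datatype IRI for "typed", language tag for "lang"


# Group order in the ASC direction, as described in the comment above.
KINDS = ["undef", "iri", "numeric", "date", "datetime", "typed", "lang", "string"]
RANK = {kind: i for i, kind in enumerate(KINDS)}


def chrono(lexical):
    """Chronological key for simple date/dateTime lexical forms (assumption)."""
    date, _, time = lexical.partition("T")
    year, month, day = (int(part) for part in date.split("-"))
    return (year, month, day, time)


def sort_key(term):
    rank = RANK[term.kind]
    if term.kind == "numeric":
        # Numerics compare by value, so "002" and "2.0" are equal.
        return (rank, float(term.lexical))
    if term.kind in ("date", "datetime"):
        return (rank, chrono(term.lexical))
    # Typed literals: datatype then value; langStrings: language then value.
    return (rank, term.tag, term.lexical)


def order_by(terms, descending=False):
    # sorted() is stable, and stays stable with reverse=True.
    return sorted(terms, key=sort_key, reverse=descending)


terms = [
    Term("string", "z"),
    Term("numeric", "002", "xsd:integer"),
    Term("lang", "z", "en"),
    Term("numeric", "2.0", "xsd:decimal"),
    Term("iri", "http://example.org/foo"),
    Term("undef"),
    Term("lang", "1", "fr"),
    Term("typed", "1", "http://example.org/foo"),
]
asc = order_by(terms)
desc = order_by(terms, descending=True)
```

Because Python's `sorted` keeps equal elements in input order even with `reverse=True`, the two equal numerics appear as `"002"` then `"2.0"` in both `asc` and `desc`, so `desc` is not simply `asc` reversed.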
Why?
There are several cases where the current spec does not provide a total ordering over RDF terms, and therefore causes challenges for accessing data predictably (e.g. when paging results with LIMIT+OFFSET). SPARQL 1.1 §15.1 says, in part:
The second point here is especially interesting, as it means that it is difficult to portably work with any RDF data that heavily uses language-tagged literals.
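To make the paging concern concrete, here is a small Python simulation. It is only a sketch under stated assumptions: `run_query` is a hypothetical stand-in for a SPARQL engine, the `scan_reversed` flag models an engine that happens to read its data in a different order on a later request, and `undefined_key` models terms whose relative order the spec leaves undefined by treating them all as ties.

```python
rows = ['"chat"@en', '"chat"@fr', '"Katze"@de', '"gato"@es']


def undefined_key(term):
    # Models an undefined order: every term compares equal to every other.
    return 0


def consistent_key(term):
    # Any total order works; plain string comparison is an arbitrary choice.
    return term


def run_query(data, key, limit, offset, scan_reversed=False):
    """Hypothetical engine: ORDER BY key, then OFFSET/LIMIT.

    Ties come out in whatever order the data was scanned in.
    """
    source = list(reversed(data)) if scan_reversed else list(data)
    return sorted(source, key=key)[offset:offset + limit]


# Two page requests; the second one happens to scan the data differently.
page1 = run_query(rows, undefined_key, limit=2, offset=0)
page2 = run_query(rows, undefined_key, limit=2, offset=2, scan_reversed=True)

# The same two requests against an engine with a consistent total order.
stable1 = run_query(rows, consistent_key, limit=2, offset=0)
stable2 = run_query(rows, consistent_key, limit=2, offset=2, scan_reversed=True)
```

With the undefined order, the two pages together return the `"chat"` literals twice and never return `"Katze"@de` or `"gato"@es`. With a consistent order, the pages partition the data regardless of how the engine scanned it.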
Previous work
Many implementations already seem to produce a consistent ordering over data for which SPARQL ordering is undefined.
Proposed solution
I believe that the SPARQL spec should add text stating that ORDER BY over values with a (currently) undefined order SHOULD cause results to have consistent ordering, even if that order is not explicitly defined by SPARQL. This will allow clients to use LIMIT/OFFSET paging over such data. This might also be paired with a Service Description Feature indicating support for such consistent sorting.
Possible (partial) alternatives include:
fn:compare on which SPARQL ordering depends, and they are identified by URIs)
ORDER BY CONSISTENT ?name) without requiring any particular ordering
Considerations for backward compatibility
This is a suggestion to include SHOULD normative language about ordering data in cases where currently no ordering is defined. This should not have any effect on backwards compatibility.