w3c / sparql-query

https://w3c.github.io/sparql-query/

Implicit timezone in comparison and sorting. #116

Open afs opened 1 year ago

afs commented 1 year ago

Issue #86 updates SPARQL to reference F&O version 3.0.

One issue that results from this is the use of implicit timezones in comparisons and sorting, noted in https://github.com/w3c/sparql-query/issues/86#issuecomment-1566143387 and reproduced here:


In the RDF context, where data is on the web and can be drawn from multiple sources, there isn't a natural timezone, nor will it be the same as the request origin. Even a single data source, with data collected over time, is affected because of DST.

For SPARQL, an implicit timezone is useful for sorting because it gives a total order.

For comparisons, the implicit timezone is less useful. RDF Concepts refers to XML Schema 1.1. The indeterminate comparison order at least does not give false information. I don't see much use of xsd:dateTimeStamp.

We could choose to say that there is no implicit timezone by default for comparison (i.e. XML Schema rules) and suggest its use for ordering. Maybe also say implementations MAY (RFC 2119) provide an implicit timezone with certain consequences.

We'd need text about this, and it is a shame not to be able to just refer to F&O, but overall I think it's worth it.

If instead we choose to have a timezone, there ought to be only one: +00:00 (cf. cloud provider server clocks), so that answers are the same everywhere.

afs commented 10 months ago

The WG discussed this issue during the telecon of 2023-12-14.

https://www.w3.org/2023/12/14-rdf-star-minutes.html#x232

pchampin commented 1 month ago

Coming back to this issue, another idea came to my mind: if I read Section 3.2.7.4 Order relation on dateTime of XML Schema Part 2 correctly, two dateTimes without timezone are compared as if they were in the same timezone (they share the same timeline).

Following the reasoning above, this is not really appropriate for RDF: two independently produced dateTime values with no timezone may actually have been produced in two different locations. Wouldn't it be better to consider that timezone-less dateTimes are never equal (one might still be lower than the other if the difference is >14h)?

afs commented 1 month ago

That's a reference to XML Schema 1.0, which is the link in F&O 3.1 in some places. XML Schema 1.1 is different (see 2.2.3 Order in XML Schema 1.1) and is also referenced by F&O.

What is the status of XSD 1.0 and XSD 1.1? SPARQL 1.2 migrated to referencing 1.1 but looking at Functions and Operators 3.1 there are links to 1.0 as well.


Thoughts about the idea that timezone-less dateTimes are never equal:

SPARQL uses Functions & Operators for comparison and ordering, not XSD.

op:date-less-than is a comparison of the starting instants on the timeline. I read that as meaning it is not the XSD-defined comparison directly (1.0 effectively ignores timezone on xsd:date). It is defined as a comparison of xsd:dateTime values: "The starting instant of an xs:date is the xs:dateTime at time 00:00:00 on that date." That would be timezone sensitive.
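
To illustrate why a starting-instant comparison is timezone sensitive, here is a minimal Python sketch of that reading. The helper name `starting_instant` is hypothetical, and modelling an xs:date as a Python `date` plus a fixed UTC offset is an assumption made only for this example.

```python
from datetime import date, datetime, time, timedelta, timezone

def starting_instant(d: date, tz: timezone) -> datetime:
    """Return the instant 00:00:00 on date d in timezone tz, normalised to UTC.

    Hypothetical helper: it models the quoted sentence about the starting
    instant of an xs:date, not any specific implementation.
    """
    return datetime.combine(d, time(0, 0, 0), tzinfo=tz).astimezone(timezone.utc)

# 2024-01-02+14:00 starts at 2024-01-01T10:00:00Z
a = starting_instant(date(2024, 1, 2), timezone(timedelta(hours=14)))
# 2024-01-01-12:00 starts at 2024-01-01T12:00:00Z
b = starting_instant(date(2024, 1, 1), timezone(timedelta(hours=-12)))

# The later calendar date has the earlier starting instant.
print(a < b)  # True
```

Comparing only the calendar dates would put 2024-01-01 first; comparing starting instants puts 2024-01-02+14:00 first.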

This network of specs is complicated. My reading may be wrong.

Conclusion: There is no perfect answer. The second-best option is a defined answer.

TallTed commented 1 month ago

Somewhere between perfect and second-best would be a followup to the other standards and their respective organizations, toward better definitions and/or handling guidance that would benefit all specs that now depend on these incomplete standards.

Tpt commented 1 month ago

If I understand it properly, XPath Functions & Operators specifies a well-defined behavior:

  1. if there is no timezone use the default one
  2. compute the time on timeline
  3. execute the comparison with the time on timeline

Hence, I am not sure there is much room for improvements in XPath F&O.
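
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not an F&O implementation: the names `on_timeline` and `less_than` are hypothetical, the default implicit timezone of +00:00 is an assumption, and Python's `datetime` does not cover the full xsd:dateTime value space.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the implicit (fallback) timezone is +00:00.
IMPLICIT_TZ = timezone.utc

def on_timeline(value: datetime, implicit_tz: timezone = IMPLICIT_TZ) -> datetime:
    """Steps 1 and 2: apply the implicit timezone if needed, then normalise to UTC."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=implicit_tz)
    return value.astimezone(timezone.utc)

def less_than(a: datetime, b: datetime, implicit_tz: timezone = IMPLICIT_TZ) -> bool:
    """Step 3: compare the two positions on the timeline."""
    return on_timeline(a, implicit_tz) < on_timeline(b, implicit_tz)

no_tz = datetime.fromisoformat("2024-01-01T12:00:00")
with_tz = datetime.fromisoformat("2024-01-01T10:00:00-05:00")  # 15:00:00Z

print(less_than(no_tz, with_tz))                                 # True
print(less_than(no_tz, with_tz, timezone(timedelta(hours=-8))))  # False
```

The same pair of values compares differently under a different implicit timezone, which is why a single fixed choice matters for getting the same answers everywhere.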

If we want to stay close to XPath F&O, this leaves us with three options imho (I may be missing some others):

  1. Follow it closely and state that SPARQL implementations must use a fallback timezone (likely +00:00) and use time on timeline for comparison
  2. Depart a bit from XPath F&O and state that there is no fallback timezone and that, in a comparison between a date/time/dateTime with a timezone and one without, both the +14:00 and -14:00 timezones must be used to compute two times on the timeline for the timezone-less value, and the comparison is defined only if both yield the same result. This is the XML Schema 1.1 ordering on xsd:dateTime values
  3. An intermediate path where we state that a fallback timezone might be defined, e.g. for ORDER BY ordering, but not for comparison operations
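
Option 2 can be sketched in Python as a comparison that returns no result when the order is indeterminate. This is a rough illustration, not spec text: the names `candidate_instants` and `compare` are hypothetical, and treating two timezone-less values as sharing one timeline is an assumption based on the XML Schema reading discussed earlier in this thread.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MAX_OFFSET = timedelta(hours=14)  # the extreme offsets, +14:00 and -14:00

def candidate_instants(value: datetime) -> List[datetime]:
    """One UTC instant for a zoned value, two (at +14:00 and -14:00) otherwise."""
    if value.tzinfo is not None:
        return [value.astimezone(timezone.utc)]
    return [
        value.replace(tzinfo=timezone(offset)).astimezone(timezone.utc)
        for offset in (MAX_OFFSET, -MAX_OFFSET)
    ]

def compare(a: datetime, b: datetime) -> Optional[int]:
    """Return -1, 0 or 1, or None when the order is indeterminate."""
    if a.tzinfo is None and b.tzinfo is None:
        # Assumption: two timezone-less values are compared on one shared timeline.
        return (a > b) - (a < b)
    results = {
        (x > y) - (x < y)
        for x in candidate_instants(a)
        for y in candidate_instants(b)
    }
    return results.pop() if len(results) == 1 else None

no_tz = datetime.fromisoformat("2024-01-01T12:00:00")
far = datetime.fromisoformat("2024-01-03T00:00:00+00:00")
near = datetime.fromisoformat("2024-01-01T20:00:00+00:00")

print(compare(no_tz, far))   # -1
print(compare(no_tz, near))  # None
```

Here `no_tz` could be anywhere from 2023-12-31T22:00:00Z to 2024-01-02T02:00:00Z, so it is definitely before `far` but cannot be ordered against `near`.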

afs commented 1 month ago

> Somewhere between perfect and second-best would be a followup to the other standards and their respective organizations, toward better definitions and/or handling guidance that would benefit all specs that now depend on these incomplete standards.

The organisation is W3C.

The specs are not vague - they are (IMO) complicated but they do define something. There is history there.

pietercolpaert commented 2 weeks ago

My interpretation of this:

If there is no timezone set, the literal becomes a time period with the extreme timezones as the bounds of that period. Then, logically:

My proposal would be to let SPARQL queries select the bounds of a value that will be interpreted as a period, using special functions that return the lower or upper bound of that period. This way the SPARQL query writer has full control over how the time literals that the SPARQL engine interprets as periods are handled.
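
A minimal Python sketch of what such bound-selecting functions might compute. The names `lower_bound` and `upper_bound` are hypothetical and are not proposed SPARQL function names; the use of +14:00 and -14:00 as the extreme timezones is taken from the discussion above.

```python
from datetime import datetime, timedelta, timezone

MAX_OFFSET = timedelta(hours=14)  # extreme timezones used as the period bounds

def lower_bound(value: datetime) -> datetime:
    """Earliest possible instant: a timezone-less value read at +14:00."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone(MAX_OFFSET))
    return value.astimezone(timezone.utc)

def upper_bound(value: datetime) -> datetime:
    """Latest possible instant: a timezone-less value read at -14:00."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone(-MAX_OFFSET))
    return value.astimezone(timezone.utc)

no_tz = datetime.fromisoformat("2024-01-01T12:00:00")
print(lower_bound(no_tz).isoformat())  # 2023-12-31T22:00:00+00:00
print(upper_bound(no_tz).isoformat())  # 2024-01-02T02:00:00+00:00
```

For a value that already carries a timezone, both functions return the same instant, so the period collapses to a single point.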