kstenerud / concise-encoding

The secure data format for a modern world
https://concise-encoding.org

Feature request: time zone by offset #13

Open kengruven opened 3 years ago

kengruven commented 3 years ago

CE timestamps can have a time zone, which is currently defined as either "Area/Location" (political) or "Global Coordinates" (lat/lon). There are also the special values "Zero" (UTC) and "Local" ("to be interpreted as if in the time zone of the observer").

There doesn't seem to be any way to declare a time zone by offset from UTC, alone, which is how nearly all the timestamps that I encounter are given.

For example, W3C's note on Date and Time Formats gives the sample timestamp "1994-11-05T08:15:30-05:00". I don't think there's any way to encode this in CE.

All other serialization formats in the CE format comparison table which support time zones (ASN.1, CBOR) support offsets.
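To make the gap concrete, here is a minimal illustration using Python's standard `datetime` module (chosen only for illustration; it is not part of CE). Parsing the W3C sample keeps the −05:00 offset, but the result is a fixed-offset zone with no region or DST rules attached. That is exactly the information an offset-only timestamp carries.

```python
from datetime import datetime, timedelta

# Requires Python 3.7+ for fromisoformat with a "-05:00" offset.
ts = datetime.fromisoformat("1994-11-05T08:15:30-05:00")

# The offset survives parsing, but nothing identifies a region:
# the tzinfo is an unnamed, fixed-offset datetime.timezone.
print(ts.utcoffset() == timedelta(hours=-5))  # True
print(ts.tzname())                            # UTC-05:00
```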

kengruven commented 3 years ago

On a related note, the "Global Coordinates" time zone system seems useless:

What moment in time is "2021-11-7/01:30/40.71/-74.01"? It could be NYC, 30 minutes before the DST switch, or an hour later, 30 minutes after it.

kstenerud commented 3 years ago

Offsets are supported using the area Etc, which is a special area in the IANA time zone database. So for example 1994-11-05/08:15:30/Etc/GMT+5 will give you the same result as 1994-11-05T08:15:30-05:00 (the signs in Etc/GMT names are inverted relative to ISO 8601, so Etc/GMT-5 means UTC+05:00).
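Because the inverted sign is easy to get wrong, a small mapping helps. The sketch below assumes only the IANA convention: Etc/GMT±N names exist for whole hours from Etc/GMT+12 (UTC−12) to Etc/GMT-14 (UTC+14). `etc_zone_for_offset` is a hypothetical helper, not part of any CE codec.

```python
from datetime import timedelta
from typing import Optional

def etc_zone_for_offset(offset: timedelta) -> Optional[str]:
    """Map a UTC offset to an IANA Etc/GMT zone name, or None if none exists.

    Etc names use POSIX-style signs, inverted relative to ISO 8601:
    UTC-05:00 is "Etc/GMT+5" and UTC+05:00 is "Etc/GMT-5".
    """
    seconds = int(offset.total_seconds())
    if seconds % 3600 != 0:
        return None  # e.g. +05:30: Etc/GMT zones only exist for whole hours
    hours = seconds // 3600
    if hours == 0:
        return "Etc/GMT"
    if not -12 <= hours <= 14:
        return None  # the Etc/GMT names run from Etc/GMT+12 to Etc/GMT-14
    return f"Etc/GMT{-hours:+d}"  # flip the sign for the POSIX convention

print(etc_zone_for_offset(timedelta(hours=-5)))  # Etc/GMT+5
```

On a system with time zone data installed, `zoneinfo.ZoneInfo("Etc/GMT+5")` reports an offset of −5 hours, which is a quick way to confirm the sign convention.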

> I've never been in a situation where my data had time zone by coordinates

True, it's not common for small companies, but large companies like Google do use coordinate-based time zones, because there are areas on the planet that don't have official time zones at various points in time, or that need to have official time zones retroactively applied.

> What moment in time is "2021-11-7/01:30/40.71/-74.01"? It could be NYC, 30 minutes before the DST switch, or an hour later, 30 minutes after it.

The same could be said of 2021-11-7/01:30/America/New_York. When the clocks are set back at the end of DST, the same hour occurs twice. These are issues for the operating environment to sort out; CE only records the information. Time is a hard problem.
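The ambiguity can be shown with plain arithmetic. This sketch hard-codes New York's two offsets around the 2021-11-07 fall-back switch (EDT, UTC−04:00, before 02:00 local; EST, UTC−05:00, afterwards) instead of consulting a time zone database. `both_readings` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumed offsets for New York around the 2021-11-07 fall-back switch.
EDT = timezone(timedelta(hours=-4), "EDT")  # before the clocks go back
EST = timezone(timedelta(hours=-5), "EST")  # after the clocks go back

def both_readings(wall: datetime) -> Tuple[datetime, datetime]:
    """Return the two UTC instants an ambiguous New York wall time can denote."""
    first = wall.replace(tzinfo=EDT).astimezone(timezone.utc)
    second = wall.replace(tzinfo=EST).astimezone(timezone.utc)
    return first, second

first, second = both_readings(datetime(2021, 11, 7, 1, 30))
print(first.isoformat())   # 2021-11-07T05:30:00+00:00
print(second.isoformat())  # 2021-11-07T06:30:00+00:00
print(second - first)      # 1:00:00
```

With `zoneinfo.ZoneInfo("America/New_York")`, Python expresses the same choice through the `fold` attribute (`fold=0` for the first occurrence, `fold=1` for the second). Either way the disambiguation happens in the runtime, not in the recorded timestamp.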

kengruven commented 3 years ago

Hmm, OK. I'd suggest including an example of an offset in the documentation, because that's a very common case. I spent a while digging through the "Structural Spec" and "Compact Time" and I had no idea it was possible. (I am, clearly, not a time zone expert.)

Also, this is still only a spec so far, not an implementation, right? This doesn't seem to work yet:

$ echo 'c1 1994-11-05/08:15:30/Etc/GMT' | ./enctool validate
$ echo 'c1 1994-11-05/08:15:30/Etc/GMT-5' | ./enctool validate
offset 31 (line 1, col 32): unexpected [5] while decoding date

And to clarify, "Etc/GMT-5" is just a hardcoded name string in the TZ database, not a syntax for offsets, correct? So if someone from Mumbai sent me the ISO-8601 timestamp "1994-11-05T08:15:30+05:30", there's no "Etc/GMT+5:30" name that could be used in CE for that.

I don't have any specific use case today, so this isn't a big deal to me yet. I'm just not sure how I'm going to implement it. It seems like I'll have to end up doing something like:

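// Whole-hour offsets map onto an IANA Etc/GMT±N zone; anything else
// (e.g. +05:30) has no Etc name, so fall back to UTC.
if nativeTimestamp.timezone.secondsFromGMT() % 3600 == 0 {
    conciseDoc.append(nativeTimestamp)
} else {
    conciseDoc.append(nativeTimestamp.convertTo(UTC))
}

kstenerud commented 3 years ago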

Hmm, it looks like the parser is incorrectly tossing out the - character. All it does is collect the time zone string and pass it to your operating system's or runtime library's time zone database to get a time zone object, so that should work unless the CE reference codec has a bug. I'll have to look at the token termination code.
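As an illustration of "collect the time zone string", here is a sketch of a scanner that keeps `-` and `+` inside the token. It is not the reference codec's code. The allowed character set is an assumption based on real IANA names such as `America/Port-au-Prince`, `Etc/GMT-5` and `Etc/GMT+10`, and `scan_time_zone` is a hypothetical name.

```python
import string
from typing import Tuple

# Assumption: characters that can appear in an IANA area/location name.
TZ_NAME_CHARS = frozenset(string.ascii_letters + string.digits + "/_-+")

def scan_time_zone(text: str, start: int) -> Tuple[str, int]:
    """Collect a time zone name beginning at `start`.

    Returns (name, index_after_name). Scanning stops at the first character
    that cannot appear in a zone name, so "-" and "+" stay part of the token.
    """
    end = start
    while end < len(text) and text[end] in TZ_NAME_CHARS:
        end += 1
    return text[start:end], end

name, end = scan_time_zone("1994-11-05/08:15:30/Etc/GMT-5 ", 20)
print(name, end)  # Etc/GMT-5 29
```

The collected name would then be handed unchanged to the platform's time zone lookup, so a bug in where the token ends (rather than in the lookup) would explain the `unexpected [5]` error.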

To your second point, it wouldn't work with +5:30, since the Etc area of the TZ database only has whole-hour offsets. The offset format is the great failing of ISO-8601, since offsets were originally supposed to be an informational piece incidental to the actual time zone. They were sent as offset-only in the early days to support imprecise "time-zoney" values in space-constrained systems (which is why the IANA TZ db only offers minimal support). I was hoping to somehow influence the end of the practice, but I guess it's here to stay :/

I'll modify the spec on the area/location field to support a special form that starts with + or - rather than /, such that 1994-11-05/08:15:30-05:00 == 1994-11-05/08:15:30/Etc/GMT+5 == 1994-11-05/13:15:30/Z. Compact time would store this information in its area/location field as +xx:yy or -xx:yy, which is easy to disambiguate from real IANA time zones since those always start with a letter.
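A sketch of how a decoder might handle the proposed field, assuming the offset form is a sign, two-digit hours, a colon and two-digit minutes (the exact digit count is an assumption). The first character decides between an offset and an IANA name. `parse_zone_field` and `format_offset` are hypothetical helpers, not CE APIs.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Union

# Assumed shape of the proposed offset form, e.g. "-05:00" or "+05:30".
OFFSET_RE = re.compile(r"([+-])(\d{2}):(\d{2})")

def parse_zone_field(field: str) -> Union[timedelta, str]:
    """Return a timedelta for an offset field, or the IANA name unchanged."""
    m = OFFSET_RE.fullmatch(field)
    if m:
        sign = -1 if m.group(1) == "-" else 1
        return sign * timedelta(hours=int(m.group(2)), minutes=int(m.group(3)))
    if field[:1].isalpha():
        return field  # IANA names always start with a letter
    raise ValueError(f"unrecognized area/location field: {field!r}")

def format_offset(offset: timedelta) -> str:
    """Render a UTC offset as "+hh:mm" or "-hh:mm"."""
    seconds = int(offset.total_seconds())
    sign = "-" if seconds < 0 else "+"
    hours, minutes = divmod(abs(seconds) // 60, 60)
    return f"{sign}{hours:02d}:{minutes:02d}"

# Check the equivalence above: 08:15:30 at -05:00 is 13:15:30 UTC.
offset = parse_zone_field("-05:00")
wall = datetime(1994, 11, 5, 8, 15, 30, tzinfo=timezone(offset))
print(wall.astimezone(timezone.utc).time())           # 13:15:30
print(format_offset(timedelta(hours=5, minutes=30)))  # +05:30
print(parse_zone_field("America/New_York"))           # America/New_York
```

This form also covers the Mumbai case (+05:30), which has no Etc/GMT equivalent.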

kengruven commented 3 years ago

Sorry for the mess. I didn't mean to turn CE timestamps upside down for you. I just wanted to get something simple working quickly, and all the timestamps I've got here happen to have offsets.

I've got much bigger monkey wrenches to throw into CE later. :-)

kstenerud commented 3 years ago

No, this is a good thing! I've been starved for critiques of the format, and it's impossible for one person to get it 100% correct in a vacuum. Hell, not even 90%. Keep it coming, and thanks!!!