toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License

Is -00:00 a valid offset for offset date-time? #916

Closed LongTengDao closed 2 years ago

LongTengDao commented 2 years ago

> 4.3. Unknown Local Offset Convention
>
> If the time in UTC is known, but the offset to local time is unknown, this can be represented with an offset of "-00:00". This differs semantically from an offset of "Z" or "+00:00", which imply that UTC is the preferred reference point for the specified time. RFC2822 [IMAIL-UPDATE] describes a similar convention for email.

According to RFC 3339, -00:00 means there is no information about the local offset (unlike Z or +00:00, which mean the offset is zero), so it seems closer to what a local date-time is in TOML.

Should 1970-01-01 00:00:00-00:00 be a valid offset date-time while parsing?
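For illustration, here is a minimal Python sketch of what "valid while parsing" could look like. It is not taken from any real TOML parser: the pattern is deliberately simplified (whole seconds only) and `parse_offset_datetime` is a hypothetical helper name. Under those assumptions, all three spellings of a zero offset parse to the same instant.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified pattern: whole seconds only, no fractional part.
_OFFSET_DT = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[Tt ](\d{2}):(\d{2}):(\d{2})"
    r"(?:([Zz])|([+-])(\d{2}):(\d{2}))"
)

def parse_offset_datetime(text: str) -> datetime:
    """Parse a simplified offset date-time into an aware datetime."""
    m = _OFFSET_DT.fullmatch(text)
    if m is None:
        raise ValueError(f"not an offset date-time: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    if m.group(7):  # "Z" or "z"
        tz = timezone.utc
    else:
        sign = -1 if m.group(8) == "-" else 1
        delta = timedelta(hours=int(m.group(9)), minutes=int(m.group(10)))
        tz = timezone(sign * delta)  # "-00:00" simply yields a zero offset
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)

# The three spellings of a zero offset for the same wall-clock digits.
epoch_variants = [
    parse_offset_datetime("1970-01-01 00:00:00-00:00"),
    parse_offset_datetime("1970-01-01 00:00:00+00:00"),
    parse_offset_datetime("1970-01-01 00:00:00Z"),
]
```

In this sketch the "unknown local offset" meaning of -00:00 is discarded at parse time; only the instant survives.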

arp242 commented 2 years ago

Offset date-time is RFC 3339; why shouldn't it be valid in TOML? That there's also another way to express this doesn't really matter IMO, and just seems confusing: "Offset date-time is any RFC 3339 datetime, except when [...], in which case it's not accepted by TOML".

More pragmatically, people are probably already using this format, so we can't really change it without breaking compatibility.

LongTengDao commented 2 years ago

What do you mean by "IMO"?

arp242 commented 2 years ago

In my opinion

LongTengDao commented 2 years ago

IMO, this should be valid; I agree with you.

The reason is that "the time in UTC is known", which matches the aim of Offset Date-Time ("to unambiguously represent a specific instant"), even though the lack of an actual offset seems to conflict with the name.

I need to confirm this carefully before implementing it.

eksortso commented 2 years ago

The offset -00:00 is a valid offset and so an offset date-time value that uses it must be valid. Plain and simple.

Would it still be considered an offset date-time, though? Or should we consider such a value to be a local date-time, and make mention of it in that type's section of toml.md? I would say that the former is correct. It's got an offset, so it's still an offset date-time.

When parsed, however, the resulting date-time's value may be considered local, and re-encoding the value may cause the offset to vanish or switch to +00:00. That is, unless the special offset is preserved by the parser, in which case the encoder can keep the offset in place when writing it out. Or unless we require that the parsed value must retain the special offset after parsing.
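To make the "preserved by the parser" option concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `OffsetDateTime` wrapper, its `unknown_local_offset` flag, and the `decode`/`encode` helpers are hypothetical and not part of any TOML library. It relies on the standard `datetime.fromisoformat`, and only handles numeric offsets.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class OffsetDateTime:
    """Hypothetical wrapper: an aware datetime plus the RFC 3339 '-00:00' marker."""
    value: datetime
    unknown_local_offset: bool = False

def decode(text: str) -> OffsetDateTime:
    # Simplified: remember a literal "-00:00" suffix, then parse it as "+00:00".
    unknown = text.endswith("-00:00")
    normalized = text[:-6] + "+00:00" if unknown else text
    return OffsetDateTime(datetime.fromisoformat(normalized), unknown)

def encode(odt: OffsetDateTime) -> str:
    # A plain datetime with a zero offset always writes "+00:00" ...
    text = odt.value.isoformat(sep=" ")
    # ... so the special offset is restored only when the flag was kept.
    if odt.unknown_local_offset and text.endswith("+00:00"):
        text = text[:-6] + "-00:00"
    return text

original = "1970-01-01 00:00:00-00:00"
round_tripped = encode(decode(original))
```

Without the extra flag, encoding `decode(original).value` directly writes +00:00, which is the "switch" described above.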

Which approach is correct?

Either way, we ought to mention in toml.md what could or should happen to offset date-times with offset -00:00 when parsed.

Then we need to make sure to write tests to enforce the proper behavior.

EDIT: Corrected for a misinterpretation.

marzer commented 2 years ago

> Would it still be considered an offset date-time, though?

The presence of an offset should unequivocally mark it as an offset date-time, regardless of what the offset itself is. I can't think of a good reason for an exception here. The exception the RFC makes here seems very silly.

~I suppose they considered that serializers might want a way to always emit an 'offset' (e.g. for formatting purposes) but also have that offset be ineffectual, maybe? Smells like bikeshedding at work.~

Just saw the paragraph from the RFC and it makes a bit more sense now, heh. I still think it's bikeshedding. How common is the situation that a time is known to be UTC but the offset is unknown?

eksortso commented 2 years ago

> Just saw the paragraph from the RFC and it makes a bit more sense now, heh. I still think it's bikeshedding. How common is the situation that a time is known to be UTC but the offset is unknown?

Definitely bikeshedding, yeah, and frankly unnecessary for configuration purposes. Using an offset of -00:00, though valid in RFC 3339, is confusing in practice. If a user is unsure of the local offset, then they should tell their complicated narrative somewhere else and just leave it out of the timestamp.

However, RFC 3339 explicitly says that the UTC timestamp is known when -00:00 is applied. Those time digits are UTC. So I would use Z or +00:00 in full confidence, and I would treat -00:00 the exact same way.

Note: I misinterpreted the convention at first, thinking the local time (not UTC) was known when the convention is applied. I fixed my comment, and not much changed.

LongTengDao commented 2 years ago

I can only imagine one case for -00:00: the writer gives the time in UTC but does not want others to think (s)he lives in the +00:00 time zone.

I think the only remaining problem is the data type's name in the TOML spec: "Offset Date-Time" actually means "Absolute Date-Time".

arp242 commented 2 years ago

> I can only imagine one case for -00:00: the writer gives the time in UTC but does not want others to think (s)he lives in the +00:00 time zone.

Compatibility with other systems would be the main use case I can think of.

LongTengDao commented 2 years ago

> > I can only imagine one case for -00:00: the writer gives the time in UTC but does not want others to think (s)he lives in the +00:00 time zone.
>
> Compatibility with other systems would be the main use case I can think of.

I can't imagine one... Could you give a "story"? Thank you!

arp242 commented 2 years ago

> > > I can only imagine one case for -00:00: the writer gives the time in UTC but does not want others to think (s)he lives in the +00:00 time zone.
> >
> > Compatibility with other systems would be the main use case I can think of.
>
> I can't imagine one... Could you give a "story"? Thank you!

I don't have any concrete use cases, but RFC 3339 is widely used and this notation is probably used in some systems and can thus end up in TOML files.

Disallowing it would very likely break things for some people, even if it's a small group. I don't really see any practical problem in having this notation, and don't really see any problem that gets solved by forbidding it.

eksortso commented 2 years ago

> Disallowing it would very likely break things for some people, even if it's a small group. I don't really see any practical problem in having this notation, and don't really see any problem that gets solved by forbidding it.

I don't think disallowing -00:00 was ever on the table. It's already valid, no question about that. The question that needs an answer is how parsers ought to interpret that offset. The use cases that we've seen basically boil down to just converting the timestamp to UTC.

By doing this, we never encounter a non-zero offset, and interoperability is preserved as all timestamps are expressed in UTC. I know of only one argument for converting to a local datetime: this recent answer on Stack Overflow to a Java question about the -00:00 convention. Java doesn't support it. But if the time in UTC is already available, it's better to just treat -00:00 like it's +00:00 and to keep the value as an offset date-time type.
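As a small Python sketch of that approach (the `to_utc` helper is a hypothetical name, and it assumes the input carries a numeric offset that the standard `datetime.fromisoformat` accepts), every timestamp is converted to UTC on the way in, so -00:00 and +00:00 become indistinguishable:

```python
from datetime import datetime, timedelta, timezone

def to_utc(text: str) -> datetime:
    """Parse a timestamp with a numeric offset and convert it to UTC."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        raise ValueError("expected an offset date-time, got a local one")
    return parsed.astimezone(timezone.utc)

# Three spellings of the same instant, normalized to UTC.
a = to_utc("2022-09-01T12:00:00-00:00")
b = to_utc("2022-09-01T12:00:00+00:00")
c = to_utc("2022-09-01T14:00:00+02:00")
```

After this normalization the value stays an aware (offset) datetime, never a local one, and the special meaning of -00:00 is intentionally dropped.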

We have yet to see a use case where the meaning of the special convention must be preserved in the value. So let's not require that.

ChristianSi commented 2 years ago

Interestingly, the posters in the Stack Overflow thread misunderstood the meaning of the negative zero offset, interpreting such datetimes as local, which they really aren't.

eksortso commented 2 years ago

@ChristianSi Yeah, but I think that, at least for that one answer, they switched to LocalDateTime for a semantically valid reason: they didn't want to bother with offsets at all. They didn't notice that the time was UTC, just with a bizarre offset propped up by RFC 3339 alone. I made the same mistake starting out. (I just now added a comment to that answer.)

@LongTengDao In practice, such timestamps in TOML should always be offset datetimes. So I'd say you can treat -00:00 as if it is Z in your parser, if you never need the semantic distinction that -00:00 is supposed to make. Practically everyone already does that.

You could make a parser that was aware of that difference, but if you did spend the time to do that, then it would satisfy an extremely niche case.