time-rs / time

The most used Rust library for date and time handling.
https://time-rs.github.io
Apache License 2.0
1.13k stars 281 forks source link

Serde default iso8601 config uses six digits year which seems unexpected #548

Closed czocher closed 1 year ago

czocher commented 1 year ago

Hello, When trying to serialize an Option<OffsetDateTime> using the default iso8601 serializer provided by time-rs it serializes the year with six digits. This is due to the config provided here.

This seems rather unexpected. Is there some way to change it without rewriting the serializer?

jhpratt commented 1 year ago

The reason this is the default is because the serialization would otherwise fail for large dates.

With regard to using a different configuration, the next release will be helpful to you. You'll be able to use something like the following:

use time::format_description::well_known::{Iso8601, iso8601};
const FORMAT: _ = Iso8601<{ iso8601::Config::DEFAULT.set_year_is_six_digits(false).encode() }>;
time::serde::format_description!(my_format, OffsetDateTime, FORMAT);

That generates a my_format module that you can use in conjunction with #[serde(with)].

I haven't checked this exact example, but this general approach is what you'll likely want.

czocher commented 1 year ago

Thanks @jhpratt! I see value in documenting this behaviour. I can provide a PR, but can you give an example of a use-case which would fail if six digits year would be disabled? Does this happen only for dates in the far future or before 9999 BC?

If that's the case maybe the default should be changed to false and the option should be documented for edge-cases when it is reasonable to use six digit years? Otherwise another suggestion from my side is to ignore the leading year zeros when serializing with the default iso8601 serializer. What do you think about this idea?

jhpratt commented 1 year ago

can you give an example of a use-case which would fail if six digits year would be disabled? Does this happen only for dates in the far future or before 9999 BC?

Both. If you use Iso8601::DEFAULT, the year must be in the range 0..=9999.

If that's the case maybe the default should be changed to false and the option should be documented for edge-cases when it is reasonable to use six digit years?

Doing this would break any situation where people are already using it.

Otherwise another suggestion from my side is to ignore the leading year zeros when serializing with the default iso8601 serializer. What do you think about this idea?

After reviewing the specification, doing this would not be compliant. The number of additional digits (2) in the expanded representation is what is required to represent all values in time. It is not permissible to have a varying number of additional digits.


More generally, I'm not certain why this is an issue for you. The only things guaranteed by well-known formats are:

Anything beyond that is a faulty assumption.

czocher commented 1 year ago

@jhpratt not necessarily a problem for me - as long as there's a solution, which you already provided, and a clear reason for this behavior, which you also gave, I feel satisfied (and actually rather impressed) with how this works.

My questions were asked due to how surprising this behaviour was when I first noticed it, to understand it better and to possibly make it less surprising for future users.

Would you be willing to accept a PR to add some documentation/examples, based on our discussion in this issue, to the corresponding structs?

jhpratt commented 1 year ago

Documentation improvements are always welcome. I'm working on documenting something unrelated before I have a release, but it'll be in the one after that for sure. Including the exhaustive list of guarantees in the well_known module is probably a good idea at the least.

czocher commented 1 year ago

@jhpratt I'll close the issue for now and see if I can provide the PR after the new release arrives ;).

kyegupov commented 1 year ago

@jhpratt I also have run into this problem.

You say "I'm not certain why this is an issue for you." - I can explain.

The ISO 8601 standard says:\ [removed due to copyright]

The important bit is "mutual agreement". The six-digit representation, as you generate it, is NOT automatically a "compliant" ISO 8601 date, as you claim.

I can easily imagine there are many ISO 8601 parsers out there that would fail to parse six-digits-year dates that time-rs generates. By focusing on a very improbable risk (problem representing dates beyond year 9999), you are actually neglecting a far more probable risk (broken interoperability with other systems).

At the very least, this behavior of serde::iso8601 module should be documented in big bold letters, to warn the users that the uncommon 6-digit representation would be used.

But I agree with the initial argument by @Czocher that emitting 4-digit years should be the default behavior. When you say "Doing this would break any situation where people are already using it." - I don't understand how. If you can parse 6-digit years but emit 4-digit years by default, what could your parser possibly break?

jhpratt commented 1 year ago

@kyegupov I'm not certain what copy of ISO 8601 you have, but it is not ISO 8601-1:2019 (which is what is implemented), as that is not the text present (there's not even a section 3.5). Regardless, do not copy-paste parts of the specification; it is copyrighted material. I have edited out the text for that reason. It is possible to convey information without copying copyrighted material.

Mutual agreement is also required for any date before 1582-10-15, as that's when the Gregorian calendar itself was introduced. Surely you don't want that to be an error? Mutual agreement is also required for the number of decimal digits after a second, with the caveat that it can't be zero (if the . is present, there has to be a digit after it). time emits as many as it can: nine. Would you prefer I only emit the first digit so as to remain compliant to a tee, or is a slight bit of leeway (agreement being implied) acceptable? I could also error out upon parsing a second decimal digit, but it's best to accept an indefinite number and discard the ones that are insignificant.

Sometimes specifications are not entirely clear, and this is one such case. What constitutes "agreement" with regard to software? It is impossible to know what is on the other end, so I must rely on what I can do. That is a lossless serialization for all values that can be present in time. Taking one look at the output, as any programmer needing such strict requirements would do, would clearly show that the year is six digits. As I stated above, I am open to a pull request that states the default values for the time::serde::iso8601 module.

If you can parse 6-digit years but emit 4-digit years by default, what could your parser possibly break?

Not my parser, but someone else's. If they are, for whatever reason, relying on this behavior, they could have a successful deserialization turn into an error.

czocher commented 1 year ago

@kyegupov from what I noticed @jhpratt already did an amazing job documenting some of the things we noticed, see: https://github.com/time-rs/book/commit/bcc50debc79eba11cd4dd6dedd96acc9b0070bc5.

What I believe the consensus here is that the documentation is lacking. @jhpratt if you believe the book can be improved in one way or another please create issues in the time-rs book repository. I'd like to contribute but frankly I'm not really sure how can I help or even when I select a topic, like the documentation or the book, I'm not sure what to write about :D.

jhpratt commented 1 year ago

Yeah, I documented some last night. For this specifically, having documentation in this repository is a good idea as well. I can add a note about the implied agreement as well.

As to the time-rs/book repository, the existing pages are decent, but the rest of the pages are blank (and unrelated to this issue).

ehiggs commented 1 year ago

@kyegupov I'm not certain what copy of ISO 8601 you have, but it is not ISO 8601-1:2019 (which is what is implemented), as that is not the text present (there's not even a section 3.5). Regardless, do not copy-paste parts of the specification; it is copyrighted material. I have edited out the text for that reason. It is possible to convey information without copying copyrighted material.

It's from ISO 8601:2004. Wikipedia cites ISO 8601:2004 for anyone reading along.

Section 3.5 was moved to section 4.4 in the 2019 standard.