time-rs / time

The most used Rust library for date and time handling.
https://time-rs.github.io
Apache License 2.0
1.06k stars 261 forks source link

`well_known::Iso8601` does not parse certain legal ISO 8601 values #595

Closed coriolinus closed 1 year ago

coriolinus commented 1 year ago

All of these values are legal ISO 8601:

However, of these formats, well_known::Iso8601 can only parse three:

We can improve on this situation by defining our own custom format:

let postgres_format = format_description!(
    version=2, 
    "[year]-[month]-[day][first [ ][T]][hour]:[minute]:[second].[subsecond][first [Z][[offset_hour sign:mandatory][first [:[offset_minute]][[offset_minute]][]]]]"
);

This can parse a broader set of formats, but misses out on the Z annotation for UTC; those items fail with Err(TryFromParsed(InsufficientInformation)).

While it makes sense that we can't parse an OffsetDateTime from those formats missing a TZ offset entirely, well_known::Iso8601 should be able to parse the rest of them.

playground

coriolinus commented 1 year ago

One possible workaround: if the language accepted by format_description! accepted a non-consuming token like [assign [offset_hour = 0][offset_minute = 0]], then we could handle the Z case in the custom format.

It's still preferable if the well-known formatter handles everything properly, but having the ability to write our own format which accepts a superset of what the well-known formatter does would be an improvement over the status quo.

Also, it would open the door to toy formatters like

let sundial = format_description!("[first [I[assign [hour=1]][II[assign [hour=2]][III[assign [hour=3]][IV[assign [hour=4]][V[assign [hour=5]][VI[assign [hour=6]][VII[assign [hour=7]][VIII[assign [hour=8]][IX[assign [hour=9]][X[assign [hour=10]][XI[assign [hour=11]][XII[assign [hour=12]]];
jhpratt commented 1 year ago

Of the values that you provided, all of them with a space instead of "T" are invalid. ISO 8601 requires a T without exception. Looking at the remaining ones:

non-consuming token like [assign [offset_hour = 0][offset_minute = 0]]

This would fundamentally alter the division between parsing the format description and parsing the value.

coriolinus commented 1 year ago

ISO 8601 requires a T without exception.

The ISO website seems to disagree:

For example, September 27, 2022 at 6 p.m. is represented as 2022-09-27 18:00:00.000

This would fundamentally alter the division between parsing the format description and parsing the value

Sure, that was just one possibility for a workaround, not a preferred solution.

jhpratt commented 1 year ago

The ISO website seems to disagree:

Ironically, the website is objectively incorrect. I'll quote the smallest possible part of the specification.

The date is followed (without space) by [“T”] followed (without space) by the time, optionally including the time shift designator.

After looking quite a bit into the document, I had more than sufficient reason to believe that my implementation is correct, as the format definitions match what I implemented. However, the examples appear to indicate that the offset minute may be omitted in any situation. I'll make a modification to this behavior.

coriolinus commented 1 year ago

Thank you for improving the omitted offset minutes situation!

Even if you are not willing to officially support parsing with a space instead of a T, would it be possible to produce a more specific error variant than UnexpectedTrailingCharacters? From my perspective, handling Err(Parse::SpaceInsteadOfT(timestamp)) is almost as good as handling Ok(timestamp).

The specific use case I'm interested in is handling output from Postgres, which is defined as "ISO 8601, but with a space instead of a T". docs

jhpratt commented 1 year ago

The parser works in two phases. First, it parses the value into the Parsed struct (the same one that is public). It then converts the struct to the desired type. Both phases are fallible.

This specific error occurs in the second phase. This is because it successfully parses the date into Parsed, but stops when it encounters the space (as it's not valid in that position). Given that only a date is valid ISO 8601, there is no reason to reject this in the first phase. For this reason, such an error could not be implemented with the current structure and process.

Finally, I'll note that the docs for postgres also aren't accurate.

he offset will be shown as hh (hours only) if it is an integral number of hours, else as hh:mm if it is an integral number of minutes, else as hh:mm:ss.

ISO 8601 contains no mention of offset seconds, only hours and minutes.


As I've pushed a commit containing the changes mentioned in my previous comment, I'm closing this as completed.