Closed coriolinus closed 1 year ago
One possible workaround: if the language accepted by format_description! accepted a non-consuming token like [assign [offset_hour = 0][offset_minute = 0]], then we could handle the Z case in the custom format.
It's still preferable if the well-known formatter handles everything properly, but having the ability to write our own format which accepts a superset of what the well-known formatter does would be an improvement over the status quo.
Also, it would open the door to toy formatters like:
let sundial = format_description!("[first [I[assign [hour=1]]][II[assign [hour=2]]][III[assign [hour=3]]][IV[assign [hour=4]]][V[assign [hour=5]]][VI[assign [hour=6]]][VII[assign [hour=7]]][VIII[assign [hour=8]]][IX[assign [hour=9]]][X[assign [hour=10]]][XI[assign [hour=11]]][XII[assign [hour=12]]]]");
Of the values that you provided, all of them with a space instead of "T" are invalid. ISO 8601 requires a T without exception. Looking at the remaining ones:
2023-06-12T14:24:18.684 does not include a UTC offset, so parsing as OffsetDateTime correctly fails. It successfully parses as PrimitiveDateTime.
2023-06-12T14:24:18.684Z parses successfully.
2023-06-12T14:24:18.684+00 failed to parse. Upon a quick glance at the specification, this appears to be correct. I will re-read the relevant parts to see if there's something I'm missing.
2023-06-12T14:24:18.684+0000 parses successfully.
2023-06-12T14:24:18.684+00:00 parses successfully.

> non-consuming token like [assign [offset_hour = 0][offset_minute = 0]]
This would fundamentally alter the division between parsing the format description and parsing the value.
> ISO 8601 requires a T without exception.
The ISO website seems to disagree:
> For example, September 27, 2022 at 6 p.m. is represented as 2022-09-27 18:00:00.000

> This would fundamentally alter the division between parsing the format description and parsing the value
Sure, that was just one possibility for a workaround, not a preferred solution.
> The ISO website seems to disagree:
Ironically, the website is objectively incorrect. I'll quote the smallest possible part of the specification.
> The date is followed (without space) by [“T”] followed (without space) by the time, optionally including the time shift designator.
After looking quite a bit into the document, I had more than sufficient reason to believe that my implementation is correct, as the format definitions match what I implemented. However, the examples appear to indicate that the offset minute may be omitted in any situation. I'll make a modification to this behavior.
Thank you for improving the omitted offset minutes situation!
Even if you are not willing to officially support parsing with a space instead of a T, would it be possible to produce a more specific error variant than UnexpectedTrailingCharacters? From my perspective, handling Err(Parse::SpaceInsteadOfT(timestamp)) is almost as good as handling Ok(timestamp).
The specific use case I'm interested in is handling output from Postgres, which its docs define as "ISO 8601, but with a space instead of a T".
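For that Postgres use case, one workaround that needs no parser changes is to normalize the separator before handing the string to the ISO 8601 parser. A minimal sketch, assuming a four-digit year so the date/time separator always sits at byte index 10; space_to_t is a hypothetical helper, not part of the time crate:

```rust
use std::borrow::Cow;

/// Hypothetical helper: turn `YYYY-MM-DD hh:mm:ss...` into `YYYY-MM-DDThh:mm:ss...`.
/// Assumption: the year has exactly four digits, so the separator is byte 10.
/// Any input without a space at that position is returned unchanged.
fn space_to_t(timestamp: &str) -> Cow<'_, str> {
    match timestamp.as_bytes().get(10) {
        Some(b' ') => {
            let mut owned = timestamp.to_owned();
            // Byte 10 is an ASCII space, so 10..11 is a valid char range.
            owned.replace_range(10..11, "T");
            Cow::Owned(owned)
        }
        _ => Cow::Borrowed(timestamp),
    }
}
```

The result can then be passed to whichever parser is in use. Returning a Cow avoids an allocation when the input already uses T.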
The parser works in two phases. First, it parses the value into the Parsed struct (the same one that is public). It then converts the struct to the desired type. Both phases are fallible.
This specific error occurs in the second phase. This is because it successfully parses the date into Parsed, but stops when it encounters the space (as it's not valid in that position). Given that a date alone is valid ISO 8601, there is no reason to reject this in the first phase. For this reason, such an error could not be implemented with the current structure and process.
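To make the two-phase structure concrete, here is a toy model using only the standard library. It is not the time crate's code: Fields, Error, parse_fields and to_date_time are hypothetical names, the grammar is cut down to YYYY-MM-DD with an optional Thh:mm, and no range checks are done.

```rust
/// Hypothetical stand-in for an intermediate "parsed fields" struct.
#[derive(Debug, Default, PartialEq)]
struct Fields {
    date: Option<(i32, u8, u8)>,
    time: Option<(u8, u8)>,
}

#[derive(Debug, PartialEq)]
enum Error {
    InvalidDate,           // raised in phase 1
    InvalidTime,           // raised in phase 1
    TrailingInput(String), // raised in phase 2
    Incomplete,            // raised in phase 2
}

/// Read exactly `n` ASCII digits from the front of `s`.
fn digits(s: &str, n: usize) -> Option<(u32, &str)> {
    let head = s.get(..n)?;
    if !head.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some((head.parse().ok()?, &s[n..]))
}

/// Phase 1: consume as much valid input as possible and return the rest.
fn parse_fields(input: &str) -> Result<(Fields, &str), Error> {
    let mut fields = Fields::default();
    let (y, rest) = digits(input, 4).ok_or(Error::InvalidDate)?;
    let rest = rest.strip_prefix('-').ok_or(Error::InvalidDate)?;
    let (m, rest) = digits(rest, 2).ok_or(Error::InvalidDate)?;
    let rest = rest.strip_prefix('-').ok_or(Error::InvalidDate)?;
    let (d, rest) = digits(rest, 2).ok_or(Error::InvalidDate)?;
    fields.date = Some((y as i32, m as u8, d as u8));

    // A date alone is valid, so anything other than `T` simply ends phase 1.
    let rest = match rest.strip_prefix('T') {
        Some(after_t) => after_t,
        None => return Ok((fields, rest)),
    };
    let (h, rest) = digits(rest, 2).ok_or(Error::InvalidTime)?;
    let rest = rest.strip_prefix(':').ok_or(Error::InvalidTime)?;
    let (min, rest) = digits(rest, 2).ok_or(Error::InvalidTime)?;
    fields.time = Some((h as u8, min as u8));
    Ok((fields, rest))
}

/// Phase 2: convert the fields to the target type, rejecting leftover input.
fn to_date_time(fields: Fields, rest: &str) -> Result<((i32, u8, u8), (u8, u8)), Error> {
    if !rest.is_empty() {
        return Err(Error::TrailingInput(rest.to_string()));
    }
    let date = fields.date.ok_or(Error::Incomplete)?;
    let time = fields.time.ok_or(Error::Incomplete)?;
    Ok((date, time))
}

fn parse_date_time(input: &str) -> Result<((i32, u8, u8), (u8, u8)), Error> {
    let (fields, rest) = parse_fields(input)?;
    to_date_time(fields, rest)
}
```

In this model, parse_date_time("2023-06-12 14:24") gets through phase 1 with only the date filled in and " 14:24" left over, so the failure surfaces in phase 2 as a generic trailing-input error. Phase 1 has no point at which it could know that the caller wanted a time as well.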
Finally, I'll note that the docs for postgres also aren't accurate.
> The offset will be shown as hh (hours only) if it is an integral number of hours, else as hh:mm if it is an integral number of minutes, else as hh:mm:ss.
ISO 8601 contains no mention of offset seconds, only hours and minutes.
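For reference, the quoted Postgres rule can be sketched as follows. This is only an illustration of the rule as quoted, not Postgres source code; pg_offset is a hypothetical name, and the input is assumed to be a UTC offset in whole seconds.

```rust
/// Hypothetical helper: render a UTC offset (in seconds) using the quoted rule.
/// Shows `hh` for whole hours, `hh:mm` for whole minutes, otherwise `hh:mm:ss`.
fn pg_offset(total_seconds: i32) -> String {
    let sign = if total_seconds < 0 { '-' } else { '+' };
    let abs = total_seconds.unsigned_abs();
    let (h, m, s) = (abs / 3600, abs / 60 % 60, abs % 60);
    if s != 0 {
        // The seconds form is the one with no ISO 8601 counterpart.
        format!("{sign}{h:02}:{m:02}:{s:02}")
    } else if m != 0 {
        format!("{sign}{h:02}:{m:02}")
    } else {
        format!("{sign}{h:02}")
    }
}
```

Only the third branch produces output that falls outside ISO 8601 as described above; the hours-only branch is the +00 form discussed earlier in the thread.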
As I've pushed a commit containing the changes mentioned in my previous comment, I'm closing this as completed.
All of these values are legal ISO 8601:
2023-06-12 14:24:18.684
2023-06-12 14:24:18.684Z
2023-06-12 14:24:18.684+00
2023-06-12 14:24:18.684+0000
2023-06-12 14:24:18.684+00:00
2023-06-12T14:24:18.684
2023-06-12T14:24:18.684Z
2023-06-12T14:24:18.684+00
2023-06-12T14:24:18.684+0000
2023-06-12T14:24:18.684+00:00
However, of these formats, well_known::Iso8601 can only parse three:
2023-06-12T14:24:18.684Z
2023-06-12T14:24:18.684+0000
2023-06-12T14:24:18.684+00:00
We can improve on this situation by defining our own custom format:
This can parse a broader set of formats, but misses out on the Z annotation for UTC; those items fail with Err(TryFromParsed(InsufficientInformation)).
While it makes sense that we can't parse an OffsetDateTime from those formats missing a TZ offset entirely, well_known::Iso8601 should be able to parse the rest of them.
playground