Closed tristan closed 1 year ago
Thanks for the report. I believe your analysis of why the error returned is correct. This may be tricky to handle, as it's two separate phases of parsing. The only thing that immediately comes to mind is if characters are remaining, further options could be tried to see if any succeed without characters remaining.
If I add a modifier try_complete:true
/try_complete:false
, with the default being false, would you view that as a reasonable solution?
The current behavior is false — the first match is returned no matter what. If it's true instead, then the parser would try to find the first match that has no input remaining, falling back to the first overall match otherwise.
It's not clear from your message where the modifier would sit. I assume in one of the format_description!
entries, but it's not clear which one, or where it would sit in the description. It could make sense to change TIMESTAMP_FORMAT_2
to know it needs to consume all input, otherwise fail (kinda like adding $
at the end of a regex), so I expect you mean to make my example problem work by changing that line to
const TIMESTAMP_FORMAT_2: &[FormatItem] =
format_description!("[day]/[month]/[year] [hour padding:none]:[minute]:[second][try_complete:true]");
or
const TIMESTAMP_FORMAT_2: &[FormatItem] =
format_description!("[day]/[month]/[year] [hour padding:none]:[minute]:[second try_complete:true]");
?
In my opinion this feels a bit over complicated, and still wouldn't stop people from making the same mistake without having clear documentation about this limitation of FormatItem::First
and that the try_complete:true
modifier should be used to circumvent it. It would also be tricky to use in cases where you want to use existing format descriptions in a FormatItem::First
list without recreating them. e.g. the well_known
items? though my hypothetical use cases are likely a bit of a stretch, but that also makes me feel like it's a bit much to introduce a new, and likely complex to implement, modifier for such an edge case.
Rather, I think it's important to think about the difference of using FormatItem::First
with <Parsable>::parse
vs Parsed::parse_item
. With Parsed::parse_item
it makes sense that it returns the first viable result, even if another option could consume more input, but for <Parsable>::parse
, having remaining input is an error, so it would make more sense to return the first non-error result in that case. This would mean that making <Parsable>::parse
work as (at least I would assume) intended, it would overload the meaning of FormatItem::First
depending on the context in which you use it. This is also already true for the other FormatItem
variants, though it's more subtle for them. So I'd say it's a question of if this overloading is undesirable enough to resist having special handling in the <Parsable>::parse
implementation to handle FormatItem::First
differently?
I had a go at implementing a fix that uses special behaviour inside <Parsable>::parse
for the FormatItem::First
(and OwnedFormatItem::First
) variants here: https://github.com/tristan/time/commit/00e6d8bdbcd22a507e92efcd15e552639fc27eeb. If I trust the test suite it hasn't broken anything. If you deem it a workable solution I'm happy to make it a PR and address any structure/naming/testing points.
Otherwise it might be enough to simply add some documentation about the limitations of using FormatItem::First
with <Parsable>::parse
so that it's obvious for anyone stumbling across it (like i did) that ordering the items to remove the potential for premature UnexpectedTrailingCharacters
is important.
e.g. the
well_known
items?
That's actually impossible, as they are their own types.
This would mean that making
<Parsable>::parse
work as (at least I would assume) intended
The current behavior is what was intended. Parsing is commonly performed on input that has text both before and after the value, and that is what I intended to facilitate.
This is also already true for the other
FormatItem
variants, though it's more subtle for them.
Only to the extent that the behavior is specified for formatting/parsing. There is no change of behavior depending on which method is used to invoke it.
Otherwise it might be enough to simply add some documentation about the limitations of using
FormatItem::First
with<Parsable>::parse
so that it's obvious for anyone stumbling across it (like i did) that ordering the items to remove the potential for prematureUnexpectedTrailingCharacters
is important.
This can be done regardless of any changes elsewhere.
My original thinking was to have it be a modifier on [first]
. Thinking about it again, I see an alternate path forward that's more flexible.
All that has to be done is introduce a component that indicates it's the end of the input. That, by itself, would permit the various use cases. The primary question I have is what it should be named. It will be quite easy to implement — it needs a new variant on error::ParseFromDescription
and a declaration of the component itself.
I'm nowhere invested or opinionated enough to really push for one direction or the other :sweat_smile:. I wouldn't have had anything to say at all had I not been the one who posted the issue. And I'm certainly not familiar enough with all the possible use cases, so you wont get any flak from me for doing what you think is best. but I'll try give a view from a casual user of the API (i.e. someone who generally just doing datetime parsing of plain strings).
The current behavior is what was intended. Parsing is commonly performed on input that has text both before and after the value, and that is what I intended to facilitate.
I struggle a bit with this in the context of <Parsable>::parse
(or maybe I should be more specific and say <parsable::sealed::Sealed>::parse
, since there is not actually a direct <Parsable>::parse
, or even PrimitiveDateTime::parse
or OffsetDateTime::parse
since those are my usual entry points to the API, but i'll continue to use <Parsable>::parse
for consistency and brevity) . For example, if I take something like:
PrimitiveDateTime::parse(
"2021-01-02 00:00",
&FormatItem::Compound(&fd::parse("[year]-[month]-[day] [hour]:[minute]")?),
);
and try to add anything to either the front or the back of the input string, I'll get a Result::Err
, with no way to receive a potential result contained within. And as far as I can tell (based on some not so thorough grepping), the current implementation of <Parsable>::parse
will always fail if there is extra text on either side of the input. I'm probably missing something, as these DateTime types use <Parsable>::parse_date_time
which might have different expected behaviour with respect to extra input? though in that case i might expect the UnexpectedTrailingCharacters
to come from that function, or the try_into()
for the specific type, rather than the <Parsable>::parse
function (but that also difficult since the information about how many characters have been consumed is not exposed in the <Parsable>::parse
api.
But from a user perspective, things still work as expected in most cases PrimitiveDateTime::parse
(and the other types that have a parse
method) are failing if there is unexpected characters in the input, which fits with the experience of other APIs from other languages. It's only the case with FormatItem::First
, which is already quite a unique API to have and maybe wasn't originally designed to be used in contexts like i've used (it was quite by chance I found it and decided to try use it vs manually looping over my possible cases to find the first one that matched), where it's not quite behaving as I expected. If an extra modifier to specify that a format description must consume the whole input existed when I came across the issue, I probably would have used it and moved on. But that is also assuming I found reference to the modifier before I found that rearranging the options also fixed it. I wouldn't immediately think that there would exist something to mark that a format description should consume the whole input, since with my simplistic use case, and my experience with other datetime parsing libraries in other languages I always expect the whole string to match.
An [end]
component has been introduced, signaling the end of input. It will fail if there is any input remaining. For the example in your original issue, if you append [end]
to each of the format_description!
invocations, it will output what you're looking for.
With the following
FormatItem::First
setup, anUnexpectedTrailingCharacters
is returned due to the 2nd item parsing successfully but having remaining characters, even though the 3rd item would parse successfully.Link to playground
My expectation using this would use the 3rd item and ignore the
UnexpectedTrailingCharacters
error unless the 3rd item also failed to parse.The simple solution for my specific case is to flip the 2nd and 3rd items, since they're unique enough for my use case. But I thought it worth reporting in case it is a bug and not the expected behaviour, or if i'm doing something wrong/unexpected with the api.