skius commented 1 year ago

How expressive should UnicodeSet parse errors be? Is it enough to show which character at which position caused the error, or should we also say precisely what we expected? (e.g., `\xag<-- error: was parsing an \x-escape, expected exactly two hex characters, got 'g'`)

EDIT: Examples of current parse errors: https://github.com/unicode-org/icu4x/pull/3547/files#diff-1ab141f559ba2ebd644683b4cf5255a30d1e0a7f949b9cf950522f1a8b0cbcc5R1146
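To make the second option concrete, here is a minimal Rust sketch of an error value that records the offset, what the parser expected, and what it found, using the `\xag` example from the question. The names `ErrorKind`, `ParseError`, `unexpected`, and `parse_hex2_escape` are hypothetical and are not ICU4X's actual types. Only the two-digit `\xHH` form is handled.

```rust
/// Hypothetical error kinds; not ICU4X's actual API.
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    /// The parser wanted `expected` but saw `found` (`None` = end of input).
    Unexpected { expected: &'static str, found: Option<char> },
}

/// Hypothetical error carrying the byte offset of the offending character.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub offset: usize,
    pub kind: ErrorKind,
}

fn unexpected(offset: usize, expected: &'static str, found: Option<char>) -> ParseError {
    ParseError { offset, kind: ErrorKind::Unexpected { expected, found } }
}

/// Parses a two-digit `\xHH` escape whose backslash is at byte offset `start`
/// (`start` must be a char boundary). Returns the decoded char and the offset
/// just past the escape.
pub fn parse_hex2_escape(input: &str, start: usize) -> Result<(char, usize), ParseError> {
    let mut chars = input[start..].char_indices().map(|(i, c)| (start + i, c));
    // The literal `\x` prefix.
    for want in ['\\', 'x'] {
        match chars.next() {
            Some((_, c)) if c == want => {}
            Some((i, c)) => return Err(unexpected(i, "`\\x` escape prefix", Some(c))),
            None => return Err(unexpected(input.len(), "`\\x` escape prefix", None)),
        }
    }
    // Exactly two hex digits must follow.
    let mut value = 0u32;
    let mut end = start;
    for _ in 0..2 {
        match chars.next() {
            Some((i, c)) => match c.to_digit(16) {
                Some(d) => {
                    value = value * 16 + d;
                    end = i + c.len_utf8();
                }
                None => return Err(unexpected(i, "exactly two hex digits", Some(c))),
            },
            None => return Err(unexpected(input.len(), "exactly two hex digits", None)),
        }
    }
    // 0x00..=0xFF are all valid scalar values, so this cannot fail.
    let decoded = char::from_u32(value).expect("two hex digits always form a valid char");
    Ok((decoded, end))
}
```

With this shape, the same value can be rendered either tersely (position and character) or as the full "expected X, got Y" message, so the choice mainly affects how much context the parser has to track at each failure point.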

skius commented 1 year ago

@sffc is considering having the `ParseError` itself hold a reference to the source string: https://github.com/unicode-org/icu4x/pull/3547#discussion_r1234726864
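A minimal sketch of that idea, assuming a hypothetical `ParseError<'a>` that borrows the input (this is not the type proposed in the linked discussion): its `Display` impl prints the input up to and including the offending character, followed by the message, in the same `← error:` style used elsewhere in this thread.

```rust
use std::fmt;

/// Hypothetical error type that borrows the parsed source, so `Display`
/// can show the input up to the failure point without the caller having
/// to pass the source back in.
#[derive(Debug)]
pub struct ParseError<'a> {
    source: &'a str,
    /// Byte offset of the offending character; must lie on a char boundary.
    offset: usize,
    message: &'static str,
}

impl<'a> ParseError<'a> {
    pub fn new(source: &'a str, offset: usize, message: &'static str) -> Self {
        ParseError { source, offset, message }
    }
}

impl fmt::Display for ParseError<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Include the offending character itself (if any) in the echoed prefix.
        let end = self.source[self.offset..]
            .chars()
            .next()
            .map_or(self.source.len(), |c| self.offset + c.len_utf8());
        write!(f, "{}← error: {}", &self.source[..end], self.message)
    }
}
```

The trade-off is that such an error cannot outlive the input string; storing an owned copy instead would lift that restriction at the cost of an allocation.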

sffc commented 1 year ago

Discuss with:

@robertbastian
@younies

Optional:

@sffc
@skius

skius commented 1 year ago

#3670 introduces a `MainToken`-based main parse loop. This means that in cases like `[a-{hello\ world}]` we have all the data required to say "error: unexpected string, expected single code point", so improving these cases would be relatively simple.
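Assuming a token enum roughly like the one below (the variants, `RangeError`, and `range_end` are hypothetical, not the actual `MainToken` from that PR), the check for the token after `-` can match on the token kind and name both what it found and what it expected:

```rust
/// Hypothetical token shape; the real `MainToken` may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum MainToken {
    /// A single code point, e.g. `a` or `\x{62}`.
    CodePoint(char),
    /// A string literal in braces, e.g. `{hello\ world}`.
    String(String),
    /// A `-` token.
    Dash,
}

/// Hypothetical error for a bad range endpoint.
#[derive(Debug, PartialEq)]
pub struct RangeError {
    /// Byte offset where the offending token starts.
    pub offset: usize,
    pub message: String,
}

/// Validates the token that follows `-` in a range. Because the token kind
/// is already known, the message can describe it instead of just
/// reporting its first character.
pub fn range_end(token: &MainToken, offset: usize) -> Result<char, RangeError> {
    let found = match token {
        MainToken::CodePoint(c) => return Ok(*c),
        MainToken::String(_) => "string",
        MainToken::Dash => "'-'",
    };
    Err(RangeError {
        offset,
        message: format!("unexpected {found}, expected single code point"),
    })
}
```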

It also introduces an edge case with an objectively bad error message:

```
Input:  [a-\x{62 64}]
Output: [a-\← error: unexpected character '\\'
```

This is bad because `[a-\x{62}]` is valid; in other words, the `\` is not actually unexpected. What causes the error is that a multi-code-point escape is used as part of a range.
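One way to produce a better message here, sketched under the assumption that a braced escape is parsed into a list of code points before the range check: when the escape yields more than one code point, report that at the backslash instead of treating the backslash as unexpected. `EscapeError`, `parse_braced_hex`, and `parse_range_end` are hypothetical names, and only the `\x{...}` form with space-separated hex values is handled.

```rust
/// Hypothetical error type for escape and range-endpoint problems.
#[derive(Debug, PartialEq)]
pub struct EscapeError {
    /// Byte offset where the reported problem starts.
    pub offset: usize,
    pub message: &'static str,
}

/// Parses `\x{H... H...}` whose backslash is at byte offset `start`, returning
/// the code points and the offset just past the closing `}`.
fn parse_braced_hex(input: &str, start: usize) -> Result<(Vec<char>, usize), EscapeError> {
    let rest = &input[start..];
    if !rest.starts_with("\\x{") {
        return Err(EscapeError { offset: start, message: "expected `\\x{`" });
    }
    let close = rest
        .find('}')
        .ok_or(EscapeError { offset: input.len(), message: "unterminated `\\x{` escape" })?;
    let mut code_points = Vec::new();
    // Hex values inside the braces are separated by spaces.
    for part in rest[3..close].split(' ').filter(|p| !p.is_empty()) {
        let cp = u32::from_str_radix(part, 16)
            .ok()
            .and_then(char::from_u32)
            .ok_or(EscapeError { offset: start, message: "invalid code point in `\\x{` escape" })?;
        code_points.push(cp);
    }
    if code_points.is_empty() {
        return Err(EscapeError { offset: start, message: "empty `\\x{}` escape" });
    }
    Ok((code_points, start + close + 1))
}

/// Parses a range endpoint written as a braced escape. A multi-code-point
/// escape is reported as such, pointing at the backslash.
pub fn parse_range_end(input: &str, start: usize) -> Result<(char, usize), EscapeError> {
    let (code_points, end) = parse_braced_hex(input, start)?;
    match code_points.as_slice() {
        [c] => Ok((*c, end)),
        _ => Err(EscapeError {
            offset: start,
            message: "multi-code-point escape cannot be used as a range endpoint",
        }),
    }
}
```

Reporting at the start of the escape rather than at the character where parsing stopped keeps the position meaningful, since the escape as a whole, not any single character in it, is what makes the range invalid.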

unicode-org / icu4x

Decide expressiveness of UnicodeSet parsing errors #3558