Open skius opened 1 year ago
@sffc is considering holding a reference to the source string in the ParseError itself: https://github.com/unicode-org/icu4x/pull/3547#discussion_r1234726864
MainToken-based main-parse-loop. This means in cases like [a-{hello\ world}] we have all the required data available to say "error: unexpected string, expected single code point", and improving these cases would be relatively simple. It also introduces an edge case with an objectively bad error message:
[a-\x{62 64}]
[a-\← error: unexpected character '\\'
This is bad because [a-\x{62}] is valid; in other words, \ is not actually unexpected. The real cause of the error is that a multi-code-point escape is used as part of a range.
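One way to get a better message for this edge case is to check the kind of token that follows the `-` rather than reporting the first character of that token. The sketch below is only an illustration of that idea: `Token`, `RangeError`, and `range_end` are hypothetical names, not the actual icu4x `MainToken` or `ParseError` types.

```rust
/// Hypothetical token kinds that can follow the `-` of a range.
/// These are NOT the real icu4x `MainToken` variants.
#[derive(Debug, PartialEq)]
enum Token {
    /// A single code point, e.g. `z` or `\x{62}`.
    CodePoint(char),
    /// A multi-code-point escape, e.g. `\x{62 64}`.
    MultiEscape(Vec<char>),
    /// A string, e.g. `{hello\ world}`.
    Str(String),
}

/// Hypothetical error kinds specific to range parsing.
#[derive(Debug, PartialEq)]
enum RangeError {
    /// "unexpected string, expected single code point"
    UnexpectedString,
    /// "multi-code-point escape cannot be the end of a range"
    MultiCodePointEscape { count: usize },
    /// The end of the range sorts before its start.
    EndBeforeStart { start: char, end: char },
}

/// Validates the token after `start-` and returns the inclusive range.
fn range_end(start: char, end: Token) -> Result<(char, char), RangeError> {
    match end {
        Token::CodePoint(c) if start <= c => Ok((start, c)),
        Token::CodePoint(c) => Err(RangeError::EndBeforeStart { start, end: c }),
        // The backslash itself is fine; the problem is the number of code points.
        Token::MultiEscape(cps) => Err(RangeError::MultiCodePointEscape { count: cps.len() }),
        Token::Str(_) => Err(RangeError::UnexpectedString),
    }
}
```

With a distinction like this, [a-\x{62 64}] would be reported as a multi-code-point escape in a range instead of as an unexpected '\\'.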
How expressive should UnicodeSet parse errors be? Does it suffice if we show which character at which position was the issue, or should we also give precise information about what we expected? (e.g., "\xag<-- error: was parsing an \x-escape, expected precisely two hex-characters, got 'g'")
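To make the more expressive option concrete, here is a rough sketch of an error that borrows the source string and records the offset, the parsing context, and the expectation. Everything in it is an assumption for discussion purposes: the `ParseError` fields and the `parse_x_escape` helper are hypothetical and do not reflect the current icu4x implementation, and the helper only handles the two-digit `\xHH` form.

```rust
use std::fmt;

/// Hypothetical error type that borrows the source string.
#[derive(Debug, PartialEq)]
struct ParseError<'a> {
    /// The full string being parsed.
    source: &'a str,
    /// Byte offset of the offending character (or `source.len()` at end of input).
    offset: usize,
    /// What the parser was in the middle of.
    context: &'static str,
    /// What the parser would have accepted.
    expected: &'static str,
}

impl fmt::Display for ParseError<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.source[self.offset..].chars().next() {
            Some(c) => {
                // Show the input up to and including the offending character.
                let end = self.offset + c.len_utf8();
                write!(
                    f,
                    "{}<-- error: was parsing {}, expected {}, got '{}'",
                    &self.source[..end], self.context, self.expected, c
                )
            }
            None => write!(
                f,
                "{}<-- error: was parsing {}, expected {}, got end of input",
                self.source, self.context, self.expected
            ),
        }
    }
}

/// Parses a `\xHH` escape at the start of `source` (hypothetical helper).
fn parse_x_escape(source: &str) -> Result<char, ParseError<'_>> {
    let err = move |offset: usize| ParseError {
        source,
        offset,
        context: "an \\x-escape",
        expected: "precisely two hex-characters",
    };
    let digits = source.strip_prefix("\\x").ok_or_else(|| err(0))?;
    let mut value = 0u8;
    let mut iter = digits.char_indices();
    for _ in 0..2 {
        match iter.next() {
            // `to_digit(16)` yields at most 15, so two digits always fit in a u8.
            Some((_, c)) if c.is_ascii_hexdigit() => {
                value = value * 16 + c.to_digit(16).unwrap() as u8;
            }
            // `i` is relative to `digits`, which starts 2 bytes into `source`.
            Some((i, _)) => return Err(err(2 + i)),
            None => return Err(err(source.len())),
        }
    }
    Ok(char::from(value))
}
```

The trade-off is that borrowing the source ties the error's lifetime to the input, which may or may not be acceptable for callers that want to store or return the error.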
EDIT: Examples of current parse errors: https://github.com/unicode-org/icu4x/pull/3547/files#diff-1ab141f559ba2ebd644683b4cf5255a30d1e0a7f949b9cf950522f1a8b0cbcc5R1146