winnow-rs / winnow

Making parsing a breeze
https://docs.rs/winnow

chore: Correct ^ error position in unicode strings #553

Closed: gzigzigzeo closed this pull request 3 months ago

gzigzigzeo commented 3 months ago

Problem

Given we have the following string:

let input = r#""Hello, 世界! 🌍\*""#;

Let's say that \* represents an invalid sequence. By default, the ^ error marker is misplaced in the rendered output:

  |
7 |     "Hello, 世界! 🌍\*"
  |                ^

This happens because characters such as 世, 界, and 🌍 occupy two columns in monospace (fixed-width) fonts, while the caret is positioned as if every character were one column wide.

The more wide characters (CJK characters, emoji) precede an error, the further the pointer drifts to the left. This is confusing, especially when errors are displayed to end users.
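To make the drift concrete, here is a small Rust sketch that counts the characters before the invalid \* sequence and compares that with an estimated display width. The helper approx_display_width is hypothetical: it treats only the CJK Unified Ideographs block and two emoji blocks as double width, which is enough for this example but far cruder than the real unicode-width crate.

```rust
/// Hypothetical, deliberately rough display-width estimate: counts a few
/// known double-width blocks as 2 columns and every other char as 1.
/// Real width tables (e.g. the unicode-width crate) cover far more cases.
fn approx_display_width(s: &str) -> usize {
    s.chars()
        .map(|c| match c as u32 {
            0x4E00..=0x9FFF => 2,   // CJK Unified Ideographs, e.g. 世, 界
            0x1F300..=0x1F64F => 2, // Misc. Symbols and Pictographs, Emoticons, e.g. 🌍
            _ => 1,
        })
        .sum()
}

fn main() {
    let line = r#""Hello, 世界! 🌍\*""#;
    // Byte offset where the invalid `\*` sequence starts.
    let err = line.find(r"\*").expect("invalid sequence present");
    let prefix = &line[..err];

    // One column per char: where a naive caret would be drawn.
    println!("chars before error:   {}", prefix.chars().count()); // 13
    // Estimated terminal columns: where the caret should be drawn.
    println!("columns before error: {}", approx_display_width(prefix)); // 16
}
```

The 3-column gap corresponds to the three double-width characters that precede the error.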

Solution

This PR adds an optional unicode-width feature that computes the caret offset with the unicode-width crate, so the marker lands under the invalid sequence:

  |
7 |     "Hello, 世界! 🌍\*"
  |                    ^
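A minimal sketch of how such an optional feature might gate the dependency, assuming Cargo.toml declares unicode-width as an optional dependency (Cargo then exposes an implicit feature with the same name). The caret_column and render functions are hypothetical names for illustration, not the code from this PR or winnow's actual error renderer; the only external API used is the width method of the unicode_width::UnicodeWidthStr trait, and only when the feature is enabled.

```rust
// Hypothetical sketch: choose the column-counting strategy at compile time.
// Assumes unicode-width is declared as an optional dependency in Cargo.toml.

/// Caret column (0-based, in display cells) for an error at `byte_offset`.
/// With the `unicode-width` feature, use the crate's width tables.
#[cfg(feature = "unicode-width")]
fn caret_column(line: &str, byte_offset: usize) -> usize {
    use unicode_width::UnicodeWidthStr;
    line[..byte_offset].width()
}

/// Fallback without the feature: one column per `char`, which misplaces
/// the caret after double-width characters.
#[cfg(not(feature = "unicode-width"))]
fn caret_column(line: &str, byte_offset: usize) -> usize {
    line[..byte_offset].chars().count()
}

/// Render the gutter, the source line, and a caret line.
fn render(line_num: usize, line: &str, byte_offset: usize) -> String {
    let gutter = " ".repeat(line_num.to_string().len());
    let pad = " ".repeat(caret_column(line, byte_offset));
    format!("{gutter} |\n{line_num} | {line}\n{gutter} | {pad}^")
}

fn main() {
    let line = r#""Hello, 世界! 🌍\*""#;
    // Byte offset of the invalid `\*` sequence.
    let err = line.find(r"\*").expect("invalid sequence present");
    println!("{}", render(7, line, err));
}
```

Without the feature, the padding falls back to one column per char and reproduces the misplacement described above. With it, the padding follows the crate's width tables, though the width a given terminal or font actually uses can still differ.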

Thank you for this great crate!

epage commented 3 months ago

We intentionally keep the error simple: how wide a character appears depends on where the error is rendered and how that target handles different Unicode characters, and we also want to keep the dependency tree small.

If you'd like to discuss this further, feel free to open an issue. In general, for non-trivial changes I recommend issues for discussing the solution space, and PRs only for reviewing an implementation.

gzigzigzeo commented 3 months ago

Thanks, will open an issue then.