Closed Mango0x45 closed 3 weeks ago
Looking at the compiler, it seems to use utf8proc? If so, the correct amount of left-padding should be easy to determine with utf8proc_charwidth().
Unicode visual width is a tough, potentially unsolvable problem. See the discussion in #3432, particularly these comments https://github.com/odin-lang/Odin/issues/3432#issuecomment-2057519705 and https://github.com/odin-lang/Odin/issues/3432#issuecomment-2057724712.
It's unfortunately not something that can be solved simply by counting codepoints/runes.
> It's unfortunately not something that can be solved simply by counting codepoints/runes.
Yes, that's why I mentioned the utf8proc_charwidth() function, which does more than simply count codepoints. While it's impossible to know how wide a codepoint is for sure without having font information, you can reliably get it right about 99% of the time in practice by simply detecting if a codepoint is a combining mark, control character, full width CJK, emoji, etc. It's pretty easy to generate a lookup table to do this, and luckily utf8proc already does it for us.
This is the same approach used by editors like Vim to tell you what screen column you're on, and compilers like GCC to tell you what column an error is at (and to format the error message properly).
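To make the classification idea above concrete, here is a minimal sketch in Python. It uses the standard library's unicodedata tables as a stand-in for utf8proc's lookup table; it is not what utf8proc_charwidth() does internally, and the names char_width and display_width are made up for illustration. The rules (zero columns for control, format, and combining characters; two columns for East Asian Wide/Fullwidth, which covers most emoji) are an assumed approximation, and tabs would need separate handling.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate the terminal column width of one code point (hypothetical helper)."""
    category = unicodedata.category(ch)
    # Control (Cc) and format (Cf) characters are assumed to take no column here.
    if category in ("Cc", "Cf"):
        return 0
    # Nonspacing (Mn) and enclosing (Me) marks stack on the preceding character.
    if category in ("Mn", "Me"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) code points occupy two columns;
    # this includes CJK ideographs and most emoji.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_width(text: str) -> int:
    """Sum the approximate widths of every code point in text."""
    return sum(char_width(ch) for ch in text)


if __name__ == "__main__":
    for sample in ("abc", "e\u0301", "日本語", "\U0001F600"):
        print(repr(sample), display_width(sample))
```

Without font information this can still be wrong for some sequences (for example emoji joined with zero-width joiners), which is the residual case the earlier comments point out.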
Closed via #3744.
Context
Expected Behavior
When diagnostics are emitted, I expect the ^~~~^ range to match the token the compiler is complaining about.
Current Behavior
Currently, the range ASCII-art will overshoot the token if it’s preceded on the line by any non-ASCII codepoint.
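The overshoot can be illustrated with a small Python sketch. It assumes, without having confirmed it in the compiler source, that the padding is derived from a byte offset rather than from display columns. The source line, the helper names columns and underline, and the width rules are all made up for illustration.

```python
import unicodedata


def columns(text: str) -> int:
    """Rough column count: 0 for combining marks, 2 for wide/fullwidth, else 1."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def underline(line: str, start: int, end: int, pad_by_bytes: bool = False) -> str:
    """Build the marker row for line[start:end] (indices are in code points)."""
    prefix, token = line[:start], line[start:end]
    # Assumed buggy behaviour: pad by the UTF-8 byte length of the prefix.
    # Assumed fix: pad by the approximate display width of the prefix.
    pad = len(prefix.encode("utf-8")) if pad_by_bytes else columns(prefix)
    width = max(columns(token), 1)
    marker = "^" if width == 1 else "^" + "~" * (width - 2) + "^"
    return " " * pad + marker


# A made-up line of source text with wide characters before the token.
line = 'a := "日本"; oops'
start = line.index("oops")
end = start + len("oops")

if __name__ == "__main__":
    print(line)
    print(underline(line, start, end))                     # padded by columns
    print(underline(line, start, end, pad_by_bytes=True))  # padded by bytes
```

In this example the two wide characters take 6 bytes but only 4 columns, so byte-based padding pushes the marker two columns too far to the right.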
Steps to Reproduce
Run odin build foo.odin -file on the following:
Failure Logs