odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License
6.12k stars 550 forks source link

Compiler loops infinitely if error line is truncated #3755

Closed Feoramund closed 2 weeks ago

Feoramund commented 2 weeks ago

Regression introduced by #3744. @Mango0x45

package infinite_loop

main :: proc() {
aa := aaaaa.aaaaaaaaaaaaaaa(aaa, aaaaaaaaaaaaaa, aaaaaaaaaaaaaaa)
}

Change the length of the erroneous line, and it compiles through.

offset is being given too many responsibilities in show_error_on_line.

utf8proc_iterate returns valuable error information which isn't being checked. If you have an Odin file with invalid Unicode in it, that also causes an infinite loop.

package infinite_loop_with_invalid_utf

main :: proc() {
    a := <dc00>
}

(That can't be copied directly, you'll have to find another way to enter U+DC00.)


As some side notes about this problem in general, it looks like we're printing ^ regardless of the visual width of the first character of the text. There's still a note in show_error_on_line about how the calculations are for ASCII-only.

/tmp/odin/inf/main.odin(11:6) Error: Undeclared name: a
    a := a
         ^
/tmp/odin/inf/main.odin(12:6) Error: Undeclared name: ば
    b := ば
         ^~^
/tmp/odin/inf/main.odin(13:6) Error: Undeclared name: ç
    c := ç
         ^^
/tmp/odin/inf/main.odin(14:6) Error: Undeclared name: du
    d := du
         ^^
/tmp/odin/inf/main.odin(15:6) Error: Undeclared name: växjö
    v := växjö
         ^~~~~^

I would've expected CJK to be taken into account based on https://github.com/odin-lang/Odin/issues/3743#issuecomment-2164060865, and I was surprised to see ç treated as two glyphs. Even the växjö example given in the original issue is mis-aligned with an extra tilde.

I feel like rewriting this entire procedure is in order if the goal is to have proper padding, per the PR. For one, the usage of variable names off and offset was confusing initially, as well as rune_width which should actually be bytes_read - especially when we're checking the actual character width right adjacent.

All that aside, it was the height of cosmic comedy that I met this bug while working on a project to detect the visual width of a Unicode string.

Kelimion commented 2 weeks ago

I concur. I've reverted #3744.