odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License
6.12k stars 550 forks source link

Refactor `show_error_on_line` #3760

Closed Feoramund closed 3 days ago

Feoramund commented 2 weeks ago

UPDATE: Alright, I went the long way around, but this new fix should satisfy everyone.

I've re-implemented the squiggles, except this time I'm using a library I wrote based off all the Unicode grapheme cluster parsing code I wrote for the Odin core library collection, ported to C, then stripped down to fit into the Odin compiler with some adjustments.

In short, the squiggle length and position are now calculated off of real visual width based on the Unicode standards I've cited before. It's the same code powering the Unicode support in core:text/table now, just in C, inside the Odin compiler.

To be clear, not much has changed visually. I made the squiggles trail off ^~~~ ... if the error line trails off with ellipses as well.

(Everything below this line applies only to the old solution. I've left it intact, in the event the ideas are useful to someone in the future.)


The Joy of... Appreciating Errors?

Well, maybe.

odin_error_underline


I didn't want to leave the notes on #3755 or the core issue of #3743 left alone, so I thought for a bit on how to best solve this. As has already been stated, calculating Unicode visual width is difficult.

So why not approach the issue from a different angle? I can't say it's a new one, but it's underused, for sure.

The terminal already gives us everything we need to underline ranges of any size or character encoding. It could be Latin-1 or Shift-JIS for all we know. Simply insert the right ANSI SGR codes at the start and end. Let the terminal figure it out; it has all the font, glyph, and styling information already.

I know Rust, Clang/LLVM, et cetera use padded out ^~~^, and there's some precedent for that style, but I think this is the simpler solution.

There is only one caveat to this solution. It's minor but worth noting. Because we (1) don't calculate visual width, but we (2) do perform truncation, there's a chance for the displayed line to be longer or shorter than the MAX_LINE_LENGTH value.

Everything will still be aligned though, because there is nothing to align.

I'm open to styling suggestions. Try this out in your terminal with your own font and colour configuration. It's likely to look different from the picture. One old terminal emulator I tried misinterpreted the wiggly underline as a slow blink, so that was interesting. The rest supported it with no problem. I did spend a while trying out different colours, colour combinations, marker characters (1 or 2, ASCII or Unicode dingbats), et cetera. I found this to be the most appealing without being burdensome, but I also have been testing it for some hours, so I'm a bit saturated.

This is the test suite I used.

Kelimion commented 2 weeks ago

I thought I'd try it out and hit a bug right away. :-)

package main

main :: proc() {
    foo := 4, 2
}
W:\Odin\src\error.cpp(475): Assertion Failure: `line_length_bytes >= 0` Bounds-checking error: line_length_bytes

As for the example that started it, it renders the chevrons as >< before the identifier in question.

W:/Odin/bug/bug.odin(5:9) Error: 'växjö' declared but not used 
    ><växjö := City { "foo" } 
W:/Odin/bug/bug.odin(5:25) Error: Cannot convert '"foo"' to 'int' from 'untyped string', got "foo" 
    växjö := City { >>"foo"<< } 

These bugs aside, I much prefer the old ^~~~^ visual style. It's much more subdued than the gaudy yellow chevrons. Even without the colors, >>here<< just comes off a bit shouty, tbh. I've used that style before when I'm tracing a particularly hard to track down problem, but it's a bit much for regular syntax errors.

Now unfortunately that means that without the chevrons we can't rely on the SGR wiggle code and must align and print them as before.

But I'd be remiss not to commend you for the test suite you put together. It's going to come in very handy in putting this issue to bed.

gingerBill commented 2 weeks ago

I personally do not like the aesthetics of this. I don't use a coloured terminal (SublimeANSI plugin is very bad) so it would only be >>foo<< for me. It also hides the actual code by making it inline.

I prefer reading ^~~^ even if it still has a few problems with regards to UTF-8 Unicode rendering.

If there is another alternative, I am all ears.

Feoramund commented 2 weeks ago

I thought I'd try it out and hit a bug right away. :-)

That's unfortunate. I was unable to reproduce this with your exact example. I tried odin check bug.odin -file -vet -strict-style.

/tmp/odin/bug.odin(4:13) Syntax Error: Too many values on the right hand side of the declaration
    foo := 4, 2><

But then I added a single space to the end of 4, 2, and it did indeed fail the assertion. Interesting. I'll have to examine that later in more depth. Thank you for the additional test case. :)

As for the example that started it, it renders the chevrons as >< before the identifier in question.

This is a bug with the checker, not with show_error_on_line.

(lldb) f
frame #0: 0x00005555555b1c5a odin`show_error_on_line(pos=0x00007fffc4608068, end=(file_id = 0, offset = 0, line = 0, column = 0)) at error.cpp:278:46
(lldb) v end
(TokenPos) end = (file_id = 0, offset = 0, line = 0, column = 0)

No end token is given for that particular error, so naturally it cannot know where to underline, or put squiggles as the case may be. You can see this with current master:

/tmp/odin/vaxjo.odin(6:2) Error: 'växjö' declared but not used
    växjö := City { "foo" }
    ^

[...]

Now unfortunately that means that without the chevrons we can't rely on the SGR wiggle code and must align and print them as before.

The wiggle code's non-standard, but there is the plain old underline which is far better supported, if it ever were the case you needed wide support for that manner of styling.


[...]

If there is another alternative, I am all ears.

If that's the case, it looks like I'll have to write a Unicode glyph counting procedure to truly solve this.

I was experimenting with writing one in Odin based on Unicode Technical Report 29 at the time I ran into the infinite loop bug, but I do not know how well it will work in practice. Most solutions I've heard of to this problem use lookup tables, not the manner of grapheme cluster boundary text segmentation they suggest in the report. In theory, it should work far better than lookup tables, but I haven't finished it yet to know what its quirks might be.

When I can get back to this, I'll see how well my Odin implementation works, and if it pans out, translate it to C.

Kelimion commented 2 weeks ago

The wiggle code's non-standard, but there is the plain old underline which is far better supported, if it ever were the case you needed wide support for that manner of styling.

It's not that it's non-standard in itself, but like underline or bold ANSI codes, it'll get lost when redirected to a file or an editor's build result tab. As the chevrons appear like they're not going to stay around, a purely special effects emphasis would be lost to many in a way that the previous ^~~~^ would not be.

image

Feoramund commented 2 weeks ago

[...] it'll get lost when redirected to a file or an editor's build result tab.

That clears up my confusion. I don't use a graphical editor and work exclusively in the terminal. I did consider how editors or other tooling might handle this, and my thinking was it would be fine, because they should be using either -terse-errors (which wouldn't print the highlight) or the JSON export (where they could do their own highlighting based on the position information given).

It's been quite some time since I last used an IDE, but now I'm remembering the build result tab that many of them have for this sort of thing. With that in mind, I'm starting to see how this could be a pain more than a relief, and if I'm not mistaken, I believe most of Odin's userbase is on Windows using this sort of tooling. There is likely no good way around not having a Unicode glyph counting mechanism, with all of these facets known to me now.

Feoramund commented 4 days ago

@Kelimion I'm back with this one. I edited the PR description, but to summarize, I got rid of the chevrons and the non-standard ANSI underline. We're back to squiggles, except I ported all my Unicode parsing code to C and dropped that into place. It's all calculated by visual width now, as best as it can be.

Much of the code has been simplified since it doesn't need to do inline truncation anymore, and I was unable to reproduce the error you found after I rewrote everything again.

One caveat is that tabs are treated as 1 space long, but that should only matter if someone has a tab after non-whitespace characters in their code, since error lines have whitespace stripped before they're shown. I can hack a fix in for that if you think it's needed. It's just a matter of deciding whether to use 4 or 8. Either way, if we use one and someone has the other configured, it'll look weird.

Kelimion commented 4 days ago

@Kelimion I'm back with this one. I edited the PR description, but to summarize, I got rid of the chevrons and the non-standard ANSI underline. We're back to squiggles, except I ported all my Unicode parsing code to C and dropped that into place. It's all calculated by visual width now, as best as it can be.

I'm hosting a dinner party tomorrow and it's 1:30am at the moment, so I'm unlikely to offer feedback until late tomorrow or sometime Monday, but I'm glad you didn't give up on the problem.

Feoramund commented 4 days ago

No rush at all, just wanted to give a detailed heads-up in case GitHub didn't send out any notifications. :)

I wasn't going to give up on this either, not after all that Unicode mountain climbing.