zyedidia / micro

A modern and intuitive terminal-based text editor
https://micro-editor.github.io
MIT License

highlighter: Change the parsing approach to significantly improve performance #3127

Open JoeKar opened 8 months ago

JoeKar commented 8 months ago

The performance of the current parsing approach can't be improved without changing the whole highlighter code. Because of this, the change isn't without risk, but it's definitely worth a try. Please see the following list, which was measured on the same host with micro -profile. Each test was stopped once the highlighting of an 80x24 window was complete, or aborted because of the known "endless" recursion (DNF). Afterwards the top1 entry was printed with pprof.

| file reference | before | after |
|---|---|---|
| 1. | 1.93s 12.26% 12.26% 1.93s 12.26% unicode/utf8.DecodeRune | 10ms 100% 100% 10ms 100% runtime.futex |
| 2. | (DNF) 5.41s 14.64% 14.64% 5.41s 14.64% unicode/utf8.DecodeRune | 10ms 100% 100% 10ms 100% runtime.writeHeapBits.flush |
| 3. | 10ms 20.00% 20.00% 10ms 20.00% github.com/zyedidia/tcell/v2.(*tScreen).SetContent | 20ms 40.00% 40.00% 20ms 40.00% github.com/zyedidia/micro/v2/internal/util.CharacterCount |
| 4. | 10ms 20.00% 20.00% 10ms 20.00% crypto/md5.block | 10ms 25.00% 25.00% 10ms 25.00% gopkg.in/yaml%2ev2.yaml_parser_update_buffer |
| 5. | 10ms 50.00% 50.00% 10ms 50.00% gopkg.in/yaml%2ev2.yaml_parser_scan_plain_scalar | 10ms 33.33% 33.33% 10ms 33.33% runtime.(*consistentHeapStats).acquire |
| 6. | (DNF) 8.79s 27.01% 27.01% 14.53s 44.65% regexp.(*Regexp).tryBacktrack | 10ms 20.00% 20.00% 10ms 20.00% github.com/zyedidia/micro/v2/internal/util.DecodeCharacter |

File references:
  1. tileset_env_test from #3115 (reduced version)
  2. tileset_env_test from #3115
  3. sample.md from #2839
  4. sample.md from #2839 (with inserted <script>)
  5. Firefox's new tab page (reduced version)
  6. Firefox's new tab page
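
For readers who want to produce comparable numbers outside of micro, here is a minimal sketch of how a Go program can capture a CPU profile with the standard runtime/pprof package. The workload `countRunes`, the helper `profileTo` and the file name `cpu.prof` are assumptions made for illustration only; they are not taken from micro or from this PR.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"unicode/utf8"
)

// countRunes is a stand-in workload (hypothetical, not micro's code): it
// decodes a byte slice rune by rune with unicode/utf8.DecodeRune.
func countRunes(b []byte) int {
	n := 0
	for len(b) > 0 {
		_, size := utf8.DecodeRune(b)
		b = b[size:]
		n++
	}
	return n
}

// profileTo runs work while a CPU profile is written to path.
func profileTo(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close() // runs after StopCPUProfile below
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()
	work()
	return nil
}

func main() {
	data := []byte("héllo wörld ")
	total := 0
	err := profileTo("cpu.prof", func() {
		for i := 0; i < 200000; i++ {
			total += countRunes(data)
		}
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("runes decoded:", total)
}
```

The resulting file can be inspected with `go tool pprof -top cpu.prof`; the first row of that listing is the kind of "top1" entry shown in the table. A workload this short may finish before the profiler collects any samples, so a real measurement needs a longer run.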

With my available test files, the highlighting results were the same or even more complex (e.g. pattern highlighting within regions in HTML files). Most probably the logic isn't in perfect shape yet, but it's definitely feasible as a proof of concept.

Please help to test and improve it with a review. It took many days to get this far, and it would be a shame if we didn't get this upstream in some form. :wink:

Fixes #2839 Fixes #3115 Closes #3242

JoeKar commented 7 months ago

> Wouldn't that just hide a problem with the highlighter?

You're probably right here. Thank you for holding me back. I'll try to find a better solution, but it isn't that easy, since the constant.string isn't known yet and the identifier already started again here "M [...] till the end of the given section, which is then behind the next sibling region-end ([...] z"). It looks like I need a further stage that stores and modifies the siblings before they're highlighted.

Update: I implemented such an intermediate step and can now remove already captured groups and their children.
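
To make the idea of such an intermediate stage more concrete, here is a rough sketch of one way sibling regions could be collected first and then pruned before highlighting. It is not the PR's implementation: the `region` type, the `dropCaptured` function and the sample offsets are all hypothetical and only illustrate the principle of dropping a region, together with its children, when it starts inside a sibling that has already been captured.

```go
package main

import (
	"fmt"
	"sort"
)

// region is a hypothetical node of the intermediate stage.
type region struct {
	group    string
	start    int // inclusive byte offset
	end      int // exclusive byte offset
	children []*region
}

// dropCaptured keeps siblings in start order and drops every sibling that
// starts inside an earlier kept one. Children hang off their parent, so they
// disappear together with it. A dropped region never extends the covered
// span, so it cannot invalidate later siblings.
func dropCaptured(siblings []*region) []*region {
	sorted := append([]*region(nil), siblings...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].start < sorted[j].start
	})
	var kept []*region
	covered := 0
	for _, r := range sorted {
		if r.start < covered {
			continue // already captured by a kept sibling
		}
		kept = append(kept, r)
		covered = r.end
	}
	return kept
}

// sample builds illustrative siblings: an identifier that starts inside a
// string, followed by a comment that starts after the string has ended.
func sample() []*region {
	return []*region{
		{group: "constant.string", start: 4, end: 20},
		{group: "identifier", start: 10, end: 30,
			children: []*region{{group: "symbol", start: 12, end: 13}}},
		{group: "comment", start: 22, end: 28},
	}
}

func main() {
	for _, r := range dropCaptured(sample()) {
		fmt.Printf("%s [%d,%d)\n", r.group, r.start, r.end)
	}
}
```

In this sketch the identifier is dropped because it begins inside the string, while the comment survives even though it overlaps the dropped identifier.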

JoeKar commented 7 months ago

Can you give it one more shot with the latest version? It was a pain, but the behavior looks quite promising now.

dmaluka commented 7 months ago

I'll take a look as soon as I have time. I'm really interested, but I'm really busy lately.

JoeKar commented 7 months ago

The moment you're happy about what you've done and then (javascript)... [screenshot] ...look at the "DOH". :( It's invalidated by the previous 's find out [...] invalidated by the previous Does this [...] and to be honest, I have no clue how this should be solved now. It doesn't need to be invalidated/removed, since the invalidating/removing region has no validity itself: it's invalidated/removed too, so the last string should still keep its validity. :thinking: This has become very crazy now.

JoeKar commented 7 months ago

[screenshot]

Hopefully the time spent wasn't for nothing. :wink:

dmaluka commented 6 months ago

Just spotted a couple of examples of plainly and utterly incorrect XML highlighting with the newest micro (without this PR):

image image

With this PR, both are highlighted correctly.

So, I still had no time to review this PR, but so far it feels like reworking the highlighting algo from scratch was the right thing to do.

dustdfg commented 6 months ago

I started deleting trailing spaces in the syntax YAML files and found one strange thing. Honestly, I didn't check with this PR, but it seems to me that @dmaluka's PR for highlighting spaces/tabs (a tab character is highlighted there) doesn't use the highlighting system... so I think it may be interesting to both of you @JoeKar @dmaluka

screen-1710948261

The same is here:

screen-1710948772

dmaluka commented 6 months ago

Yes, hltaberrors is independent of syntax highlighting. I also see it highlighting those raw tabs in crystal.yaml and erb.yaml, which is expected when tabstospaces is enabled. It can be fixed by replacing those raw tab characters with \\t.
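
As a small illustration of why the escaped form is a safe replacement, the sketch below shows that Go's regexp package treats a pattern containing a raw tab character and a pattern containing the escape sequence backslash-t the same way when matching, while only the former carries an actual tab byte in its source text. It assumes the syntax patterns end up in Go's regexp package; the pattern and the helpers `hasRawTab` and `matchesTrailingBlank` are made up for this example and are not taken from crystal.yaml or erb.yaml.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hasRawTab reports whether the pattern source itself contains a tab byte.
func hasRawTab(pattern string) bool {
	return strings.Contains(pattern, "\t")
}

// matchesTrailingBlank compiles the pattern and matches it against line.
func matchesTrailingBlank(pattern, line string) bool {
	return regexp.MustCompile(pattern).MatchString(line)
}

func main() {
	raw := "[ \t]+$"     // interpreted string: holds an actual tab byte
	escaped := `[ \t]+$` // raw string: holds a backslash followed by 't'
	line := "foo \t"
	for _, p := range []string{raw, escaped} {
		fmt.Printf("raw tab in pattern: %-5v matches: %v\n",
			hasRawTab(p), matchesTrailingBlank(p, line))
	}
}
```

Both patterns match the line, but only the first one would be flagged by a check that looks for tab bytes in the pattern text.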

dustdfg commented 6 months ago

I was thinking: wouldn't it be good to test highlighting automatically? And I found one thing: internal/screen/screen.go:202:func InitSimScreen() (tcell.SimulationScreen, error) { tcell supports a simulated screen for testing.... I am not sure it is suitable, but maybe it would help to automatically catch some known possible bugs?

dmaluka commented 6 months ago

Yeah, I was also thinking about adding some auto tests for highlighting. I guess the most non-trivial thing here would be not implementing the test itself but choosing which example files, and for which subset of filetypes, to use as good test cases for highlighting (for covering various corner cases like heavily nested regions and so on).

JoeKar commented 6 months ago

(Unit) Testing will definitely help. As suggested by @dmaluka, we can use some complex input, feed the highlighter with one or more syntax definitions and check the states and matches it sets. The latter in particular are used 1:1 as output for coloring according to the syntax definition. AFAIK in this case we wouldn't need the screen at all.
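
A screen-free test of this kind could look roughly like the following table-driven sketch. It does not use micro's highlighter package: the `rule` type, the toy `highlight` function and its `map[int]string` result (group name per byte offset, with an empty string meaning "back to default") are assumptions chosen only to show the shape of such a test.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
)

// rule is a hypothetical, heavily simplified syntax rule.
type rule struct {
	group string
	re    *regexp.Regexp
}

// defaultRules returns two toy rules used by the sample cases.
func defaultRules() []rule {
	return []rule{
		{"constant.string", regexp.MustCompile(`"[^"]*"`)},
		{"comment", regexp.MustCompile(`//.*`)},
	}
}

// highlight records the group starting at each match offset and resets to
// the default group at the match end unless another group starts there.
func highlight(line string, rules []rule) map[int]string {
	matches := map[int]string{}
	for _, r := range rules {
		for _, loc := range r.re.FindAllStringIndex(line, -1) {
			matches[loc[0]] = r.group
			if _, ok := matches[loc[1]]; !ok {
				matches[loc[1]] = ""
			}
		}
	}
	return matches
}

// testCase pairs an input line with the expected matches.
type testCase struct {
	line string
	want map[int]string
}

// sampleCases lists illustrative inputs; real tests would add corner cases.
func sampleCases() []testCase {
	return []testCase{
		{`x = "hi" // c`, map[int]string{4: "constant.string", 8: "", 9: "comment", 13: ""}},
		{`plain`, map[int]string{}},
	}
}

// failures runs all cases and describes every mismatch.
func failures(cases []testCase, rules []rule) []string {
	var out []string
	for _, c := range cases {
		if got := highlight(c.line, rules); !reflect.DeepEqual(got, c.want) {
			out = append(out, fmt.Sprintf("%q: got %v, want %v", c.line, got, c.want))
		}
	}
	return out
}

func main() {
	fails := failures(sampleCases(), defaultRules())
	for _, f := range fails {
		fmt.Println("FAIL", f)
	}
	fmt.Println(len(fails), "failing case(s)")
}
```

The toy `highlight` function knows nothing about regions, so an input such as a string literal containing `//` would be handled incorrectly; inputs of that kind are the corner cases a real test table would need to cover.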