xolox / vim-easytags

Automated tag file generation and syntax highlighting of tags in Vim
http://peterodding.com/code/vim/easytags/

fix slow highlighter on vim 7.4 #80

Open aktau opened 10 years ago

aktau commented 10 years ago

Uses "syntax keyword" instead of "syntax match" to find subjects to highlight, which is much faster. The problem is that the type of regex used by vim-easytags is a pathologically bad case for the NFA regex engine that was introduced in vim 7.4.
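To make the difference concrete, here is a minimal sketch in Python that only builds the two kinds of Vim commands as strings. The group name `exampleTag` and all function names are made up for illustration; this is not the plugin's actual code.

```python
import re

def escape_for_vim_magic(name):
    # Backslash-escape characters that are special in a Vim 'magic' pattern.
    return re.sub(r'([\\/.*$^~\[\]])', r'\\\1', name)

def syntax_match_command(group, names):
    # One big alternation: the pattern shape described above as slow.
    alternatives = r'\|'.join(escape_for_vim_magic(n) for n in names)
    return 'syntax match %s /\\<\\(%s\\)\\>/' % (group, alternatives)

def syntax_keyword_command(group, names):
    # A plain keyword list: no pattern is involved at all.
    return 'syntax keyword %s %s' % (group, ' '.join(names))

names = ['parse_tags', 'TagEntry', 'highlight']  # hypothetical tag names
print(syntax_match_command('exampleTag', names))
print(syntax_keyword_command('exampleTag', names))
```

One caveat with the keyword form: it only works for names that consist entirely of keyword characters (see the 'iskeyword' option), so it cannot express arbitrary patterns around a tag name.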

As reported here: https://groups.google.com/forum/#!topic/vim_dev/cPcMap1BdQw

The patch was taken from the vim-easytags issues list on github, credit for it goes to @juliantaylor.

Fixes issue #68.

aktau commented 10 years ago

Meh, I've been using this and suddenly it's slow again, now I really don't know what's happening anymore. Perhaps I'll just wait until ZyX gets the VimL to Lua(JIT) translator working and see how it can be done entirely on the script side...

juliantaylor commented 10 years ago

The slow part is the regex engine. Unless VimL also replaces that, or the default is reverted to Vim's old engine (set re=1, I think), a JIT is not going to help.

juliantaylor commented 10 years ago

you may be able to speed some things up by only searching for words in the displayed file instead of all, but the extra check is typically not worth it unless you have a ridiculously large tag database.
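A rough sketch of that idea in Python, assuming the tag names are already available as a set and the displayed buffer as a list of lines (both names are hypothetical):

```python
import re

WORD = re.compile(r'\w+')

def tags_present_in_buffer(tag_names, buffer_lines):
    """Return, sorted, only the tag names that occur as words in the buffer."""
    words = set()
    for line in buffer_lines:
        words.update(WORD.findall(line))
    return sorted(words.intersection(tag_names))

tag_names = {'parse_tags', 'TagEntry', 'unused_helper'}
buffer_lines = ['entry = TagEntry()', 'parse_tags(entry)']
print(tags_present_in_buffer(tag_names, buffer_lines))
```

The extra pass over the buffer is the "extra check" mentioned above: it only pays off when the tag database is much larger than the set of words in the file.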

aktau commented 10 years ago

I also wonder why this even needs a regex. I haven't looked at the code in-depth but from the length of the regex it seems that it's loading in a big chunk of the tag database, and telling vim:

highlight all words that match regex('tag1|tag2|tag3|tag4|...|tag12381923892183')

It seems a bit extreme. Might be better to just keep a patricia trie of the tag database around (and periodically update it), and just loop over the entire document querying the patricia trie if the symbol is present. This is the sort of thing that would be easy squeezy in a really fast scripting language.
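As a sketch of that approach in Python: the example below uses a plain character trie rather than a compressed (patricia) trie, and for exact-word lookups a Python set would behave the same. All names are hypothetical.

```python
import re

def build_trie(names):
    """Build a character trie as nested dicts; the '' key marks the end of a name."""
    root = {}
    for name in names:
        node = root
        for char in name:
            node = node.setdefault(char, {})
        node[''] = True
    return root

def trie_contains(root, word):
    """Walk the trie one character at a time; True only for complete names."""
    node = root
    for char in word:
        node = node.get(char)
        if node is None:
            return False
    return '' in node

def find_tag_occurrences(root, lines):
    """Yield (line_number, column, word) for every word that is a known tag."""
    for lnum, line in enumerate(lines, start=1):
        for match in re.finditer(r'\w+', line):
            if trie_contains(root, match.group()):
                yield lnum, match.start() + 1, match.group()

trie = build_trie(['tag1', 'tag2'])
for occurrence in find_tag_occurrences(trie, ['x = tag1 + tag3', 'tag2()']):
    print(occurrence)
```

The positions found this way would still have to be turned into highlighting somehow, which is outside the scope of this sketch.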

aktau commented 10 years ago

Disregard my earlier comment about it being slow again. Somehow I wasn't using my fork with the patch applied as I originally thought. So, with this patch applied (and python support enabled), vim is workable again!

Since neovim doesn't have python support yet, I can't edit my files with neovim, which is a real shame.

aktau commented 10 years ago

Turns out that applying the same thing to the VimL side of the story also has beneficial effects, of course ;).

aktau commented 10 years ago

@xolox I notice you've given your ole' project another look again, so I would like to request a comment about whether these 2 tiny commits are valid or not. They do seem to improve the responsiveness a lot and the changes are not nearly as invasive as the async branch. Speeding up the non-async codepath will make putting off async support much more bearable, as well as sparing my poor laptop.

That would allow me to use xolox/vim-easytags again in vundle :).

By the way, I'm also a neovim contributor and I'm investigating some of the reasons why the vim codepath is slow (like find_tags). See this: https://github.com/neovim/neovim/issues/868

The only thing that's a bit of a pain and will be difficult to optimize directly is the regex syntax highlighting, but perhaps we'll be able to work around that.

By the way, the neovim job feature would be quite ideal for async support, I think. None of that clientserver horsing around. But I think you don't want to go down that route just yet, as it's not present in vanilla vim.

Since I can't live without this plugin, it's effectively stopping me from using the editor on which I work ;).

xolox commented 10 years ago

> I would like to request a comment about whether these 2 tiny commits are valid or not

They are valid from your perspective (performance) but not from my perspective (e.g. you removed all references to pattern_prefix and pattern_suffix thereby dropping a feature). That's why I won't just merge this pull request.

On the other hand I'm open to thinking about performance improvements like using :syntax keyword. Maybe I'll make it configurable so that there's a choice between performance vs. functionality. It would be even better of course if vim-easytags could pick the most performant applicable syntax highlighting method automatically.

juliantaylor commented 10 years ago

There is also a way to choose the old regex engine for specific regexes only. This was required for e.g. the yaml syntax highlighting; the syntax required can be found there.
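For reference, Vim lets a single pattern select its engine with a `\%#=1` prefix (old backtracking engine) or `\%#=2` (NFA engine), as opposed to changing the global 'regexpengine' option. A minimal Python sketch that adds the prefix to a generated pattern; the helper and group names are hypothetical:

```python
def force_old_engine(pattern):
    # Prepend Vim's per-pattern engine selector; it must come first in the pattern.
    return r'\%#=1' + pattern

# Hypothetical alternation over two tag names, wrapped in a :syntax match command.
pattern = r'\<\(tag1\|tag2\)\>'
print('syntax match exampleTag /%s/' % force_old_engine(pattern))
```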

xolox commented 10 years ago

@juliantaylor: Thanks for the hint, I'll take a look. A hybrid approach does seem best to me.

xolox commented 10 years ago

Please try out version 3.6 (see b6f8757d004d5f4ef7280fd111a21821e6bee79a) and let me know how it works for you. By default it may still be slow, but take a look at the new g:easytags_syntax_keyword option. Note that 3.6 includes 3.5 which merged a huge feature branch, so here's hoping that works out well...

xolox commented 10 years ago

Forgot to mention one thing: Please note that right now this 'feature' is not integrated with the "accelerated Python syntax highlighting" feature, because I'm considering ripping that out and replacing it with a fast Vim script implementation (if I can build one :-).

aktau commented 10 years ago

For neovim, even when integrating the keyword fix, it still locks up too much (you wouldn't believe the # of calls to do_filter and stuff, from time to time I profile neovim while it's running vim-easytags, which is the only script I have that can bring (neo)vim to its knees). The bottleneck was twofold:

1) the regex matching
2) taglist + processing

The first one was more or less eliminated by this patch, but the second one still exists.

For vanilla vim, which I use with python enabled, the regex bottleneck still exists (as you mention, it's not integrated), yet the taglist + processing slowness is not present.

So now both codepaths have exactly 1 bottleneck :). Do you have any idea about how you would rewrite the vim part (on a high level) to perform better?

When I get some more time for neovim I'm going to try to optimize some parts of the vimscript engine to make f_tagfiles run much faster, even on huge files.

xolox commented 10 years ago

@aktau: The number of strings/lists/dictionaries created and destroyed during a single :HighlightTags invocation is enormous and these are all heap allocations AFAIK, so it has never really surprised me that Vim has a hard time with this.

About optimizing, one thought came to mind: When I originally created the accelerated Python syntax highlighting I noticed how fast it is compared to the Vim script version while they mostly do the same thing. I can think of two reasons:

  1. The impression keeps growing on me that Vim script has really bad performance on reading files and manipulating (thousands of tiny) strings. This sounds completely silly for a mature text editor with its own programming language, but nevertheless my impression is there :-) and I've been writing Vim scripts for quite a few years now so it's not like I don't have any experience :-). Benchmarks will tell the truth.
  2. The Python version is using "shallow" parsing while the Vim script version is using taglist() which "deep" parses tags files. By sidestepping taglist() I would be giving up a code path that's presumably implemented in C and optimized for speed. On the other hand I would also sidestep the "deep" parsing. Could be worth a shot.
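To illustrate what "shallow" parsing could mean here, a minimal Python sketch that pulls only the tag name and kind out of each line of a tags file and ignores everything else. It assumes the extended ctags format with a single-letter kind right after the `;"` marker, and it is not the plugin's actual Python code.

```python
def shallow_parse(lines):
    """Yield (name, kind) pairs, ignoring file names, search commands and extra fields."""
    for line in lines:
        if line.startswith('!_TAG_'):
            continue  # metadata lines written by ctags
        name, _, rest = line.partition('\t')
        # In the extended format the kind follows the ';"' marker.
        _, marker, fields = rest.partition(';"\t')
        if name and marker:
            yield name, fields.rstrip('\n').split('\t', 1)[0]

sample = [
    '!_TAG_FILE_FORMAT\t2\t/extended format/\n',
    'main\tmain.c\t/^int main(void)$/;"\tf\n',
    'Point\tpoint.h\t/^struct Point {$/;"\ts\tfile:\n',
]
print(list(shallow_parse(sample)))
```

A "deep" parse, by contrast, would also split out the file name, the search command and every extension field for each tag, most of which the highlighter never looks at.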

You specifically mention do_filter(), I assume this is the C implementation of the Vim script filter() function? That's interesting. I could try to invert the logic in xolox#easytags#highlight() so that a single loop over all tags is performed. Right now there's a for loop inside of which filter() and map() are used.
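The inversion can be sketched in Python as follows. The first function mirrors the "filter inside a for loop" shape (one pass over all tags per kind), the second does a single pass in total. The dictionaries with `name` and `kind` keys mimic the entries taglist() returns; the function names are hypothetical.

```python
from collections import defaultdict

def group_by_kind_filtering(tags, kinds):
    """One full pass over the tag list for every kind."""
    return {kind: [t['name'] for t in tags if t['kind'] == kind] for kind in kinds}

def group_by_kind_single_pass(tags, kinds):
    """One pass over the tag list in total, appending each name to its kind's bucket."""
    wanted = set(kinds)
    groups = defaultdict(list)
    for tag in tags:
        if tag['kind'] in wanted:
            groups[tag['kind']].append(tag['name'])
    return {kind: groups.get(kind, []) for kind in kinds}

tags = [
    {'name': 'main', 'kind': 'f'},
    {'name': 'Point', 'kind': 's'},
    {'name': 'parse', 'kind': 'f'},
    {'name': 'MAX', 'kind': 'd'},
]
print(group_by_kind_single_pass(tags, ['f', 's']))
```

Both functions return the same result; the second one just touches each tag once, no matter how many kinds are highlighted.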

Last but not least: A long time ago I created simple functions to accurately time code execution in a way that can be easily presented to the user in a human friendly format. Nothing is stopping me from using these functions in a more granular way so that we can easily pick apart :HighlightTags on a high level. I suppose this should be my next step, it shouldn't take more than an hour or so.
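A rough idea of what granular timing might look like, sketched in Python rather than Vim script. The stage labels and helper names are invented for the example and do not correspond to the plugin's own timing functions.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, report):
    """Time the enclosed block and add the elapsed seconds to report[label]."""
    started = time.perf_counter()
    try:
        yield
    finally:
        report[label] = report.get(label, 0.0) + time.perf_counter() - started

def format_report(report):
    """Render the collected timings as one human-friendly line."""
    return ', '.join('%s: %.3fs' % (label, seconds) for label, seconds in report.items())

report = {}
with timed('collect tags', report):
    names = ['tag%d' % i for i in range(10000)]
with timed('build command', report):
    command = 'syntax keyword exampleTag ' + ' '.join(names)
print(format_report(report))
```

Wrapping each high-level stage this way gives a per-stage breakdown instead of a single total, which is usually enough to see where the time goes.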

aktau commented 10 years ago

> About optimizing, one thought came to mind: When I originally created the accelerated Python syntax highlighting I noticed how fast it is compared to the Vim script version while they mostly do the same thing. I can think of two reasons:

Exactly, the taglist slowness can be explained by the shallow vs. deep argument (my intuition tells me that's it), but I'm not 100% sure about the map/filter slowness. It might be the difference between a simple interpreter that constantly re-parses and evaluates everything (cough, viml, cough) and a bytecode interpreter (like python). It might be something else too.

> You specifically mention do_filter(), I assume this is the C implementation of the Vim script filter() function?

Indeed. I might be misremembering, in which case it's f_filter and not do_filter (they do different things), but it's that which shows up on profiling traces.

> That's interesting. I could try to invert the logic in xolox#easytags#highlight() so that a single loop over all tags is performed. Right now there's a for loop inside of which filter() and map() are used.

This (plus taglist) are easily the most time consuming parts. I'm not sure if the filter + map case can be sped up a lot on the C side (for some things, Vim is surprisingly well-optimized, for others not so much). I have it planned out in my TODO list. If at all possible, I'd like to make neovim usable for me even before we merge in the LuaJIT interpreter: https://github.com/neovim/neovim/issues/392 (for some very preliminary first-cut speed results).

> Last but not least: A long time ago I created simple functions to accurately time code execution in a way that can be easily presented to the user in a human friendly format. Nothing is stopping me from using these functions in a more granular way so that we can easily pick apart :HighlightTags on a high level. I suppose this should be my next step, it shouldn't take more than an hour or so.

A high-level view from the VimL side could help my diagnosis of the issue as well.