Open slonoed opened 7 years ago
Hi @slonoed,
Could you provide an example file so I can confirm locally? This is the first I've heard of a serious performance issue with the syntax file.
The clojureSymbol regex is consistently the top item for me too when running with :syntime on.
It is particularly bad if I have a long line of text and/or s-expressions and try to delete a few characters using x, especially if deleting an opening or closing double-quote on a string. There is perceptible lag, and if I hold the key down, I cannot predict how much text will be deleted - usually too much. IOW it is virtually unusable.
Switching syntax off and deleting the same code fragment with x works absolutely fine.
@rm-hull
I do believe that you and @slonoed are having issues, but I don't see them locally. I enjoy debugging performance issues, but I won't be able to do that without a reproducible example.
For instance,
It is particularly bad if I have a long line of text and/or s-expressions, and try and delete a few characters using x, especially if deleting an opening or closing double-quote on a string.
On my machine, this operation is quite snappy, so I can only guess at what might be the problem.
I'm also seeing some typing and scrolling delays lately. I'm using neovim. I opened this file https://github.com/http-kit/http-kit/blob/master/src/org/httpkit/client.clj and set syntime on and scrolled up and down several times.
clojureSymbol was easily the slowest call in the report.
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
7.938665 651229 631878 0.000352 0.000012 clojureSymbol \v%([a-zA-Z!$&*_+=|<.>?-]|[^\x00-\x7F])+%(:?%([a-zA-Z0-9!#$%&*_+=|'<.>/?-]|[^\x00-\x7F]))*[#:]@<!
1.513363 622798 568704 0.000057 0.000002 clojureError ]\|}\|)
0.575973 160840 24230 0.000035 0.000004 clojureNumber \v<[-+]?%(0|[1-9]\d*|%(0|[1-9]\d*)\.\d*)%(M|[eE][-+]?\d+)?>
0.511449 142312 0 0.000037 0.000004 clojureNumber \v<[-+]?%(0|[1-9]\d*)/%(0|[1-9]\d*)>
0.459769 142312 0 0.000034 0.000003 clojureNumber \v\c<[-+]?36r[0123456789abcdefghijklmnopqrstuvwxyz]+>
0.448722 223648 86611 0.000051 0.000002 clojureKeyword \v<:{1,2}%([^ \n\r\t()\[\]{}";@^`~\\%/]+/)*[^ \n\r\t()\[\]{}";@^`~\\%/]+:@<!>
0.425358 150640 14030 0.000030 0.000003 clojureNumber \v<[-+]?%(0\o*|0x\x+|[1-9]\d*)N?>
Thanks @morrifeldman
Performance is still pretty good on my machine for the linked file, but I can see how scrolling would feel slow on a laptop.
I'll have a stab at optimizing the syntax file today.
Removing the 35 clojureNumber passes for each possible custom radix halves the syntax matching time overall.
Please try the following issue branch:
https://github.com/guns/vim-clojure-static/tree/issue-77
You can use the new syntax benchmark script clj/bin/syntime to compare syntax matching performance between HEAD and issue-77:
$ cd vim-clojure-static/clj/bin/
$ ./syntime path-to-my-clojure-file.clj
$ ls
syntime report-2017-02-07-21-21-48.log
Hi, dwelling on:
syntax match clojureSymbol "\v%([a-zA-Z!$&*_+=|<.>?-]|[^\x00-\x7F])+%(:?%([a-zA-Z0-9!#$%&*_+=|'<.>/?-]|[^\x00-\x7F]))*[#:]@<!"
I'm familiar with regexes in general, but not the vim-flavour specifically - and have a couple of questions:
What does the initial \v do?
I assume that %( and ) is a grouping?
I couldn't quite fathom what [^0\x00-\x7F] was for? unicode matching, maybe?

If I remove the [^0\x00-\x7F], the grouping can go; replacing a-zA-Z with \a and 0-9 with \d as well gives:
syntax match clojureSymbol "\v[\a!$&*_+=|<.>?-]+:?[\a\d!#$%&*_+=|'<.>/?-]*[#:]@<!"
(not claiming this is valid or exactly equivalent, but you get the idea - is this viable?)
Slowest match: 59 µs vs 928 µs; average: 5 µs vs 26 µs (new pattern vs previous).
The (snipped) syntime comes in as:
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.010329 2030 1931 0.000054 0.000005 clojureError ]\|}\|)
0.009994 1642 1325 0.000059 0.000006 clojureSymbol \v[\a!$&*_+=|<.>?-]+:?[\a\d!#$
0.003215 451 39 0.000025 0.000007 clojureNumber \v<[-+]?%(0|[1-9]\d*|%(0|[1-9]
0.003125 422 0 0.000101 0.000007 clojureNumber \v<[-+]?%(0|[1-9]\d*)/%(0|[1-9
0.002954 422 0 0.000029 0.000007 clojureNumber \v<[-+]?%([2-9]|[1-2][0-9]|3[0
0.002936 434 22 0.000033 0.000007 clojureNumber \v<[-+]?%(0\o*|0x\x+|[1-9]\d*)
0.001555 600 193 0.000055 0.000003 clojureKeyword \v<:{1,2}%([^ \n\r\t()\[\]{}";
vs previously:
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.048174 1846 1811 0.000928 0.000026 clojureSymbol \v[a-zA-Z!$&*_+=|<.>?-]+:?[a-z
0.008964 1802 1703 0.000044 0.000005 clojureError ]\|}\|)
0.003346 450 38 0.000035 0.000007 clojureNumber \v<[-+]?%(0|[1-9]\d*|%(0|[1-9]
0.003236 422 0 0.000067 0.000008 clojureNumber \v<[-+]?%(0|[1-9]\d*)/%(0|[1-9
0.002945 422 0 0.000030 0.000007 clojureNumber \v<[-+]?%([2-9]|[1-2][0-9]|3[0
0.002905 434 22 0.000051 0.000007 clojureNumber \v<[-+]?%(0\o*|0x\x+|[1-9]\d*)
0.001605 602 195 0.000039 0.000003 clojureKeyword \v<:{1,2}%([^ \n\r\t()\[\]{}";
What does the initial \v do?
Makes the pattern "very-magic", so closer to PCRE. See :help \v
I assume that %( and ) is a grouping?
It is a non-capturing group, like (?:). See :help \%(
I couldn't quite fathom what [^0\x00-\x7F] was for? unicode matching, maybe?
Yes. See 712be9ae0ab4d576d6ce361d3b5982ed7a5915cb
and replace a-zA-Z with \a and 0-9 with \d,
Using the builtin char classes is a good idea. I'll have a look now.
Urgh, I was getting frustrated searching Google for details on \v, %(, etc ... I didn't even consider the built-in docs... Oh well :)
Is there a \u (or equivalent) that can be used instead of [^\x00-\x7F] for unicode matching?
@rm-hull
syntax match clojureSymbol "\v[\a!$&*_+=|<.>?-]+:?[\a\d!#$%&*_+=|'<.>/?-]*[#:]@<!"
(not claiming this is valid or exactly equivalent, but you get the idea - is this viable?)
Is there a \u (or equivalent) that can be used instead of [^\x00-\x7F] for unicode matching?
Removing the multibyte character check and using character classes in place of character ranges produces the following pattern:
syntax match clojureSymbol "\v[[:alpha:]!$&*_+=|<.>?-]+%(:?[[:alnum:]!#$%&*_+=|'<.>/?-])*[#:]@<!"
Note that [\a] doesn't work in Vim; it matches either of two characters, \ or a. Instead, use [[:alpha:]], which is equivalent to [a-zA-Z] (ASCII only).
Using the new syntime script, I get:
$ syntime -p original clj/src/vim_clojure_static/generate.clj
$ cat original-1486672276.830991108.log
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.031737 2003 1906 0.000092 0.000016 clojureSymbol \v%([a-zA-Z!$&*_+=|<.>?-]|[^\x00-\x7F])+%(:?%([a-zA-Z0-9!#$%&*_+=|'<.>/?-]|[^\x00-\x7F]))*[#:]@<!
…
0.118116 55103
$ syntime -p classes clj/src/vim_clojure_static/generate.clj
$ cat classes-1486672545.445051032.log
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.031080 2003 1906 0.000106 0.000016 clojureSymbol \v[[:alpha:]!$&*_+=|<.>?-]+%(:?[[:alnum:]!#$%&*_+=|'<.>/?-])*[#:]@<!
…
0.118896 55103
As you can see, there is no discernible performance difference after removing the multibyte check and replacing character ranges with character classes.
Furthermore, removing the clojureSymbol pattern altogether would only reduce the running time to (0.118116 - 0.031737)/0.118116 =~ 73% of current performance.
In contrast, the changes in the issue-77 branch reduce the running time to 43% of current performance without changing the clojureSymbol pattern. I hope to bring this benchmark down to 33% before merging into master.
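The 73% figure can be checked directly from the two totals in the original log. The sketch below is only that arithmetic in Python; the variable names are illustrative and not part of the syntime script.

```python
# Totals copied from the `original` log above, in seconds.
total_all = 0.118116      # last line of the report: sum over all syntax groups
total_symbol = 0.031737   # TOTAL column of the clojureSymbol row

# Fraction of the run that would remain if clojureSymbol cost nothing at all.
remaining = (total_all - total_symbol) / total_all
print(f"remaining without clojureSymbol: {remaining:.0%}")
```

In other words, even a free clojureSymbol match would leave roughly three quarters of the matching time on this file.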
I've been testing syntime with client.clj (from https://github.com/http-kit/http-kit/blob/master/src/org/httpkit/client.clj) and this clearly taxes the clojureSymbol pattern more than the generate.clj file does - it accounts for ~65% (= 0.051640 / 0.079156) of the total time:
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.051640 1846 1811 0.000820 0.000028 clojureSymbol \v%([a-zA-Z!$&*_+=|<.>?-]|[^\x
0.009280 1802 1703 0.000066 0.000005 clojureError ]\|}\|)
0.003438 450 38 0.000090 0.000008 clojureNumber \v<[-+]?%(0|[1-9]\d*|%(0|[1-9]
0.003393 422 0 0.000083 0.000008 clojureNumber \v<[-+]?%(0|[1-9]\d*)/%(0|[1-9
0.003007 422 0 0.000039 0.000007 clojureNumber \v<[-+]?%([2-9]|[1-2][0-9]|3[0
0.003005 434 22 0.000047 0.000007 clojureNumber \v<[-+]?%(0\o*|0x\x+|[1-9]\d*)
0.001485 602 195 0.000036 0.000002 clojureKeyword \v<:{1,2}%([^ \n\r\t()\[\]{}";
0.000567 2753 2327 0.000007 0.000000 clojureSexp )
0.000266 432 29 0.000017 0.000001 clojureComment ;.*$
0.000220 1205 867 0.000002 0.000000 clojureSexp (
0.000166 844 455 0.000001 0.000000 clojureVector \[
0.000133 571 164 0.000001 0.000000 clojureMap {
0.000131 422 0 0.000008 0.000000 clojureCharacter \\o\%([0-3]\o\{2\}\|\o\{1,2\}\
0.000127 454 32 0.000020 0.000000 clojureUnquote \~
0.000122 442 28 0.000003 0.000000 clojureCharacter \\.
0.000117 422 0 0.000001 0.000000 clojureCharacter \\formfeed
0.000111 422 0 0.000004 0.000000 clojureCharacter \\u\x\{4\}
0.000111 633 534 0.000002 0.000000 clojureVector ]
0.000109 422 0 0.000008 0.000000 clojureCharacter \\tab
0.000108 425 3 0.000010 0.000000 clojureDeref @
0.000107 452 32 0.000001 0.000000 clojureQuote '
0.000106 423 2 0.000004 0.000000 clojureAnonArg %\(20\|1\d\|[1-9]\|&\)\?
0.000105 574 200 0.000001 0.000000 clojureString "
0.000103 422 0 0.000009 0.000000 clojureComment #!.*$
0.000102 438 16 0.000001 0.000000 clojureVarArg &
0.000101 422 0 0.000002 0.000000 clojureUnquote \~@
0.000101 432 11 0.000001 0.000000 clojureMeta \^
0.000101 423 2 0.000001 0.000000 clojureQuote `
0.000099 422 0 0.000001 0.000000 clojureCharacter \\newline
0.000096 422 0 0.000001 0.000000 clojureCharacter \\return
0.000095 422 0 0.000001 0.000000 clojureCharacter \\space
0.000092 422 0 0.000001 0.000000 clojureCharacter \\backspace
0.000085 428 6 0.000001 0.000000 clojureDispatch \v#[\^'=<_]?
0.000083 422 0 0.000001 0.000000 clojureRegexp \#"
0.000079 217 30 0.000004 0.000000 clojureString \\\\\|\\"
0.000057 277 217 0.000002 0.000000 clojureString "
0.000056 300 237 0.000001 0.000000 clojureMap }
0.000052 129 20 0.000008 0.000000 clojureStringEscape \v\\%([\\btnfr"]|u\x{4}|[0-3]\
0.079156 22472
Removing it entirely, the total time drops from 0.079156 to 0.026007:
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.008047 1412 1313 0.000056 0.000006 clojureError ]\|}\|)
0.003337 444 32 0.000043 0.000008 clojureNumber \v<[-+]?%(0|[1-9]\d*|%(0|[1-9]
0.003158 422 0 0.000032 0.000007 clojureNumber \v<[-+]?%(0|[1-9]\d*)/%(0|[1-9
0.003127 422 0 0.000053 0.000007 clojureNumber \v<[-+]?%([2-9]|[1-2][0-9]|3[0
0.003044 434 22 0.000035 0.000007 clojureNumber \v<[-+]?%(0\o*|0x\x+|[1-9]\d*)
0.001529 599 192 0.000065 0.000003 clojureKeyword \v<:{1,2}%([^ \n\r\t()\[\]{}";
0.000508 2247 1871 0.000004 0.000000 clojureSexp )
0.000245 430 27 0.000017 0.000001 clojureComment ;.*$
0.000179 1099 761 0.000001 0.000000 clojureSexp (
0.000175 778 389 0.000001 0.000000 clojureVector \[
0.000138 427 5 0.000020 0.000000 clojureDispatch \v#[\^'=<_]?
0.000132 442 28 0.000002 0.000000 clojureCharacter \\.
0.000120 578 204 0.000001 0.000000 clojureString "
0.000114 422 0 0.000006 0.000000 clojureCharacter \\formfeed
0.000114 422 0 0.000009 0.000000 clojureCharacter \\space
0.000113 422 0 0.000007 0.000000 clojureCharacter \\o\%([0-3]\o\{2\}\|\o\{1,2\}\
0.000113 538 131 0.000001 0.000000 clojureMap {
0.000111 422 0 0.000009 0.000000 clojureCharacter \\return
0.000111 433 11 0.000001 0.000000 clojureVarArg &
0.000111 422 0 0.000004 0.000000 clojureCharacter \\u\x\{4\}
0.000107 422 0 0.000001 0.000000 clojureUnquote \~@
0.000105 422 0 0.000001 0.000000 clojureComment #!.*$
0.000102 423 2 0.000001 0.000000 clojureQuote `
0.000102 422 0 0.000001 0.000000 clojureCharacter \\backspace
0.000102 423 2 0.000003 0.000000 clojureAnonArg %\(20\|1\d\|[1-9]\|&\)\?
0.000097 451 29 0.000001 0.000000 clojureUnquote \~
0.000096 422 0 0.000001 0.000000 clojureCharacter \\newline
0.000091 422 0 0.000004 0.000000 clojureCharacter \\tab
0.000091 449 29 0.000002 0.000000 clojureQuote '
0.000088 422 0 0.000001 0.000000 clojureRegexp \#"
0.000088 430 9 0.000001 0.000000 clojureMeta \^
0.000084 217 30 0.000007 0.000000 clojureString \\\\\|\\"
0.000079 407 343 0.000006 0.000000 clojureVector ]
0.000075 424 2 0.000001 0.000000 clojureDeref @
0.000066 266 219 0.000001 0.000000 clojureMap }
0.000064 277 217 0.000001 0.000000 clojureString "
0.000044 129 20 0.000008 0.000000 clojureStringEscape \v\\%([\\btnfr"]|u\x{4}|[0-3]\
0.026007 19243
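Per-group shares such as the ~65% quoted for clojureSymbol can be recomputed from a pasted report rather than by hand. The following is a minimal Python sketch, assuming the whitespace-separated layout seen in the reports in this thread (a header row, one row per pattern, and a final line with the total time and count). The function name `time_by_group` and the sample, which contains only the first four rows of the client.clj report, are illustrative.

```python
from collections import defaultdict

# First rows and summary line of the client.clj report above (patterns truncated as pasted).
SAMPLE = r"""TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.051640 1846 1811 0.000820 0.000028 clojureSymbol \v%([a-zA-Z!$&*_+=|<.>?-]|[^\x
0.009280 1802 1703 0.000066 0.000005 clojureError ]\|}\|)
0.003438 450 38 0.000090 0.000008 clojureNumber \v<[-+]?%(0|[1-9]\d*|%(0|[1-9]
0.003393 422 0 0.000083 0.000008 clojureNumber \v<[-+]?%(0|[1-9]\d*)/%(0|[1-9
0.079156 22472
"""


def time_by_group(report):
    """Sum the TOTAL column per syntax group; also return the report's grand total."""
    per_group = defaultdict(float)
    grand_total = 0.0
    for line in report.splitlines():
        # Split into at most 7 fields so a pattern containing spaces stays in one piece.
        fields = line.split(None, 6)
        if not fields or fields[0] == "TOTAL":
            continue  # blank line or header row
        if len(fields) == 2:
            grand_total = float(fields[0])  # summary line: total seconds, total count
        elif len(fields) >= 6:
            per_group[fields[5]] += float(fields[0])
    return dict(per_group), grand_total


groups, total = time_by_group(SAMPLE)
for name, secs in sorted(groups.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {secs:.6f}s {secs / total:6.1%}")
```

Summing by group name matters here because clojureNumber appears as several separate patterns, so its real cost is spread over multiple rows.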
Where I put the caret indicator:
syntax match clojureSymbol "\v%([a-zA-Z!$&*_+=|<.>?-]|[^\x00-\x7F])+%(:?%([a-zA-Z0-9!#$%&*_+=|'<.>/?-]|[^\x00-\x7F]))*[#:]@<!"
                                                                   ^
Does this need to be + (one or more)? Can the quantifier be removed, so it matches exactly once?
Similarly, can the :? be blended into [a-zA-Z0-9!#$%&*_+=|'<.>/?-]?
Is it just that it is doing unnecessary backtracking?
Actually, it seems like @<! may be the culprit ... from the vim help:
\@<!    Matches with zero width if the preceding atom does NOT match just
        before what follows.  Thus this matches if there is no position in the
        current or previous line where the atom matches such that it ends just
        before what follows.  |/zero-width| {not in Vi}
        Like "(?<!pattern)" in Perl, but Vim allows non-fixed-width patterns.
        The match with the preceding atom is made to end just before the match
        with what follows, thus an atom that ends in ".*" will work.
        Warning: This can be slow (because many positions need to be checked
        for a match).  Use a limit if you can, see below.
        Example                 matches ~
        \(foo\)\@<!bar          any "bar" that's not in "foobar"
        \(\/\/.*\)\@<!in        "in" which is not after "//"
Without knowing the full impact of removing @<! from the end of the regex, the total time is less than half of the previous total:
TOTAL COUNT MATCH SLOWEST AVERAGE NAME PATTERN
0.008035 423 4 0.000119 0.000019 clojureSymbol \v%([a-zA-Z!$&*_+=|<.>?-]|[^\x
0.007553 1412 1313 0.000082 0.000005 clojureError ]\|}\|)
0.003158 444 32 0.000070 0.000007 clojureNumber \v<[-+]?%(0|[1-9]\d*|%(0|[1-9]
0.003086 422 0 0.000071 0.000007 clojureNumber \v<[-+]?%(0|[1-9]\d*)/%(0|[1-9
0.003008 422 0 0.000117 0.000007 clojureNumber \v<[-+]?%([2-9]|[1-2][0-9]|3[0
0.002825 434 22 0.000028 0.000007 clojureNumber \v<[-+]?%(0\o*|0x\x+|[1-9]\d*)
0.001464 599 192 0.000035 0.000002 clojureKeyword \v<:{1,2}%([^ \n\r\t()\[\]{}";
0.000481 2247 1871 0.000013 0.000000 clojureSexp )
0.000235 430 27 0.000018 0.000001 clojureComment ;.*$
0.000162 1099 761 0.000001 0.000000 clojureSexp (
0.000138 538 131 0.000001 0.000000 clojureMap {
0.000138 778 389 0.000001 0.000000 clojureVector \[
0.000116 578 204 0.000001 0.000000 clojureString "
0.000114 422 0 0.000003 0.000000 clojureCharacter \\o\%([0-3]\o\{2\}\|\o\{1,2\}\
0.000110 442 28 0.000004 0.000000 clojureCharacter \\.
0.000109 422 0 0.000002 0.000000 clojureCharacter \\tab
0.000104 423 2 0.000004 0.000000 clojureAnonArg %\(20\|1\d\|[1-9]\|&\)\?
0.000104 422 0 0.000001 0.000000 clojureComment #!.*$
0.000102 433 11 0.000001 0.000000 clojureVarArg &
0.000099 422 0 0.000001 0.000000 clojureRegexp \#"
0.000099 451 29 0.000001 0.000000 clojureUnquote \~
0.000095 422 0 0.000001 0.000000 clojureUnquote \~@
0.000095 422 0 0.000001 0.000000 clojureCharacter \\backspace
0.000094 427 5 0.000004 0.000000 clojureDispatch \v#[\^'=<_]?
0.000091 430 9 0.000004 0.000000 clojureMeta \^
0.000090 422 0 0.000003 0.000000 clojureCharacter \\return
0.000090 422 0 0.000001 0.000000 clojureCharacter \\newline
0.000089 422 0 0.000001 0.000000 clojureCharacter \\formfeed
0.000088 423 2 0.000001 0.000000 clojureQuote `
0.000086 449 29 0.000001 0.000000 clojureQuote '
0.000085 422 0 0.000001 0.000000 clojureCharacter \\space
0.000083 217 30 0.000004 0.000000 clojureString \\\\\|\\"
0.000082 424 2 0.000001 0.000000 clojureDeref @
0.000076 407 343 0.000005 0.000000 clojureVector ]
0.000075 422 0 0.000002 0.000000 clojureCharacter \\u\x\{4\}
0.000071 266 219 0.000016 0.000000 clojureMap }
0.000064 129 20 0.000009 0.000000 clojureStringEscape \v\\%([\\btnfr"]|u\x{4}|[0-3]\
0.000045 277 217 0.000002 0.000000 clojureString "
0.032639 19666
@guns Oh, I should say all the ponderings/investigations of clojureSymbol are with your issue-77 branch changes applied.
@rm-hull
Actually, it seems like @<! may be the culprit ... from the vim help:
Nice catch! Looks like @1<! will give us the performance boost we want while keeping the intent of the pattern unchanged:
\@123<=
Like "\@<=" but only look back 123 bytes. This avoids trying lots
of matches that are known to fail and make executing the pattern very
slow. Example, check if there is a "<" just before "span":
/<\@1<=span
This will try matching "<" only one byte before "span", which is the
only place that works anyway.
Branch issue-77 now does syntax matching three times faster on average than master. There are still a few small optimizations that can be made, and I am in the process of adding test cases for symbols.
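To make the effect of the trailing [#:]@<! concrete, here is a rough Python analogue. This is a sketch under stated assumptions: Python's re module is a different engine from Vim's, the pattern is a hand translation that drops the multibyte branch, and the names HEAD, TAIL and SYMBOL are made up for illustration. It shows only what the lookbehind accepts and rejects, not how fast Vim evaluates it.

```python
import re

# ASCII-only hand translation of the two character classes (assumption: the
# [^\x00-\x7F] multibyte alternative is omitted to keep the example short).
HEAD = r"[A-Za-z!$&*_+=|<.>?-]"
TAIL = r"[A-Za-z0-9!#$%&*_+=|'<.>/?-]"

# (?<![#:]) plays the role of Vim's [#:]@<! in very-magic mode: the match may
# not end right after a '#' or ':'.
SYMBOL = re.compile(rf"{HEAD}+(?::?{TAIL})*(?<![#:])")

for text in ["foo#", "ns/name", "a:b", "x#y", "a:"]:
    print(repr(text), "->", repr(SYMBOL.match(text).group()))
```

Because the atom in front of the lookbehind is a single ASCII character, checking only one byte back (as @1<! does) should see the same character that the unlimited form eventually finds, which is why the limit can be added without changing what the pattern matches.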
@guns - any thoughts as to when that branch will get merged into master?
I too have noticed perf issues with Clojure syntax on a 2-month-old top-of-the-line MacBook Pro using regular vim.
Amazing work, looking forward to this being merged!
For those using neovim - does perf improve over regular vim? I'm thinking of switching.
Hi @guns, is https://github.com/guns/vim-clojure-static/tree/issue-77 ready to merge?
I hate to pile on, but I'd love to see some progress in this arena. Manually merging in some of the changes from issue-77 is great, but it's always nice to be official.
It looks like this plugin has a performance issue. I see delays of up to 3 seconds when typing. I disabled all other plugins and found that the clojureSymbol regex takes the most time when working with a Clojure file.
vim version output
My question is: can this regex be rewritten, or is this already the best solution?