Improve performance of pair things

noctuid commented 5 years ago

The initial/experimental implementation of a pair definer that works with regexps and ignores unmatched pairs in strings and aggregated comments (i.e. subsequent line comments that start at the same indentation/column and are separated by only whitespace are considered a single comment) is working but slow.

Getting the bounds of things-aggregated-comment is x10 slower than things-string and is responsible for the majority of the time spent getting the bounds of a pair (specifically because checking for block comments involves repeatedly calling syntax-ppss at the moment).

Here are some potential optimizations:

[ ] Require things-seek-op to work as-is/consistently move to the next/previous thing (instead of having to call the bounds function to check if still at the same thing after running it); this would eliminate the need to do any expensive calculation when collecting avy overlay positions for remote selection. This probably makes sense even ignoring performance issues.
[ ] Ignore pairs in all strings/comments when getting the bounds if the point does not start in a string or comment. If a pair is not found in the string/comment the point starts in, skip over all strings/comments after that.
[x] Skip of out of strings/comments immediately if there is an unmatched pair.
[x] At beginning of up-pair function, abort if at top-level/nesting level 0 (no pair at point) as determined by syntax-ppss (temporarily modified syntax table like evil does). This optimization would only be possible if the pair has a single character open and close. CANCELED: Even with a single characte open/close, syntax-ppss does not take into account all the same edge cases and would fail to work the same
[ ] The aggregated comment implementation could be potentially be improved by creating a bounds function that only does anything when inside a comment (to prevent the need to call forward-op when not in a comment).
[ ] The aggregated comment implementation could potentially be improved by checking whether bordering comments are have the same type of comment starter instead of checking whether they are block comments (i.e. don't include comments with different comment starter types and potentially characters in the bounds); this would eliminate the need for the repeated calls to syntax-ppss.
[ ] Relying entirely on faces for calculating string/comment bounds would prevent the need to call syntax-ppss. I'd rather not rely on faces at all for reliability, but newcomment.el (and other comment-related packages like nerdcommenter) itself relies on faces, and the current things-aggregated-comment implementation already has to rely on faces for some cases. I'm not actually sure this would be faster either.''
[ ] Don't even bother checking comment bounds if in a string
[ ] Possibly abort in up-pair function if a certain distance past the starting point (assume top-level) instead of searching through the rest of the buffer

Edit: The aggregated comment thing is only useful when already inside a comment since comments can be ignored entirely if not starting in one. A non-aggregated comment thing will likely be a lot more efficient.

[ ] Use non-aggregated comment thing whenever possible

Some more benchmarking of the aggregated comment thing may be necessary. syntax-ppss is only an issue because it ends up being called a zillion times (because getting the aggregated comment bounds calls it for every subsequent comment, getting the comment bounds happens for every matched pair when getting the pair bounds, seeking involves getting the pair bounds, getting the remote bounds involves a lot of seeking, etc.). Still, (things-base-bounds 'things-string) should be called a similar number of times, and it doesn't seem to be impacting performance.

The speed of finding all visible pair positions won't be an issue after the seeking implementation is updated. I'm not sure if getting the pair bounds will still be an issue in deeply nested pairs or at the top-level when not at a pair (and pair is multi-char).

If performance issues persist, it may be necessary to just do what evil does and use up-list for single-char pairs. The downside is the dumber behavior (e.g. not working with a pair that spans lines in an elisp comment), regexp/multi-chars pairs being dumb or slow (either will be inconsistent with single-char pairs), and having to maintain multiple implementations.

noctuid commented 5 years ago

Tested (things-base-bounds 'things-paren) in a buffer with just (defun lispy--insert ... with the point on the first let ((l|et...)).

Originally took on average ~0.4-0.5 seconds for one run. Checking things--in-comment-p before trying to get the comment bounds cut that in half to ~0.22 (no comments in the buffer). Removing the things--in-comment-p check in things--bounds-string reduces this to ~0.1-0.2.

re-search-forward and re-search-forward are negligible. The issue still seems to be checking whether every matched opening/closing is in a string/comment. I need to look into whether there's a quick way to skip over strings and comments. In some cases it may be quicker to find all strings and comments beforehand, but that would probably be a lot slower for most cases. It's probably worth at least seeing how well relying on faces works (work done once instead of checking every point).

Running once:

things-base-bounds                         1122        0.1293517320  0.0001152867
things--bounds-of-pair-at-point            1           0.088865616   0.088865616
things--up-pair                            1           0.088372506   0.088372506
things--next-regexp                        1120        0.0832645980  7.434...e-05
syntax-ppss                                2233        0.0597856630  2.677...e-05
things--next-close                         559         0.0423453659  7.575...e-05
things--next-open                          559         0.0417573269  7.470...e-05
things--bounds-string                      1121        0.0376442970  3.358...e-05
things--unmatched-closing-p                560         0.0366798949  6.549...e-05
things--unmatched-opening-p                561         0.0362075890  6.454...e-05
things--in-comment-p                       1067        0.0294264540  2.757...e-05
things--looking-back                       2227        0.0072889869  3.273...e-06
things--at-escaped-character-p             1129        0.0049802749  4.411...e-06
re-search-backward                         2249        0.0028434200  1.264...e-06
re-search-forward                          1332        0.001699921   1.276...e-06
things--match-beginning                    1119        0.0013889889  1.241...e-06
things--base-thing                         1122        0.0013287399  1.184...e-06
things--match-end                          561         0.0006815879  1.214...e-06
things--adjusted-thing-p                   1122        0.0004030890  3.592...e-07
things--backward-up-pair                   1           0.000297723   0.000297723
things--previous-open                      1           0.000157599   0.000157599
things--previous-close                     1           8.2302e-05    8.2302e-05

Running 20 times:

things-base-bounds                         22440       3.2397873420  0.0001443755
things--bounds-of-pair-at-point            20          2.382656888   0.1191328444
things--up-pair                            20          2.3786571609  0.1189328580
things--next-regexp                        22400       2.0719318580  9.249...e-05
syntax-ppss                                44477       1.1137783399  2.504...e-05
things--next-open                          11180       1.0941373260  9.786...e-05
things--next-close                         11180       0.9965590750  8.913...e-05
things--unmatched-opening-p                11220       0.9900242390  8.823...e-05
things--bounds-string                      22420       0.6991029730  3.118...e-05
things--unmatched-closing-p                11200       0.6801817449  6.073...e-05
things--in-comment-p                       21340       0.5476249699  2.566...e-05
things--looking-back                       44540       0.2403565590  5.396...e-06
things--at-escaped-character-p             22580       0.1978629829  8.762...e-06
things--match-beginning                    22380       0.1299376189  5.805...e-06
things--base-thing                         22440       0.1299000500  5.788...e-06
re-search-backward                         45016       0.0530604130  1.178...e-06
re-search-forward                          27608       0.0326566570  1.182...e-06
things--match-end                          11220       0.0121156429  1.079...e-06
things--adjusted-thing-p                   22440       0.0074794799  3.333...e-07
things--backward-up-pair                   20          0.0015858800  7.929...e-05
things--previous-open                      20          0.000663173   3.315865e-05
things--previous-close                     20          0.0006272559  3.136...e-05

noctuid commented 5 years ago

Since supporting only a single character for pairs is much more performant, easy to implement, and covers the majority of cases, I've gone with it for the initial things-define-pair. I will look more into improving the regexp implementation in the future.

noctuid commented 5 years ago

In the future, smart behavior will be done in a generic way using constraints. Performance concerns still apply and most of the potential optimizations here are still relevant.

noctuid / things.el

Improve performance of pair things #1