noctuid / things.el

Extensions to thingatpt.el
51 stars 0 forks source link

Improve performance of pair things #1

Open noctuid opened 5 years ago

noctuid commented 5 years ago

The initial/experimental implementation of a pair definer that works with regexps and ignores unmatched pairs in strings and aggregated comments (i.e. subsequent line comments that start at the same indentation/column and are separated by only whitespace are considered a single comment) is working but slow.

Getting the bounds of things-aggregated-comment is x10 slower than things-string and is responsible for the majority of the time spent getting the bounds of a pair (specifically because checking for block comments involves repeatedly calling syntax-ppss at the moment).

Here are some potential optimizations:

Edit: The aggregated comment thing is only useful when already inside a comment since comments can be ignored entirely if not starting in one. A non-aggregated comment thing will likely be a lot more efficient.

Some more benchmarking of the aggregated comment thing may be necessary. syntax-ppss is only an issue because it ends up being called a zillion times (because getting the aggregated comment bounds calls it for every subsequent comment, getting the comment bounds happens for every matched pair when getting the pair bounds, seeking involves getting the pair bounds, getting the remote bounds involves a lot of seeking, etc.). Still, (things-base-bounds 'things-string) should be called a similar number of times, and it doesn't seem to be impacting performance.

The speed of finding all visible pair positions won't be an issue after the seeking implementation is updated. I'm not sure if getting the pair bounds will still be an issue in deeply nested pairs or at the top-level when not at a pair (and pair is multi-char).

If performance issues persist, it may be necessary to just do what evil does and use up-list for single-char pairs. The downside is the dumber behavior (e.g. not working with a pair that spans lines in an elisp comment), regexp/multi-chars pairs being dumb or slow (either will be inconsistent with single-char pairs), and having to maintain multiple implementations.

noctuid commented 5 years ago

Tested (things-base-bounds 'things-paren) in a buffer with just (defun lispy--insert ... with the point on the first let ((l|et...)).

Originally took on average ~0.4-0.5 seconds for one run. Checking things--in-comment-p before trying to get the comment bounds cut that in half to ~0.22 (no comments in the buffer). Removing the things--in-comment-p check in things--bounds-string reduces this to ~0.1-0.2.

re-search-forward and re-search-forward are negligible. The issue still seems to be checking whether every matched opening/closing is in a string/comment. I need to look into whether there's a quick way to skip over strings and comments. In some cases it may be quicker to find all strings and comments beforehand, but that would probably be a lot slower for most cases. It's probably worth at least seeing how well relying on faces works (work done once instead of checking every point).

Running once:

things-base-bounds                         1122        0.1293517320  0.0001152867
things--bounds-of-pair-at-point            1           0.088865616   0.088865616
things--up-pair                            1           0.088372506   0.088372506
things--next-regexp                        1120        0.0832645980  7.434...e-05
syntax-ppss                                2233        0.0597856630  2.677...e-05
things--next-close                         559         0.0423453659  7.575...e-05
things--next-open                          559         0.0417573269  7.470...e-05
things--bounds-string                      1121        0.0376442970  3.358...e-05
things--unmatched-closing-p                560         0.0366798949  6.549...e-05
things--unmatched-opening-p                561         0.0362075890  6.454...e-05
things--in-comment-p                       1067        0.0294264540  2.757...e-05
things--looking-back                       2227        0.0072889869  3.273...e-06
things--at-escaped-character-p             1129        0.0049802749  4.411...e-06
re-search-backward                         2249        0.0028434200  1.264...e-06
re-search-forward                          1332        0.001699921   1.276...e-06
things--match-beginning                    1119        0.0013889889  1.241...e-06
things--base-thing                         1122        0.0013287399  1.184...e-06
things--match-end                          561         0.0006815879  1.214...e-06
things--adjusted-thing-p                   1122        0.0004030890  3.592...e-07
things--backward-up-pair                   1           0.000297723   0.000297723
things--previous-open                      1           0.000157599   0.000157599
things--previous-close                     1           8.2302e-05    8.2302e-05

Running 20 times:

things-base-bounds                         22440       3.2397873420  0.0001443755
things--bounds-of-pair-at-point            20          2.382656888   0.1191328444
things--up-pair                            20          2.3786571609  0.1189328580
things--next-regexp                        22400       2.0719318580  9.249...e-05
syntax-ppss                                44477       1.1137783399  2.504...e-05
things--next-open                          11180       1.0941373260  9.786...e-05
things--next-close                         11180       0.9965590750  8.913...e-05
things--unmatched-opening-p                11220       0.9900242390  8.823...e-05
things--bounds-string                      22420       0.6991029730  3.118...e-05
things--unmatched-closing-p                11200       0.6801817449  6.073...e-05
things--in-comment-p                       21340       0.5476249699  2.566...e-05
things--looking-back                       44540       0.2403565590  5.396...e-06
things--at-escaped-character-p             22580       0.1978629829  8.762...e-06
things--match-beginning                    22380       0.1299376189  5.805...e-06
things--base-thing                         22440       0.1299000500  5.788...e-06
re-search-backward                         45016       0.0530604130  1.178...e-06
re-search-forward                          27608       0.0326566570  1.182...e-06
things--match-end                          11220       0.0121156429  1.079...e-06
things--adjusted-thing-p                   22440       0.0074794799  3.333...e-07
things--backward-up-pair                   20          0.0015858800  7.929...e-05
things--previous-open                      20          0.000663173   3.315865e-05
things--previous-close                     20          0.0006272559  3.136...e-05
noctuid commented 5 years ago

Since supporting only a single character for pairs is much more performant, easy to implement, and covers the majority of cases, I've gone with it for the initial things-define-pair. I will look more into improving the regexp implementation in the future.

noctuid commented 5 years ago

In the future, smart behavior will be done in a generic way using constraints. Performance concerns still apply and most of the potential optimizations here are still relevant.