wincent / command-t

⌨️ Fast file navigation for Neovim and Vim
BSD 2-Clause "Simplified" License

perf: add experimental support for using mimalloc allocator #404

Open wincent opened 1 year ago

wincent commented 1 year ago

Vendoring from microsoft/mimalloc, specifically the v2.1.7 tag (updated from the v2.0.6 tag used originally).

mimalloc is a simple, performance-focused allocator that is easy to drop in as a replacement for malloc() and friends, as described in its README. To avoid bringing in a dependency on CMake, we just build the static.c version. Sadly, the performance delta (see numbers below) is not a clear win; the numbers are a bit all over the place. This probably isn't surprising, because most of the heavy memory allocation in Command-T is already micro-managed internally (simply, and with little overhead) using big slabs allocated with mmap(). Nevertheless, parking this here as a possible idea.
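To illustrate why a general-purpose allocator swap may have little to gain here, the following is a minimal sketch of the "big slab from mmap()" pattern mentioned above. It is not Command-T's actual code: the names `slab_t`, `slab_init`, `slab_alloc`, and `slab_destroy` are hypothetical, and the 16-byte alignment is an assumption chosen for the example. The point is only that, once a slab is mapped, each allocation is a pointer bump that never goes through malloc().

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older BSD/macOS spelling */
#endif

/* Hypothetical bump allocator over one mmap()-ed region (illustration only). */
typedef struct {
    uint8_t *base;   /* start of the mapped region */
    size_t capacity; /* total bytes reserved by mmap() */
    size_t used;     /* bytes handed out so far */
} slab_t;

/* Reserve `capacity` bytes of anonymous memory. Returns 0 on success, -1 on failure. */
static int slab_init(slab_t *slab, size_t capacity) {
    assert(slab != NULL);
    void *p = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    slab->base = p;
    slab->capacity = capacity;
    slab->used = 0;
    return 0;
}

/* Hand out `size` bytes, rounded up to 16 (an assumed alignment).
 * Returns NULL when the slab is exhausted; no malloc() call is involved. */
static void *slab_alloc(slab_t *slab, size_t size) {
    size_t aligned = (size + 15) & ~(size_t)15;
    if (aligned < size || aligned > slab->capacity - slab->used) {
        return NULL; /* overflow in rounding, or not enough room left */
    }
    void *p = slab->base + slab->used;
    slab->used += aligned;
    return p;
}

/* Release the whole slab at once; individual allocations are never freed. */
static void slab_destroy(slab_t *slab) {
    munmap(slab->base, slab->capacity);
    slab->base = NULL;
    slab->capacity = 0;
    slab->used = 0;
}
```

A caller would `slab_init()` once with a generous capacity, call `slab_alloc()` on the hot path, and `slab_destroy()` when the whole batch of data is no longer needed. Under this pattern, replacing malloc() only affects the comparatively few allocations that do not come from a slab.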

I added a script to pull down the release archive and dump it into a directory, because I don't want to use a submodule for this (people installing a Vim plugin from a Git repo shouldn't have to know/worry about whether it needs or uses submodules). Space on disk for this set of files (some of which are obviously redundant in our context) is:

du -sh lua/wincent/commandt/lib/vendor/github/microsoft
4.8M    lua/wincent/commandt/lib/vendor/github/microsoft

As it is not clear whether this is going to be a great idea or not, it only takes effect if you call make with USE_MIMALLOC set. You can verify that it really is overriding the standard malloc() and related calls by running a command with MIMALLOC_VERBOSE set, which causes it to print some extra info:

env MIMALLOC_VERBOSE=1 TIMES=1 bin/benchmarks/scanner.lua

Impact (unfortunately, a bit inconclusive) on scanner and matcher benchmarks follows. Note that numbers shouldn't be compared across machines because they were produced at different times (for example, the M3 numbers are from a different version of the OS, and the branch was rebased, compared with the other machines).

Mid-2015 MacBook Pro

These numbers are all over the map due to thermal throttling.

           best    avg      sd     +/-      p     (best)    (avg)      (sd)     +/-     p
  buffer 0.04094 0.04178 0.00278 [-0.6%]        (0.04100) (0.04186) (0.00287) [-0.6%]
    file 0.30707 0.31436 0.02486 [-1.0%]   0.05 (0.30735) (0.31473) (0.02499) [-1.0%]  0.05
    find 0.05827 0.06678 0.01162 [+1.5%]   0.05 (0.92013) (0.93752) (0.04453) [-1.0%] 0.025
     git 0.05163 0.06000 0.01115 [+3.3%] 0.0005 (1.00993) (1.02469) (0.04072) [-0.7%] 0.025
      rg 0.06419 0.07229 0.01203 [+3.8%]  0.005 (1.61018) (1.66326) (0.08803) [+0.3%]
watchman 0.01095 0.01121 0.00068 [+0.2%]        (1.16830) (1.17605) (0.01835) [+0.6%] 0.005
   total 0.54387 0.56643 0.04391 [+0.4%]        (5.09873) (5.15811) (0.15328) [-0.1%]

                    best      avg      sd      +/-     p     (best)    (avg)      (sd)      +/-     p
     pathological  0.44648  0.48275 0.19826 [-10.0%]  0.01 (0.44705) (0.48350) (0.19793) [-10.0%]  0.01
        command-t  0.41205  0.44292 0.21658  [+3.8%] 0.005 (0.41255) (0.44364) (0.21681)  [+3.8%] 0.005
chromium (subset)  2.75724  2.99017 0.47925  [-1.3%]       (0.51232) (0.55960) (0.17228)  [-1.5%]
 chromium (whole)  3.18933  3.63241 0.64392  [-0.7%]       (0.41821) (0.49571) (0.14853)  [-0.3%]  0.05
       big (400k)  4.90155  5.51271 1.20748  [-1.0%]       (0.65297) (0.74723) (0.23045)  [-4.5%]  0.05
            total 11.74815 13.06097 2.16866  [-1.2%]       (2.47007) (2.72968) (0.54795)  [-2.8%] 0.025

M1 MacBook Pro

           best    avg      sd     +/-     p     (best)    (avg)      (sd)     +/-     p
  buffer 0.04407 0.05368 0.01123 [-1.4%] 0.025 (0.04433) (0.05413) (0.01150) [-1.6%] 0.025
    file 0.20902 0.21428 0.01060 [+1.0%]  0.01 (0.20902) (0.21511) (0.01219) [+1.1%] 0.005
    find 0.02687 0.03006 0.01015 [+3.9%]  0.05 (0.63141) (0.64156) (0.03483) [+0.7%]  0.05
     git 0.02693 0.02995 0.00980 [+2.2%]       (0.71734) (0.72825) (0.04266) [-0.4%]
      rg 0.02916 0.03318 0.01136 [+2.9%]       (0.90193) (0.91710) (0.07157) [+1.4%] 0.005
watchman 0.01100 0.01156 0.00165 [-0.7%]       (1.18802) (1.21274) (0.13422) [+1.5%] 0.005
   total 0.36119 0.37272 0.03632 [+1.1%]       (3.71713) (3.76889) (0.18577) [+0.9%] 0.005

                    best    avg      sd     +/-     p     (best)    (avg)      (sd)     +/-     p
     pathological 0.28526 0.29636 0.08356 [-4.0%] 0.025 (0.28527) (0.29647) (0.08343) [-4.0%] 0.025
        command-t 0.23759 0.24616 0.07356 [+1.6%]       (0.23760) (0.24618) (0.07354) [+1.6%]
chromium (subset) 1.56761 1.58469 0.03655 [-0.3%]       (0.41376) (0.42040) (0.02032) [-0.4%]
 chromium (whole) 1.87180 1.88726 0.06174 [-0.4%] 0.025 (0.31695) (0.32809) (0.03497) [+0.4%]
       big (400k) 2.90455 2.92204 0.07185 [-0.2%]       (0.48384) (0.50533) (0.07608) [-0.0%]
            total 6.88851 6.93650 0.15002 [-0.4%] 0.025 (1.74550) (1.79647) (0.14517) [-0.5%]

M3 MacBook Pro

           best    avg      sd      +/-      p     (best)    (avg)      (sd)      +/-      p
  buffer 0.01255 0.01400 0.00409  [+2.0%]        (0.01260) (0.01447) (0.00635)  [-3.3%]
    file 0.14749 0.15026 0.00629 [+38.1%] 0.0005 (0.14843) (0.15115) (0.00626) [+37.9%] 0.0005
    find 0.20783 0.27306 0.12796 [+15.8%] 0.0005 (1.13360) (1.38588) (0.55490) [+15.3%] 0.0005
     git 0.21748 0.25155 0.10398 [+13.0%] 0.0005 (1.17693) (1.40937) (0.54965)  [+9.1%] 0.0005
      rg 0.20640 0.26983 0.12977 [+12.2%] 0.0005 (1.55310) (1.78037) (0.55921)  [+6.9%] 0.0005
watchman 0.01813 0.01980 0.00287  [+6.1%] 0.0005 (1.19740) (1.21007) (0.02198)  [-0.2%]
   total 0.81542 0.97850 0.33560 [+17.1%] 0.0005 (5.23262) (5.95132) (1.66475)  [+8.7%] 0.0005

                    best    avg      sd     +/-      p     (best)    (avg)      (sd)     +/-     p
     pathological 0.21079 0.22604 0.10943 [+4.8%]  0.025 (0.21107) (0.22640) (0.10972) [+4.7%] 0.025
        command-t 0.16694 0.17164 0.04923 [-0.6%]        (0.16716) (0.17228) (0.05253) [-0.5%]
chromium (subset) 1.35310 1.36239 0.02010 [+0.1%]        (0.28797) (0.29255) (0.01108) [+0.3%]
 chromium (whole) 1.11148 1.11599 0.01258 [+0.3%]   0.01 (0.12167) (0.12478) (0.00828) [-0.2%]
       big (400k) 1.67454 1.68249 0.05630 [+0.6%] 0.0005 (0.18195) (0.18487) (0.00876) [+0.0%]
            total 4.52863 4.55855 0.15573 [+0.5%]   0.01 (0.97644) (1.00087) (0.12712) [+1.0%]

Ryzen 5950X Arch Linux

           best    avg      sd     +/-   p   (best)    (avg)      (sd)      +/-     p
  buffer 0.02465 0.02544 0.01098 [-0.4%]   (0.02467) (0.02546) (0.01099)  [-0.5%]
    file 0.09906 0.09948 0.00124 [-0.1%]   (0.09943) (0.09995) (0.00130)  [-0.2%]
    find 0.01852 0.01885 0.00084 [+0.5%]   (0.25137) (0.25430) (0.00762)  [+0.1%]
     git 0.01718 0.01811 0.00210 [+0.6%]   (0.22095) (0.22468) (0.01156)  [-0.6%]
      rg 0.01748 0.01792 0.00105 [+0.5%]   (0.60575) (0.61077) (0.01562)  [-0.1%]
watchman 0.00178 0.00186 0.00033 [-5.6%]   (0.02282) (0.02717) (0.02826) [-11.5%]
   total 0.17975 0.18165 0.01018 [-0.0%]   (1.23025) (1.24233) (0.04061)  [-0.4%] 0.05

                    best    avg      sd     +/-      p     (best)    (avg)      (sd)      +/-      p
     pathological 0.26186 0.27703 0.10940 [-4.4%] 0.0005 (0.26196) (0.27715) (0.10946)  [-4.4%] 0.0005
        command-t 0.19271 0.20058 0.05044 [-3.0%] 0.0005 (0.19279) (0.20065) (0.05047)  [-3.0%] 0.0005
chromium (subset) 1.83627 1.89158 0.25631 [-3.8%]   0.01 (0.45977) (0.49985) (0.21028) [-15.7%]  0.005
 chromium (whole) 1.36877 1.38916 0.06031 [+2.6%] 0.0005 (0.12129) (0.12530) (0.01659)  [-0.4%]
       big (400k) 2.39053 2.43636 0.11813 [+1.8%] 0.0005 (0.19600) (0.20396) (0.02644)  [-0.1%]
            total 6.09256 6.19472 0.33431 [-0.2%]        (1.24139) (1.30690) (0.25114)  [-7.5%]  0.005

wincent commented 3 days ago

Quick test of Hoard, for comparison:

brew tap emeryberger/hoard
brew install --HEAD emeryberger/hoard/libhoard
make clean
make
hoard bin/benchmarks/matcher.lua

Results (relative to wincent/mimalloc branch) on M3:

Summary of cpu time and (wall time):

                    best    avg      sd     +/-      p     (best)    (avg)      (sd)     +/-     p
     pathological 0.20645 0.21815 0.07995 [-3.6%]  0.025 (0.20715) (0.21876) (0.08035) [-3.5%] 0.025
        command-t 0.16663 0.17294 0.05643 [+0.7%]        (0.16724) (0.17352) (0.05677) [+0.7%]
chromium (subset) 1.34275 1.35172 0.02076 [-0.8%] 0.0005 (0.28418) (0.28908) (0.01675) [-1.2%] 0.005
 chromium (whole) 1.10651 1.11530 0.02674 [-0.1%]        (0.12181) (0.12475) (0.01076) [-0.0%]
       big (400k) 1.66873 1.68029 0.03942 [-0.1%]        (0.18046) (0.18403) (0.01414) [-0.5%]
            total 4.49797 4.53841 0.14236 [-0.4%]   0.05 (0.96567) (0.99015) (0.11602) [-1.1%]  0.05