jalvesaq / Nvim-R

Vim plugin to work with R
GNU General Public License v2.0
968 stars 126 forks source link

Improve performance of nvimcom launch #772

Closed klmr closed 1 year ago

klmr commented 1 year ago

Currently, Nvim-R and ‘nvimcom’ repeatedly query installed.packages() to find information about installed packages. Unfortunately, installed.packages() can be very slow — up to several seconds! — for large package libraries and/or slow secondary storage (e.g. via network). Having multiple calls in separate processes exacerbates this.

This PR replaces all occurrences of installed.packages()

This is measurably faster even when multiple package descriptions need to be read.

Simple benchmark ```r bench::mark( `installed.packages()` = callr::r(\() installed.packages()[c('abind', 'zoo', 'nvimcom'), ]), `packageDescription()` = callr::r(\() lapply(c('abind', 'zoo', 'nvimcom'), packageDescription)), iterations = 5L, check = FALSE, memory = FALSE ) ``` ``` expression median `itr/sec` total_time 1 installed.packages() 1.56s 0.445 8.98s 2 packageDescription() 447.54ms 2.14 2.33s ``` (Note that the actual numbers can vary quite drastically, based e.g. on the caching of the network storage. On some runs, a single invocation of `install.packages()` takes up to 5 seconds on my particular system!)

Since installed.packages() is called repeatedly in separate subprocesses, the overall effect of this PR should be a noticeable reduction of the startup time on some systems.


… unfortunately I was unable to actually test this PR for now, and I am asking for assistance with that: how do I create an installable, self-contained bundle that I can deploy in my local NeoVim such that it will install the bundled ‘nvimcom’ code, rather than taking the one from the current master branch release? Is it enough to use something like { url = "file:///local/path/to/Nvim-R", branch = "fix/faster-installed-check" } with my NVim package manager?

I did try that, but I couldn’t get it running, and I believe this is at least partially caused by #769 (I’m running on the same cluster, with a very similar config, as @idavydov). After waiting for some time I can start the server but now R reports “package ‘nvimcom’ in options("defaultPackages") was not found”, even though find.package() finds it subsequently. Furthermore, I can manually remove the ‘nvimcom’ package from my R library, and Nvim-R automatically reinstalls it. However, when incrementing the package version I can see that the installed version of ‘nvimcom’ is not getting updated, it somehow keeps installing an old package version.

jalvesaq commented 1 year ago

Thanks! I will wait for comments from other people before carefully looking at this.

jalvesaq commented 1 year ago

I decreased the number of times that installed.packages() is called in before_nrs.R. The function is still called once because in my system (Linux) calling find.package() multiple times is slower than calling install.packages() once. The time of running before_nrs.R decreased only from about 0.20 to about 0.18 of a second. Now, you could:

klmr commented 1 year ago

0.2s vs 0.18s is a tiny difference: I don’t think it makes sense to optimise for a few millisecond.

The difference I am observing on my system is an order of magnitude larger: even calling install.packages() just once can take more than 5 seconds on a cold start (when there’s on OS level filesystem caching) and always takes >4s. Conversely, calling find.package() repeatedly for every package in getOptions("defaultPackages") take only a few milliseconds.

So removing install.packages() entirely would still be beneficial: in the worst case (= fast local storage, few packages installed) it makes the launch imperceptibly slower. But in the best case (= slow storage, many packages installed) it makes it several seconds faster.

klmr commented 1 year ago

Here’s a fairer benchmark which calls both find.package() and packageDescription() on all default packages:

getOption('defaultPackages')
# [1] "datasets"  "utils"     "grDevices" "graphics"  "stats"     "methods"
bench::mark(
  `installed.packages()` = callr::r(\() installed.packages()[c('abind', 'zoo', 'nvimcom'), ]),
  `packageDescription()` = callr::r(\() lapply(getOption("defaultPackages"), \(p) if (length(find.package(p, quiet = TRUE)) == 1L) packageDescription(p))),
  iterations = 5L,
  check = FALSE,
  memory = FALSE
)
  expression             median `itr/sec` total_time
1 installed.packages()    1.48s     0.440      9.09s
2 packageDescription()  517.8ms     1.92       2.61s

And the median installed.packages() time is very misleading (see the total_time): when I use Rscript -e from the command line instead of callr::r() from inside R, I reproducibly get times > 4s the first time, and lower times afterwards (and I think this is fairly representative for how FS-level caching works):

cmd = file.path(R.home('bin'), 'Rscript')
args = c('-e', shQuote('x <- installed.packages()'))

for (i in 1 : 5) print(system.time(system2(cmd, args))[['elapsed']])
[1] 4.89
[1] 1.834
[1] 1.671
[1] 2.614
[1] 1.793
jalvesaq commented 1 year ago

Indeed, I rebooted my laptop and the time to run before_nrs.R was 5 seconds.

jalvesaq commented 1 year ago

Now I see why your pull request was a draft. It impacted other parts of Nvim-R code. I implemented your idea of not calling installed.packages(), making all necessary adjustments in other parts of Nvim-R. Thanks for your contribution, and feel free to optimize the code further.

klmr commented 1 year ago

Thanks! This means I can close this PR, right?

jalvesaq commented 1 year ago

You can close the PR, unless you still want to work on the issue. I will fix the missing quiet=TRUE and verbose=FALSE that you noted.