Closed klmr closed 1 year ago
Thanks! I will wait for comments from other people before carefully looking at this.
I decreased the number of times that installed.packages()
is called in before_nrs.R
. The function is still called once because in my system (Linux) calling find.package()
multiple times is slower than calling install.packages()
once. The time of running before_nrs.R
decreased only from about 0.20 to about 0.18 of a second. Now, you could:
before_nrs.R
or remove it from your pull request.nvimcom
version number at R/nvimcom/DESCRIPTION
.0.2s vs 0.18s is a tiny difference: I don’t think it makes sense to optimise for a few millisecond.
The difference I am observing on my system is an order of magnitude larger: even calling install.packages()
just once can take more than 5 seconds on a cold start (when there’s on OS level filesystem caching) and always takes >4s. Conversely, calling find.package()
repeatedly for every package in getOptions("defaultPackages")
take only a few milliseconds.
So removing install.packages()
entirely would still be beneficial: in the worst case (= fast local storage, few packages installed) it makes the launch imperceptibly slower. But in the best case (= slow storage, many packages installed) it makes it several seconds faster.
Here’s a fairer benchmark which calls both find.package()
and packageDescription()
on all default packages:
getOption('defaultPackages')
# [1] "datasets" "utils" "grDevices" "graphics" "stats" "methods"
bench::mark(
`installed.packages()` = callr::r(\() installed.packages()[c('abind', 'zoo', 'nvimcom'), ]),
`packageDescription()` = callr::r(\() lapply(getOption("defaultPackages"), \(p) if (length(find.package(p, quiet = TRUE)) == 1L) packageDescription(p))),
iterations = 5L,
check = FALSE,
memory = FALSE
)
expression median `itr/sec` total_time 1 installed.packages() 1.48s 0.440 9.09s 2 packageDescription() 517.8ms 1.92 2.61s
And the median installed.packages()
time is very misleading (see the total_time
): when I use Rscript -e
from the command line instead of callr::r()
from inside R, I reproducibly get times > 4s the first time, and lower times afterwards (and I think this is fairly representative for how FS-level caching works):
cmd = file.path(R.home('bin'), 'Rscript')
args = c('-e', shQuote('x <- installed.packages()'))
for (i in 1 : 5) print(system.time(system2(cmd, args))[['elapsed']])
[1] 4.89 [1] 1.834 [1] 1.671 [1] 2.614 [1] 1.793
Indeed, I rebooted my laptop and the time to run before_nrs.R
was 5 seconds.
Now I see why your pull request was a draft. It impacted other parts of Nvim-R code. I implemented your idea of not calling installed.packages()
, making all necessary adjustments in other parts of Nvim-R. Thanks for your contribution, and feel free to optimize the code further.
Thanks! This means I can close this PR, right?
You can close the PR, unless you still want to work on the issue. I will fix the missing quiet=TRUE
and verbose=FALSE
that you noted.
Currently, Nvim-R and ‘nvimcom’ repeatedly query
installed.packages()
to find information about installed packages. Unfortunately,installed.packages()
can be very slow — up to several seconds! — for large package libraries and/or slow secondary storage (e.g. via network). Having multiple calls in separate processes exacerbates this.This PR replaces all occurrences of
installed.packages()
…find.package()
to check for package existence, andpackageDescription()
to query package information.This is measurably faster even when multiple package descriptions need to be read.
Simple benchmark
```r bench::mark( `installed.packages()` = callr::r(\() installed.packages()[c('abind', 'zoo', 'nvimcom'), ]), `packageDescription()` = callr::r(\() lapply(c('abind', 'zoo', 'nvimcom'), packageDescription)), iterations = 5L, check = FALSE, memory = FALSE ) ``` ``` expression median `itr/sec` total_time 1 installed.packages() 1.56s 0.445 8.98s 2 packageDescription() 447.54ms 2.14 2.33s ``` (Note that the actual numbers can vary quite drastically, based e.g. on the caching of the network storage. On some runs, a single invocation of `install.packages()` takes up to 5 seconds on my particular system!)Since
installed.packages()
is called repeatedly in separate subprocesses, the overall effect of this PR should be a noticeable reduction of the startup time on some systems.… unfortunately I was unable to actually test this PR for now, and I am asking for assistance with that: how do I create an installable, self-contained bundle that I can deploy in my local NeoVim such that it will install the bundled ‘nvimcom’ code, rather than taking the one from the current
master
branch release? Is it enough to use something like{ url = "file:///local/path/to/Nvim-R", branch = "fix/faster-installed-check" }
with my NVim package manager?I did try that, but I couldn’t get it running, and I believe this is at least partially caused by #769 (I’m running on the same cluster, with a very similar config, as @idavydov). After waiting for some time I can start the server but now R reports “package ‘nvimcom’ in options("defaultPackages") was not found”, even though
find.package()
finds it subsequently. Furthermore, I can manually remove the ‘nvimcom’ package from my R library, and Nvim-R automatically reinstalls it. However, when incrementing the package version I can see that the installed version of ‘nvimcom’ is not getting updated, it somehow keeps installing an old package version.