r-devel / r-project-sprint-2023

Material for the R project sprint
https://contributor.r-project.org/r-project-sprint-2023/

Caching the set of packages installed in a package library #78

Open hturner opened 1 year ago

hturner commented 1 year ago

As described in Uwe's talk in the kick-off session on Day 1: create a database for each library of installed R packages.

This can help to speed up functions that check which packages are installed.

gmbecker commented 1 year ago

We had a very good discussion with Uwe. Uwe and I are going to collaborate on implementing this feature. I will keep the issue updated. @bnaras can you put the notes you took during the meeting in a comment here?

bnaras commented 1 year ago

Problem

installed.packages() takes a long time the first time it is called in a session when a library contains a large number of packages. (Subsequent invocations are fast because the result is cached.)
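To see the effect on a given machine, something like the following can be used. It is only a rough sketch: it assumes the base library (`.Library`) is representative, and it uses the `noCache = TRUE` argument of installed.packages() to force an uncached read. Timings will vary a lot with the file system and the number of packages.

```r
# Compare an uncached read with a cached one for a single library.
# Assumption: .Library (the base library) stands in for the library of interest.
lib <- .Library

# noCache = TRUE bypasses the per-session cache, approximating a "first call".
cold <- system.time(
  pk_cold <- installed.packages(lib.loc = lib, noCache = TRUE)
)[["elapsed"]]

# A normal call populates the per-session cache for this library ...
invisible(installed.packages(lib.loc = lib))

# ... so this call can be served from the cache.
warm <- system.time(
  pk_warm <- installed.packages(lib.loc = lib)
)[["elapsed"]]

cat(sprintf("uncached: %.3fs, cached: %.3fs, packages: %d\n",
            cold, warm, nrow(pk_cold)))
```

On a local disk with few packages the difference may be negligible; the gap is expected to be most visible on a network-mounted library.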

Impact

The issue is acute where a library is shared via network-mounted drives, which is not uncommon in educational labs and similar settings. On Windows installations, even with fewer than 100 packages, the function takes 2 seconds or more on a reasonably powerful machine, as Uwe demonstrated. This is also a problem for RStudio users: on startup, RStudio tries to determine all installed packages, which makes it unusable with a shared network library.

Core Issue

The running time of installed.packages() is dominated by reading the DESCRIPTION file of every installed package.
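To make the cost concrete, here is a minimal sketch of the kind of per-package read described above: one small file access per installed package. The helper `read_library_descriptions()` is hypothetical and is not how installed.packages() is implemented internally; it only illustrates why the work grows with the number of packages and with file system latency.

```r
# Hypothetical helper (not part of R): read selected fields from the
# DESCRIPTION file of every package directory found directly under `lib`.
read_library_descriptions <- function(lib, fields = c("Package", "Version")) {
  pkg_dirs <- list.dirs(lib, recursive = FALSE)
  desc_files <- file.path(pkg_dirs, "DESCRIPTION")
  desc_files <- desc_files[file.exists(desc_files)]

  # One file read per package: this is the part that is slow on network drives.
  rows <- lapply(desc_files, function(f) {
    read.dcf(f, fields = fields)[1L, , drop = FALSE]
  })

  if (length(rows) == 0L) {
    # Empty library: return a zero-row matrix with the expected columns.
    return(matrix(character(), nrow = 0L, ncol = length(fields),
                  dimnames = list(NULL, fields)))
  }
  do.call(rbind, rows)
}
```

For example, `system.time(read_library_descriptions(.libPaths()[1]))` gives a rough idea of how long the raw file reads take for one library.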

Proposal

Maintain an up-to-date database (we use the term loosely for now) of installed packages, so that the information is readily available for installed.packages() to exploit.
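As a very rough illustration of the idea, and not the design discussed in the meeting, a per-library database could be as simple as an RDS file holding the package table together with the modification time of the library directory. Everything here is an assumption made for the sketch: the function names `scan_library()` and `cached_library_table()`, the cache location, and the invalidation rule. The rule mirrors the documented behaviour of the existing per-session cache, which is refreshed only when the top-level library directory has been altered.

```r
# Hypothetical sketch, not an R API: persist the package table of one library.
scan_library <- function(lib, fields = c("Package", "Version")) {
  desc <- file.path(list.dirs(lib, recursive = FALSE), "DESCRIPTION")
  desc <- desc[file.exists(desc)]
  rows <- lapply(desc, function(f) read.dcf(f, fields = fields)[1L, , drop = FALSE])
  if (length(rows) == 0L) {
    return(matrix(character(), 0L, length(fields), dimnames = list(NULL, fields)))
  }
  do.call(rbind, rows)
}

cached_library_table <- function(lib,
                                 cache = file.path(tempdir(), "pkg-db.rds")) {
  # Assumption: the library directory's mtime changes whenever a package
  # is installed or removed, so it can serve as a cheap validity check.
  lib_mtime <- file.mtime(lib)

  if (file.exists(cache)) {
    db <- readRDS(cache)
    if (identical(db$lib, lib) && identical(db$mtime, lib_mtime)) {
      return(db$tbl)  # cache hit: no DESCRIPTION files are read
    }
  }

  # Cache miss or stale cache: rescan the library and store the result.
  tbl <- scan_library(lib)
  saveRDS(list(lib = lib, mtime = lib_mtime, tbl = tbl), cache)
  tbl
}
```

A real implementation would more likely keep the database inside the library itself and update it at install or remove time, so that it survives across sessions and is shared by all users of the library. The mtime check above would also miss changes made inside a package directory without touching the library directory.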

Desiderata are:

Initial Approach