opensafely-core / r-docker

Docker image for running R scripts in OpenSAFELY

Suggestion: Install binary R packages from the public RSPM repo #75

Open remlapmot opened 2 years ago

remlapmot commented 2 years ago

In case helpful - apologies if you've already thought about or are already doing this.

The following is what a lot of the GitHub Actions workflows for R packages use to install binary (instead of source) packages on Linux. You can see what I describe below being set in r-lib/actions.

Binary versions of R packages for Linux distros are available from the public RStudio Package Manager (RSPM) repo. Use https://packagemanager.rstudio.com/all/__linux__/bionic/latest (or the equivalent URL with the Ubuntu codename and/or date you prefer) as your CRAN repo URL.

The general details are here.

The trick r-lib/actions uses, and that anyone can use, is that you don't have to use the RStudio Package Manager to use this public repo. So the easiest way to use this is to call install.packages()/renv::install() with the repos argument, e.g.

install.packages("ggplot2", repos = "https://packagemanager.rstudio.com/all/__linux__/bionic/latest")
# or: renv::install() with same arguments

(There are alternative ways to change options()$repos["CRAN"], e.g. using an Rprofile.site file.)
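
As a minimal sketch of the URL pattern described above, the helper below builds an RSPM repo URL from an Ubuntu codename and a snapshot identifier, then sets it as the "CRAN" repo for the current session. The function name rspm_repo_url and its arguments are hypothetical and introduced only for illustration; the URL layout is copied from the example above.

```r
# Hypothetical helper (not part of RSPM or r-lib/actions): build the public
# RSPM binary repo URL for a given Ubuntu codename and snapshot.
rspm_repo_url <- function(codename = "bionic", snapshot = "latest") {
  sprintf(
    "https://packagemanager.rstudio.com/all/__linux__/%s/%s",
    codename, snapshot
  )
}

# Set it as the "CRAN" repo for the current session, keeping any other repos.
repos <- getOption("repos")
repos["CRAN"] <- rspm_repo_url("bionic")
options(repos = repos)

# A later install.packages("ggplot2") call would then use the RSPM URL
# without needing the repos argument.
```

Keeping the codename and snapshot as arguments makes it easy to switch distro or pin a date in one place.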

System prerequisites

RSPM binary packages occasionally need additional Linux system libraries (I guess for when the package actually runs and perhaps for when it loads). RStudio provide a shell script for installing these on the RSPM webpage for each package, e.g. for magick:

[Screenshot: RStudio Package Manager]

Result

Then the Dockerfile could read in the list of packages in packages.txt and the build would run substantially faster. It should then be practical to rebuild the image for every new package added.

And this should make it easier to update the version of R and the versions of the packages included, because the packages would update every time the image was built unless you use a URL with a fixed date.
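
A rough sketch of the R side of that build step, assuming packages.txt holds one package name per line (the file format, and the function names read_package_list and install_from_list, are assumptions for illustration rather than what the current image does):

```r
# Read package names from a packages.txt-style file: one name per line,
# ignoring blank lines and "#" comments (this file format is an assumption).
read_package_list <- function(path) {
  lines <- trimws(sub("#.*$", "", readLines(path, warn = FALSE)))
  unique(lines[nzchar(lines)])
}

# Install every listed package from the given repo. A dated snapshot URL
# fixes the versions; a ".../latest" URL picks up new versions on each build.
install_from_list <- function(path, repos) {
  pkgs <- read_package_list(path)
  utils::install.packages(pkgs, repos = repos)
  invisible(pkgs)
}

# Example call, e.g. from a Dockerfile RUN step via Rscript (not run here):
# install_from_list(
#   "packages.txt",
#   repos = "https://packagemanager.rstudio.com/all/__linux__/bionic/latest"
# )
```

Any system prerequisites for the listed packages would still need to be installed separately before this step.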

bloodearnest commented 2 years ago

This is interesting.

The current R image is in a very non-ideal, partially broken state.

We really want to figure out how to build a fresh image using specific pinned versions of packages, whether from source or binary. We've tried conda, but that only works with R 3.6, not 4.0. We've tried and failed with renv. Using CRAN repos on Ubuntu, as rocker does, leads to different versions of packages being installed.

This is increasingly becoming an issue, so I hope we will prioritise it soon. Any further ideas very welcome.

remlapmot commented 2 years ago

I would say the easiest way to achieve this is to use the RSPM public repo URL with a specified date in it.

Obtain the URL from the RSPM website by choosing the date you want, selecting Binary, and changing to the distro you need.

Then use the URL in install.packages()

install.packages("ggplot2", repos = "https://packagemanager.rstudio.com/all/__linux__/bionic/2022-03-25+Y3JhbiwyOjQ1MjYyMTU7MUE2QTIzMzc")

Or set it as the "CRAN" repo in your Rprofile.site file, which should be possible because Dockerfile commands run as root, I think. The contents of Rprofile.site would look like the following (edited from the options() help file in R, see the repos entry):

local({
  r <- getOption("repos")
  r["CRAN"] <- "https://packagemanager.rstudio.com/all/__linux__/bionic/2022-03-25+Y3JhbiwyOjQ1MjYyMTU7MUE2QTIzMzc"
  options(repos = r)
})

(If setting in the Rprofile.site file you can call install.packages("ggplot2") without the repos argument.)

remlapmot commented 2 years ago

It might be worth adding that using a snapshot of CRAN on a particular date is the approach Microsoft take (they started doing it long ago).

See their CRAN timemachine and accompanying checkpoint package

It is possible to use one of their snapshots as your CRAN repo, again via the repos argument (i.e., no need to use their checkpoint package). Their URLs are of the form below:

install.packages("ggplot2", repos = "https://mran.microsoft.com/snapshot/2022-03-25")

I would say that this is less useful to you because it does not distribute precompiled binary packages for Linux (as CRAN does not).

bloodearnest commented 2 years ago

This might work from a stability point of view, but it's a bit tricky wrt versioning specific packages. The trade-off might be worth it, however.
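
One possible way to keep a dated snapshot while still catching drift in specific packages is to compare the installed versions against an explicit list after the build. The sketch below is only an assumption about how that could look, not something the project currently does; check_pins and the shape of the pins vector are hypothetical. It relies only on utils::packageVersion() and package_version() from base R.

```r
# Hypothetical check: compare installed package versions against a named
# character vector of expected versions, e.g. c(ggplot2 = "3.3.5").
# utils::packageVersion() raises an error if a package is not installed.
check_pins <- function(pins) {
  pkgs <- names(pins)
  installed <- vapply(
    pkgs,
    function(pkg) as.character(utils::packageVersion(pkg)),
    character(1)
  )
  # Compare as version objects so "1.0-2" and "1.0.2" are treated as equal.
  ok <- vapply(
    pkgs,
    function(pkg) utils::packageVersion(pkg) == package_version(pins[[pkg]]),
    logical(1)
  )
  data.frame(
    package = pkgs,
    expected = unname(pins),
    installed = unname(installed),
    ok = unname(ok),
    stringsAsFactors = FALSE
  )
}

# Example with a base package, whose version matches the running R version:
# check_pins(c(utils = as.character(getRversion())))
```

A build script could stop with an error when any row has ok set to FALSE, so a changed snapshot never silently changes a package you care about.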

remlapmot commented 2 years ago

It might be worth noting that there's a new way to obtain binary R packages on Ubuntu.

https://eddelbuettel.github.io/r2u/

It seems to use the relevant RSPM repo within apt.

It works for focal and jammy (rather than bionic), and it targets the latest version of R (4.2.0 rather than 4.0.2; I guess it will move with each new R release).