Closed by bloodearnest 1 year ago
If helpful, I have done this in 3 branches using 2 different approaches.
(In these I previously used some MRAN/Microsoft snapshot URLs, but sadly Microsoft is discontinuing that service at the end of this month; news here.)
In each case you can run the `master-branch-name.sh` script, e.g. `master-branch-01.sh`, to build. Depending on the speed of your internet connection, branches 01 and 02 run in under 20 minutes because the packages are prebuilt.
Hi Tom.
We have an approach that seems to be working based on using renv to build specific versions of libraries. I'm currently running a test build of a new image against all OpenSAFELY R code (it will take a while...) to make sure it doesn't break anything.
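As a rough illustration of pinning library versions with renv, here is a minimal sketch. It assumes renv's `package@version` install syntax; the package names and versions are hypothetical placeholders rather than the contents of the OpenSAFELY image, and `pin_specs` is a helper invented for this example.

```r
# Hypothetical pins: these names and versions are placeholders only.
pins <- c(dplyr = "1.0.7", data.table = "1.14.2")

# Build "package@version" strings, the form renv::install() accepts.
pin_specs <- function(versions) {
  stopifnot(is.character(versions), !is.null(names(versions)))
  paste0(names(versions), "@", unname(versions))
}

specs <- pin_specs(pins)
print(specs)

# Not run here, since it needs renv and network access:
# renv::install(specs)   # install the pinned versions
# renv::snapshot()       # record them in renv.lock
```

Keeping the pins in one named vector makes it easy to review exactly which versions a rebuild would request.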
My approach used renv in a similar way to your renv branch, AFAICT, except:
`r-base-core=4.0.5`.
Like your branch, we're also switching from 18.04 to 20.04 as the underlying series, mainly because 18.04 is nearly EOL. This does mean some of the underlying system libraries have changed slightly, but the R libraries are all the same.
I'll work on getting a PR up, and I'd love your feedback on it!
Once we've switched and have a stable base to work from, we can work through some of the other issues you've called out.
We want to move away from using a single `:latest` version of all our runtime images, and towards explicit versions, e.g. `run: r:4.0 ...` or `run: r:4.2`. When we do that, we'll potentially be in a position to switch to using pre-built archives, and use more of rocker's tooling to build images.
Sounds good Simon, I'm happy to look.
I assume from that package name being specified at `4.0.5`, that will bump the version of R from 4.0.2 to 4.0.5. In general I guess that it's good to be at the end of a patch series. Posit/RStudio only provide end of patch series versions of R (in addition to the current version) in their posit.cloud environment. Or was that a typo?
Another reason it's good to do this is that, although the tidyverse/Posit/RStudio policy is for their packages to work with the last 5 minor releases of R (which usually equates to 5 years), more packages on CRAN by other teams are starting to require R version 4.1.0, because I think that's when the native pipe was introduced to R (`|>` as opposed to the magrittr/dplyr pipe `%>%`). I ran `update.packages(ask = FALSE)` in the container and it only failed to update 1 package, Gmisc, due to that package using the native pipe. So it would be good to have a subsequent tagged version using at least R 4.1.0.
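The version dependence of the native pipe can be checked directly. This is a minimal sketch; `supports_native_pipe` and `parses` are hypothetical helper names, and the code is parsed from a string so that the script itself still loads on R 4.0.x.

```r
# TRUE when the given R version is at least 4.1.0, the release
# that introduced the native pipe `|>`.
supports_native_pipe <- function(r_version = getRversion()) {
  numeric_version(as.character(r_version)) >= numeric_version("4.1.0")
}

# TRUE when `code` is syntactically valid in the running R session.
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

print(supports_native_pipe("4.0.5"))   # the version discussed for the image
print(supports_native_pipe("4.1.0"))

# A package whose source contains `|>` cannot even be parsed on R < 4.1.0,
# which is consistent with the failed Gmisc update described above.
print(parses("c(1, 4, 9) |> sqrt()"))
```

The magrittr equivalent, `c(1, 4, 9) %>% sqrt()`, is an ordinary function call, so it parses on any R version once magrittr is attached.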
Ok, PR is here!
> Sounds good Simon, I'm happy to look.
>
> I assume from that package name being specified at `4.0.5`, that will bump the version of R from 4.0.2 to 4.0.5. In general guess that it's good to be at the end of a patch series. Posit/RStudio only provide end of patch series versions of R (in addition to the current version) in their posit.cloud environment. Or was that a typo?
Yes, this is deliberate, to bring us up to date with the latest 4.0 release. This should be a backwards-compatible upgrade, didn't seem to cause any issues in testing, and is easy enough to roll back if we need to.
> Another reason it's good to do this, is that although the tidyverse/Posit/RStudio policy is for their packages to work with the last 5 minor releases of R - which usually equates to 5 years - there are more packages on CRAN by other teams starting to require R version 4.1.0 because I think that's when the native pipe was introduced to R (`|>` as opposed to magrittr/dplyr pipe `%>%`). I ran `update.packages(ask = FALSE)` in the container and it only failed to update 1 package - Gmisc - due to that package using the native pipe. So it would be good to have a subsequent tagged version using at least R 4.1.0.
Yep.
I'd like to publish an `r:4.2` image, with the same set of libraries but at their latest versions. Then OpenSAFELY users can opt in to that by using `r:4.2` in their project.yaml.
But we'll need to do that as a series of steps. We'd probably try to take a different approach, using pre-built CRAN packages rather than building from source.
great thanks indeed Simon
(I have teaching stress on Tuesday, so it might take me until Wednesday to have a look at the PR.)
It would be great to make pre-built binary CRAN packages. To do that you need to make what is called a CRAN-like repository. For my own interest, and also because Iain mentioned this a few months ago, I wrote a blog post about how to do that for Linux binary packages:
https://remlapmot.github.io/post/2022/make-linux-binary-cran-like-repo/
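To give a feel for what a CRAN-like repository is, here is a minimal sketch of the directory layout and index generation using `tools::write_PACKAGES()`. It is not taken from the blog post: `demopkg` is a hypothetical placeholder package containing only a DESCRIPTION file, and building real Linux binary packages is not shown.

```r
# Build a throwaway CRAN-like repository in a temp directory.
make_demo_repo <- function(root = tempfile("repo")) {
  contrib <- file.path(root, "src", "contrib")
  dir.create(contrib, recursive = TRUE)

  # Hypothetical minimal package: a directory holding only a DESCRIPTION.
  staging <- tempfile("staging")
  dir.create(file.path(staging, "demopkg"), recursive = TRUE)
  writeLines(
    c("Package: demopkg", "Version: 0.1.0", "Title: Demo",
      "Description: Placeholder package.", "License: MIT"),
    file.path(staging, "demopkg", "DESCRIPTION")
  )

  # Archive it as <name>_<version>.tar.gz, the file name pattern
  # that write_PACKAGES() looks for.
  old <- setwd(staging)
  on.exit(setwd(old), add = TRUE)
  utils::tar(file.path(contrib, "demopkg_0.1.0.tar.gz"),
             files = file.path("demopkg", "DESCRIPTION"),
             compression = "gzip", tar = "internal")

  # Generate the PACKAGES index files that install.packages() reads.
  tools::write_PACKAGES(contrib, type = "source")
  root
}

repo <- make_demo_repo()
index <- read.dcf(file.path(repo, "src", "contrib", "PACKAGES"))
print(index[, c("Package", "Version")])
```

Once such a directory is served over HTTP (or referenced with a `file://` URL), it can be passed as the `repos` argument to `install.packages()`.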
I know of 2 organisations which have publicly available CRAN-like repos with Linux binary packages. The first is the Posit/RStudio package manager
https://packagemanager.posit.co/client/#/repos/2/overview
which makes prebuilt binaries available for Bionic, Focal, and Jammy (as well as several other distros; it's incredibly impressive, as there are snapshots as well).
The other is the R4PI project (which is actually run by one of the Posit/RStudio developers and uses the same technique).
The R4PI GitHub org is here.
I think the build scripts for its CRAN-like repo are in this repo:
https://github.com/r4pi/pkg_builder
Its two CRAN-like repos for the Pi are available from
https://pkgs.r4pi.org/ https://pkgs.r4pi.org/armv7l/index.html https://pkgs.r4pi.org/aarch64/index.html
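For the Posit package manager, pointing an R session at the Focal binaries could look roughly like this. It is a sketch only: `ppm_repo_url` is a hypothetical helper, and the `__linux__/<distro>/<snapshot>` URL pattern is my understanding of the service's scheme rather than something confirmed in this thread.

```r
# Build a Posit Package Manager repository URL for Linux binaries.
# `distro` is an Ubuntu codename; `snapshot` defaults to "latest".
ppm_repo_url <- function(distro = "focal", snapshot = "latest") {
  sprintf("https://packagemanager.posit.co/cran/__linux__/%s/%s",
          distro, snapshot)
}

url <- ppm_repo_url("focal")
print(url)

# Use it as the CRAN repository for this session.
options(repos = c(CRAN = url))

# Not run here, since it needs network access:
# install.packages("data.table")
```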
We cannot currently rebuild the R image from scratch. We have to add on to the existing R image we have.
This prevents various improvements, and means the R image is a special snowflake compared to our other images.