Closed bloodearnest closed 1 year ago
This is super impressive Simon.
The multi-stage build and the caching is superb.
My comments are:
At the beginning of a build there's a warning
just build
WARN[0000] The "PACKAGE" variable is not set. Defaulting to a blank string.
however, on Git Bash on Windows 11 and on bash
and zsh
my WSL v2 Ubuntu this seems to cause the build to fail and exit. On zsh
on macOS the build succeeded. I can't explain why the difference occurs.
On line 40 of the Dockerfile I recommend installing renv at a specific version in a reproducible way (in case your cache gets deleted), and this is what the renv docs site recommends https://rstudio.github.io/renv/articles/docker.html#creating-docker-images-with-renv You could implement their approach of using the remotes package or install from a Posit Package Manager snapshot (or there are other ways to achieve this), e.g. in my legacy-03 branch I did it with
install.packages('renv', repos = 'https://packagemanager.posit.co/all/__linux__/focal/2023-01-12+BkwhZFgY')
I think the add-package
section of the Dockerfile may need a little bit of work or at least better explanation. This is because as far as I understand it the Michael Rutter PPA does not contain all CRAN packages. I remember reading it had about 6,000 packages (not sure how to check this), whereas all of CRAN is approx 20,000 packages. So I think that I can make the just add-package
error by passing it the name of a package that's on CRAN but not in the PPA (I think - or maybe the below error occurs because CRAN package names can contain capital letters), here's an example using one of my CRAN packages, I tried different syntaxes
just add-package OneSampleMR
just add-package onesamplemr
just add-package r-cran-onesamplemr
which all failed.
The first syntax fails with
./add-package.sh OneSampleMR
Attempting to build and install OneSampleMR
[+] Building 0.0s (0/0)
invalid tag "r-OneSampleMR": repository name must be lowercase
Builing OneSampleMR failed.
You may need to add build dependencies (e.g. -dev packages) to build-dependencies.txt
error: Recipe `add-package` failed on line 22 with exit code 1
and the second and third syntaxes fail with the last three lines of the above message.
Additionally adding another of my packages, mrbayes appeared to work but the tests errored with
just add-package mrbayes
./test.sh "r"
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Error: unexpected symbol in "-e library"
Execution halted
error: Recipe `test` failed on line 27 with exit code 1
error: Recipe `add-package` failed on line 22 with exit code 1
I think this is simply fixed by the syntax for the "
before library
on line 4 of test.sh, i.e., maybe you need to escape the quotes or use a different quote syntax given there are nested quotes on that line.
It's great to see the locale issue fixed.
It turns out that C.UTF-8 is what Posit Cloud set their machines to. This seems to return the same string sort ordering for the 26 letter alphabet as when the locale is set to en_GB.UTF-8 (as it likely is on all/most users machines).
sort(c(head(letters), head(LETTERS)))
## [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" "f" "F"
So I'm pretty sure that most graphs with axes sorted by a string variable will now be the same whether produced in the container or on a local machine (that would be a graph made locally using summary data, obvs).
It's great to see the locale issue fixed.
This came for free by switching to use our standard base-action image, which sets up a bunch of sensible defaults for all our images :+1:
It turns out that C.UTF-8 is what Posit Cloud set their machines to. This seems to return the same string sort ordering for the 26 letter alphabet as when the locale is set to en_GB.UTF-8 (as it likely is on all/most users machines).
sort(c(head(letters), head(LETTERS))) ## [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" "f" "F"
So I'm pretty sure that most graphs with axes sorted by a string variable will now be the same whether produced in the container or on a local machine (that would be a graph made locally using summary data, obvs).
Right, that makes sense, thanks for digging into it.
I think we're ok with from a reproducability PoV if re-running old action with this image if the sort order ends up different.
This is super impressive Simon.
The multi-stage build and the caching is superb.
My comments are:
At the beginning of a build there's a warning
just build WARN[0000] The "PACKAGE" variable is not set. Defaulting to a blank string.
however, on Git Bash on Windows 11 and on
bash
andzsh
my WSL v2 Ubuntu this seems to cause the build to fail and exit. Onzsh
on macOS the build succeeded. I can't explain why the difference occurs.
Gah, seems docker issue the warning even if it's not going to build that stage in the Dockerfile. We can give it a default arg, that should stop that warning. I'll check it out.
- On line 40 of the Dockerfile I recommend installing renv at a specific version in a reproducible way (in case your cache gets deleted), and this is what the renv docs site recommends https://rstudio.github.io/renv/articles/docker.html#creating-docker-images-with-renv
Yes, that would probably be better. It's hard to pin the version of the package manager that understands how to install specific version on a platform that doesn't natively support that :)
You could implement their approach of using the remotes package or install from a Posit Package Manager snapshot
Yeah, I shied away from remotes
, as it enables installing from random github repos, which may be ok, but feels like a policy change that needs further discussion before implementing
(or there are other ways to achieve this), e.g. in my legacy-03 branch I did it with
install.packages('renv', repos = 'https://packagemanager.posit.co/all/__linux__/focal/2023-01-12+BkwhZFgY')
That might be worth doing.
However, given that it's just one package, and the version shouldn't matter to users of this image, I'm inclined to keep it unpinned atm. If a new version of renv
comes out that breaks things, well, the Dockerfile build will break, and we'll need to fix it. We can always pin it then.
- I think the
add-package
section of the Dockerfile may need a little bit of work or at least better explanation.
Yeah, it is not pretty. We could just do it via a docker run
invocation. But that would mean building the package(s) twice, as the cache built in the image via docker run
would not be preserved.
This is because as far as I understand it the Michael Rutter PPA does not contain all CRAN packages. I remember reading it had about 6,000 packages (not sure how to check this), whereas all of CRAN is approx 20,000 packages. So I think that I can make the
just add-package
error by passing it the name of a package that's on CRAN but not in the PPA
So, we're not installing any CRAN packages from the PPA, AFAIK. Those are packaged in apt with the names like r-cran-$NAME
. My understanding was that renv::install("foo@version")
would fetch the sources from upstream CRAN, and build locally, rather than using prebuilt binaries from the PPA.
(I think - or maybe the below error occurs because CRAN package names can contain capital letters), here's an example using one of my CRAN packages, I tried different syntaxes
just add-package OneSampleMR just add-package onesamplemr just add-package r-cran-onesamplemr
which all failed. The first syntax fails with
./add-package.sh OneSampleMR Attempting to build and install OneSampleMR [+] Building 0.0s (0/0) invalid tag "r-OneSampleMR": repository name must be lowercase Builing OneSampleMR failed. You may need to add build dependencies (e.g. -dev packages) to build-dependencies.txt error: Recipe `add-package` failed on line 22 with exit code 1
Bah. Stupid docker. If we lowercase the build image tag, it works, fix incoming.
and the second and third syntaxes fail with the last three lines of the above message.
I expect if failed because it couldn't find the 2nd two names on CRAN. But it would be nice to make that issue error early with a helpful error message.
FTR, just add-package
doesn't support installing prebuilt r-cran
apt pacakges at the moment. The thinking here was a) the original built from source, so we're doing the same in this rebuild and b) version pinning apt packages to specific version not fun, as you well know!
I'd expect in a future version r:4.2
image, we could try installing as much as possible from prebuilt r-cran packages in the ppa. There are various ways to pin/hold apt packages to specific versions. But they're not really designed for this use case (e.g. old versions of the deb will be removed, making rebuilding hard). We may end up always building from source anyway (which we'd have to support for those packages that are not in the ppa anyway). Rather than have 2 ways to install a package, it'd be simpler to have one. And the local build cache means it's at least tenable. If you had to rebuild everything every build, it clearly would not be!
Additionally adding another of my packages, mrbayes appeared to work but the tests errored with
just add-package mrbayes ./test.sh "r" WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested Error: unexpected symbol in "-e library" Execution halted error: Recipe `test` failed on line 27 with exit code 1 error: Recipe `add-package` failed on line 22 with exit code 1
I think this is simply fixed by the syntax for the
"
beforelibrary
on line 4 of test.sh, i.e., maybe you need to escape the quotes or use a different quote syntax given there are nested quotes on that line.
Yep, that sounds right, that script is bit hacky, probably worth doing it properly. Just building mrbayes now to test.
Yep, that sounds right, that script is bit hacky, probably worth doing it properly. Just building mrbayes now to test.
mrbayes worked first time for me.
I suspect this is an issue with not specifying bash properly, perhaps. I've added a commit to attempt to fix that.
however, on Git Bash on Windows 11 and on bash and zsh my WSL v2 Ubuntu this seems to cause > the build to fail and exit. On zsh on macOS the build succeeded. I can't explain why the difference occurs.
Revisiting this, I'm not surprised it fails on git-bash using Docker for Windows. I'm not sure how well the build mount stuff works there. These images are designed to be built and published in a linux VM in Github Actions, and we've never tested building them with Docker for Windows.
I would expect it to work in WSL2, however. When I get chance, I'll reboot and see if there's any easy fix.
thanks Simon
(sorry I can't read Docker error messages properly)
I agree renv is unlikely to break in a future version.
I reran from the new commits and build still fails on both Git Bash and WSL 2 with a similar message about IMAGE_TAG
not being set (I guess Docker on Windows and from WSL 2 is using the same WSL 2 backend, whether there's a Docker setting I can change I'm not sure) - but anyway maybe I should actually use a proper Linux VM to test this in.
$ just build
WARN[0000] The "IMAGE_TAG" variable is not set. Defaulting to a blank string.
docker endpoint for "default" not found
error: Recipe `build` failed with exit code 1
And on my Mac the tests after adding a package still fail because of the quote.
just add-package OneSampleMR
bash ./test.sh "r"
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Error: unexpected symbol in "-e library"
Execution halted
error: Recipe `test` failed on line 28 with exit code 1
error: Recipe `add-package` failed on line 23 with exit code 1
Also you might want to commit into the repo the file renv/.gitignore
with contents
library/
local/
cellar/
lock/
python/
sandbox/
staging/
because that seems to be created whilst adding a package.
I fixed test.sh
on my Mac - I had the wrong reason for the failure - it wasn't the quotes - it was the -e
and \n
syntax that wasn't required. The -e
and \n
were being written into the .tests.R
file which meant that contained invalid R code. Also putting quotes around the package name in a library()
call is optional so I removed those, and semi-colon wasn't needed as the xargs
seems to put each invocation of library()
on a new line. So the version which runs for me is
#!/usr/bin/env bash
set -eu
IMAGE=${1:-ghcr.io/opensafely-core/r}
python3 -c 'import json; print("\n".join(json.load(open("renv.lock"))["Packages"]))' | xargs -I {} echo "library({}, warn.conflicts = FALSE)" > .tests.R
docker run --rm -v "$PWD:/workspace" "$IMAGE" .tests.R
I had good progress on my Windows machine - the build succeeded under WSL 2.
It turned out that as per this Stack Overflow post, something was wrong in my C:\Users\%username%\.docker
directory. Deleting that and restarting Docker was the fix.
And then just add-package ...
ran successfully, both with test.sh
unmodified and also modified as above (so the failure of the unmodified version on Mac is still a bit of a mystery to me).
Two more small comments.
The only other output that was different on my machines was from line 34 of add-package.sh
. I'd say it's safer to run sort
with its -f
flag. On both my WSL and Mac I obtained packages.csv
sorted in a different order. Without the -f
the packages beginning with uppercase letters were sorted first in the list followed by those starting with lowercase letters. With -f
specified I obtained the same order as you. (I checked my WSL locale
shows the same as the container locale
so I can't explain this difference either).
Also I ran a few more add-package
examples and find that
just add-package healthyR.ts
fails because it seems dependency R package rlang requires updating. The output was
... lots of output ...
#0 35.96 Installing clock [0.6.1] ...
#0 150.4 FAILED
#0 150.4 Error installing package 'clock':
#0 150.4 =================================
... lots of output ...
#0 150.4 installing to /renv/renv/staging/1/00LOCK-clock/00new/clock/libs
#0 150.4 ** R
#0 150.4 ** data
#0 150.4 *** moving datasets to lazyload DB
#0 150.4 ** inst
#0 150.4 ** byte-compile and prepare package for lazy loading
#0 150.4 Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
#0 150.4 namespace ‘rlang’ 0.4.10 is being loaded, but >= 1.0.4 is required
#0 150.4 Calls: <Anonymous> ... withCallingHandlers -> loadNamespace -> namespaceImport -> loadNamespace
#0 150.4 Execution halted
#0 150.4 ERROR: lazy loading failed for package ‘clock’
#0 150.4 * removing ‘/renv/renv/staging/1/clock’
#0 150.4 Error: install of package 'clock' failed [error code 1]
#0 150.4 Traceback (most recent calls last):
#0 150.4 11: renv::install("healthyR.ts")
#0 150.4 10: renv_install_impl(records)
#0 150.4 9: renv_install_staged(records)
#0 150.4 8: renv_install_default(records)
#0 150.4 7: handler(package, renv_install_package(record))
#0 150.4 6: renv_install_package(record)
#0 150.4 5: withCallingHandlers(renv_install_package_impl(record), error = function(e) {
#0 150.4 vwritef("\tFAILED")
#0 150.4 writef(e$output)
#0 150.4 })
#0 150.4 4: renv_install_package_impl(record)
#0 150.4 3: r_cmd_install(package, path)
#0 150.4 2: r_exec_error(package, output, "install", status)
#0 150.4 1: stop(error)
#0 150.4 Execution halted
------
failed to solve: executor failed running [/bin/sh -c bash -c "R -e 'renv::activate(); renv::install(\"$PACKAGE\"); renv::snapshot(type=\"all\")'"]: exit code: 1
Builing healthyR.ts failed.
You may need to add build dependencies (e.g. -dev packages) to build-dependencies.txt
I fixed
test.sh
on my Mac - I had the wrong reason for the failure - it wasn't the quotes - it was the-e
and\n
syntax that wasn't required. The-e
and\n
were being written into the.tests.R
file which meant that contained invalid R code. Also putting quotes around the package name in alibrary()
call is optional so I removed those, and semi-colon wasn't needed as thexargs
seems to put each invocation oflibrary()
on a new line. So the version which runs for me is#!/usr/bin/env bash set -eu IMAGE=${1:-ghcr.io/opensafely-core/r} python3 -c 'import json; print("\n".join(json.load(open("renv.lock"))["Packages"]))' | xargs -I {} echo "library({}, warn.conflicts = FALSE)" > .tests.R docker run --rm -v "$PWD:/workspace" "$IMAGE" .tests.R
Nice, have pushed this change, and silenced the IMAGE_TAG warning.
If you have any ideas for some more tests above just importing libraries, I'm all ears :)
Two more small comments.
The only other output that was different on my machines was from line 34 of
add-package.sh
. I'd say it's safer to runsort
with its-f
flag. On both my WSL and Mac I obtainedpackages.csv
sorted in a different order. Without the-f
the packages beginning with uppercase letters were sorted first in the list followed by those starting with lowercase letters. With-f
specified I obtained the same order as you. (I checked my WSLlocale
shows the same as the containerlocale
so I can't explain this difference either).
Huh. That seems like a good idea. The ordering is just really to make the diffs more understandable when reviewing changes, e.g. we can spot an unintended version bump more easily. Will add.
Also I ran a few more
add-package
examples and find thatjust add-package healthyR.ts
fails because it seems dependency R package rlang requires updating. The output was
Yeah, not much we can do about that, as our general policy is no version bumps, to avoid breaking backwards compat.
Thanks for these Simon.
I just noticed that I think you need a little fix when using the PACKAGE@VERSION
syntax, i.e., my guess from the error below is that Docker image tag names can't include an @
symbol. The error was from (I was starting to work backwards through previous versions of the healthyR.ts package)
$ just add-package healthyR.ts@0.2.6
bash ./add-package.sh healthyR.ts@0.2.6
Attempting to build and install healthyR.ts@0.2.6
[+] Building 0.0s (0/0)
invalid tag "r-healthyr.ts@0.2.6": invalid reference format
Builing healthyR.ts@0.2.6 failed.
You may need to add build dependencies (e.g. -dev packages) to build-dependencies.txt
error: Recipe `add-package` failed on line 23 with exit code 1
You could just crop the image tag name from before the @
maybe.
(With that fixed) I would perhaps add a comment to the last section of the README that if a package fails to install at its current version on CRAN then it is recommended to look up the prior version numbers of the package from its CRAN old sources page, which has a URL of the form https://cran.r-project.org/src/contrib/Archive/{PACKAGE}/
, e.g. https://cran.r-project.org/src/contrib/Archive/healthyR.ts/, and attempt to install each prior version in turn using the just add-package PACKAGE@VERSION
syntax.
I fixed
test.sh
on my Mac - I had the wrong reason for the failure - it wasn't the quotes - it was the-e
and\n
syntax that wasn't required. The-e
and\n
were being written into the.tests.R
file which meant that contained invalid R code. Also putting quotes around the package name in alibrary()
call is optional so I removed those, and semi-colon wasn't needed as thexargs
seems to put each invocation oflibrary()
on a new line. So the version which runs for me is#!/usr/bin/env bash set -eu IMAGE=${1:-ghcr.io/opensafely-core/r} python3 -c 'import json; print("\n".join(json.load(open("renv.lock"))["Packages"]))' | xargs -I {} echo "library({}, warn.conflicts = FALSE)" > .tests.R docker run --rm -v "$PWD:/workspace" "$IMAGE" .tests.R
Nice, have pushed this change, and silenced the IMAGE_TAG warning.
If you have any ideas for some more tests above just importing libraries, I'm all ears :)
Sorry I don't. This is a really neat test. Although you can make a case for saying it's not really necessary because during the process of installing a package, R checks whether that package can be successfully loaded. However, given this test is really quick it does no harm so I'd leave it in.
Huh. That seems like a good idea. The ordering is just really to make the diffs more understandable when reviewing changes, e.g. we can spot an unintended version bump more easily. Will add.
thanks - yes I realised that. I did exactly the same at the end of my 3 legacy branches, I just did my sorting in R.
Thanks for these Simon.
I just noticed that I think you need a little fix when using the
PACKAGE@VERSION
syntax, i.e., my guess from the error below is that Docker image tag names can't include an@
symbol. The error was from (I was starting to work backwards through previous versions of the healthyR.ts package)$ just add-package healthyR.ts@0.2.6 bash ./add-package.sh healthyR.ts@0.2.6 Attempting to build and install healthyR.ts@0.2.6 [+] Building 0.0s (0/0) invalid tag "r-healthyr.ts@0.2.6": invalid reference format Builing healthyR.ts@0.2.6 failed. You may need to add build dependencies (e.g. -dev packages) to build-dependencies.txt error: Recipe `add-package` failed on line 23 with exit code 1
You could just crop the image tag name from before the
@
maybe.
Done, thanks
(With that fixed) I would perhaps add a comment to the last section of the README that if a package fails to install at its current version on CRAN then it is recommended to look up the prior version numbers of the package from its CRAN old sources page, which has a URL of the form
https://cran.r-project.org/src/contrib/Archive/{PACKAGE}/
, e.g. https://cran.r-project.org/src/contrib/Archive/healthyR.ts/, and attempt to install each prior version in turn using thejust add-package PACKAGE@VERSION
syntax.
Good idea, added a section to the README on this.
thanks for those commits Simon.
renv/.gitignore
file I mentioned above into the repo, or at least you'll likely need to do so on the first real run of just add-package ...
.just add-package
fails, i.e., to add-package.sh you could add a new line 15 such as
echo "Alternatively, you may need to install an older version of $PACKAGE. Please see the Trouble shooting section of the README."
{PACKAGE}
has been substituted with a package name. As is, GitHub will render this as a hyperlink which may confuse people if they click it because that goes to a 404 without the substitution.(After these I think that's probably me out of comments.)
Also, I had a look at adding butcher. I think you need to go back to version 0.1.5 of that such that its dependencies that are already in the container meet its requirements (this is going back quite a few versions but the version of rlang in this container is quite old).
It's great that this is getting improved! :+1:
With a cold cache, it is very slow to build all the initial packages (i.e. 1-2 hours). But the build process keeps a local cache of them on the host, so rebuilds are fast.
These issues might be worth tracking for possible improvements to this in future:
https://github.com/rstudio/renv/issues/459 https://github.com/rstudio/renv/issues/907 https://github.com/r-lib/pak/issues/343
I just noticed there is an unlucky accidental error in e4f1961 . The problem is that the row with the column names, "Package"
and "Version"
, do not remain at the top of the file, but they are sorted with the rest of the values. Hence they were accidentally moved to line 237 of the file. And after yesterday's PR they are now on line 20. They're better as line 1 of the file or deleted.
Also I checked yesterday's package, forestploter, and
just add-package forestploter
works beautifully in the current build of this image.
It's great that this is getting improved! 👍
With a cold cache, it is very slow to build all the initial packages (i.e. 1-2 hours). But the build process keeps a local cache of them on the host, so rebuilds are fast.
These issues might be worth tracking for possible improvements to this in future:
You're absolutely right, once renv::restore()
can use pak, using parallel downloads will bring an incredible speedup to downloading the package files (it seems renv::install()
can already use pak but that's no help to us).
I just noticed there is an unlucky accidental error in e4f1961 . The problem is that the row with the column names,
"Package"
and"Version"
, do not remain at the top of the file, but they are sorted with the rest of the values. Hence they were accidentally moved to line 237 of the file. And after yesterday's PR they are now on line 20. They're better as line 1 of the file or deleted.
Bah, yes. Ok, so, is there a way for the R that generates the csv data to sort, rather than sorting in bash?
You could either use stringr (stringr::sort()
/stringr::order()
/stringr::rank()
) or data.table (data.table::setorder()
).
In my branches I used stringr but I think you might want to go with data.table. This is because stringr sorts using an en
locale (producing capital letters interspersed within the lowercase), whereas data.table sorts using a C locale which is more similar to what you did originally (capital letters sorted first then lowercase sorted).
Your alternative syntaxes are approximately as follows.
For stringr
library(dplyr)
readr::read_csv("packages.csv",
col_types = "cc",
col_names = c("Package", "Version")
) %>%
filter(Package != "Package") %>%
arrange(stringr::str_rank(Package)) %>%
readr::write_csv("packages-alphabetical.csv", quote = "all")
For data.table
library(data.table)
dat <- read.csv("packages.csv", header = FALSE)
colnames(dat) <- c("Package", "Version")
# Remove accidental inclusion of c("Package", "Version") as a row
dat <- subset(dat, Package != "Package")
dat <- data.table(dat)
# data.table setorder() sorts in C-locale
setorder(dat, Package)
write.csv(dat, file = "packages-alphabetical.csv", row.names = FALSE)
just noticed there is an unlucky accidental error in https://github.com/opensafely-core/r-docker/commit/e4f196150af2964eb1f6048378ecf861b34f61bb . The problem is that the row with the column names, "Package" and "Version", do not remain at the top of the file, but they are sorted with the rest of the values. Hence they were accidentally moved to line 237 of the file. And after yesterday's PR they are now on line 20. They're better as line 1 of the file or deleted.
Fixed in https://github.com/opensafely-core/r-docker/pull/126, will update this PR to use the default sorting once that's landed.
Went with the simplest approach, of just using the same locale for generating the csv.
Right, I think this is ready to go, all outstanding issue dealt with.
Will merge and publish later today if no one shouts!
@bloodearnest: is this something to explain about in Platform News, maybe? Either that or a blog post?
Also, should we be advising users how to clean up the old image, either there or elsewhere, once they have asserted that they definitely don't need the legacy
image? If this new image doesn't share any layers, the old one will be occupying 4.5 GiB of their disk. Docker doesn't typically garbage collect images and containers.
(I don't think this :arrow_up: blocks merging the PR.)
@bloodearnest: is this something to explain about in Platform News, maybe? Either that or a blog post?
Yep, there's a couple of pieces of news blocked on publishing this
Also, should we be advising users how to clean up the old image, either there or elsewhere, once they have asserted that they definitely don't need the
legacy
image? If this new image doesn't share any layers, the old one will be occupying 4.5 GiB of their disk. Docker doesn't typically garbage collect images and containers.
It doesn't share any layers, but fortunately, opensafely pull
will do this for them, as it runs docker image prune
after pulling.
And the plan is that we decide to switch r:latest
to be this new build - the users should notice nothing except for a big download. This is not a semantic change - all the library versions should be identical, and if it goes well, invisible to users.
Of course, many things could go wrong. If we need to roll back, I've already published the current image as r:legacy
, so we can do that easily enough. But you couldn't explicitly run r:legacy
in production atm, as it's not tagged like that on the server, and not planning to do so.
Right, I think this is ready to go, all outstanding issue dealt with.
Will merge and publish later today if no one shouts!
Great (I am travelling today, so I haven't tested the new commits) apart from putting backticks around the URL at the end of the README (to stop people accidentally clicking it and thinking it's invalid) this looks great to me.
It has been great to learn about caching - thanks Simon
it runs
docker image prune
after pulling.
TIL; great!
This change fixes the longstanding issue of not being able to rebuild our R image, which was causing many issues.
It solves the problem of reproducing the R library versions as in the current R docker image by using renv to manage a lockfile of versions.
This is bit nuanced (renv is not really designed for this use case) but it does work. Check the Dockerfile comments for details. With a cold cache, it is very slow to build all the initial packages (i.e. 1-2 hours). But the build process keeps a local cache of them on the host, so rebuilds are fast.
Whilst there are no R library version changes, there are some system library changes, as we moved from 18.04 to 20.04 as a base image. This was necessary because a) 18.04 is EOL shortly and b) we don't maintain a base 18.04 image. The file
20.04-library-changes.txt
contains a summary of differences, basically a bunch of uprades to system libraries. Also, we upgraded to the latest point release of R 4.0 series, 4.0.2 -> 4.0.5.Additionally, many unused components have been removed from the image:
These made the original image heavier, and to the best of our knowledge, have never been used by any projects. If this turns out to be mistaken, we can add things back easily enough. The old image was 4.5G from dockers
Testing:
An R image build with this process was used to test ~30 OpenSAFELY project.yamls with R code in, and it all worked, so we have some confidence we've not broken anything.
We'll keep the old image unter r:legacy tag or something, so we can easily access it if needed.
Note: this does not add a publishing workflow to the repo, and thus does not change the current image by landing.
Whether we can build it in GH remains to be seen. With no cache, I'd expect it to take multiple hours.