opensafely-core / research-template-docker

This provides the devcontainer Docker image used by repos created from the OpenSAFELY research template.
Other
0 stars 0 forks source link

R packages don't exactly match those in the OpenSAFELY R action image #22

Closed Jongmassey closed 4 months ago

Jongmassey commented 4 months ago

There are three extra R packages installed in this image compared to what's in the OpenSAFELY R action image: littler, docopt, and spatial

littler and docopt are explicitly added in the Rocker Dockerfile (although not in an easy machine-parseable way) and are system/CLI integration packages which a research user is unlikely to make use of. Spatial is a bit of an oddity as it's in the R recommended packages so don't know why it's not in the OpenSAFELY R action image. It's a package that provides statistical/analytical functions so there's a small risk a user might write an analytical R script that makes use of this then find that it doesn't work via opensafely run .

I'd earlier misdiagnosed discrepancy between what's in the OpenSAFELY R action image and what's in the image we build for codespaces. The aspects of this relevant to testing are in this PR comment, but in summary: Using the R venv config file as an authoritative source of what's installed in the OpenSAFELY R action image was a bad idea. Instead, I propose we use the packages.csv file in the opensafely-r-docker repo instead. This latter file is used during the build process as the source of R packages to be installed into the renv. For packages which are already installed (by virtue of being part of r-base etc.) then they are not added to the renv config.

I think we just leave all three of these discrepant packages in place, and see if any users get tripped up by this. For the purposes of "testing if the R packages have been installed correctly", it also makes it a lot simpler - we can just exclude these three from the comparison. If any more discrepancies arrive in the future, we want the test to fail as it's something we should be aware of.

StevenMaude commented 4 months ago

I was going to add this to the PR thread in #13, but it's probably more appropriate here.

This may have already been covered by the discussion in #17. In any case, some of the packages are installed twice. This is the diff of the OpenSAFELY R Docker image packages.csv with a similar CSV created from R in a codespace:

diff --git a/packages.csv b/packages.csv
index 59c1e63..cf4360a 100644
--- a/packages.csv
+++ b/packages.csv
@@ -68,6 +68,7 @@
 "DHARMa","0.4.6"
 "digest","0.6.25"
 "distributional","0.2.0"
+"docopt","0.7.1"
 "doParallel","1.0.15"
 "dplyr","1.0.2"
 "DT","0.15"
@@ -199,6 +200,7 @@
 "lcmm","2.0.0"
 "lifecycle","1.0.3"
 "listenv","0.8.0"
+"littler","0.3.12"
 "lme4","1.1-23"
 "lmtest","0.9-37"
 "locfit","1.5-9.4"
@@ -424,16 +426,31 @@
 "zip","2.1.1"
 "zoo","1.8-8"
 "base","4.0.5"
+"boot","1.3-27"
+"class","7.3-18"
+"cluster","2.1.1"
+"codetools","0.2-18"
 "compiler","4.0.5"
 "datasets","4.0.5"
+"foreign","0.8-81"
 "graphics","4.0.5"
 "grDevices","4.0.5"
 "grid","4.0.5"
+"KernSmooth","2.23-18"
+"lattice","0.20-41"
+"MASS","7.3-53.1"
+"Matrix","1.3-2"
 "methods","4.0.5"
+"mgcv","1.8-34"
+"nlme","3.1-152"
+"nnet","7.3-15"
 "parallel","4.0.5"
+"rpart","4.1-15"
+"spatial","7.3-13"
 "splines","4.0.5"
 "stats","4.0.5"
 "stats4","4.0.5"
+"survival","3.2-10"
 "tcltk","4.0.5"
 "tools","4.0.5"
 "utils","4.0.5"

Out of these, three are the packages that were already mentioned in #13 as known to be in the Rocker image, but not the OpenSAFELY R image (docopt, littler, spatial).

The rest that correspond to additional lines are installed twice, once by the Rocker image, and once by the OpenSAFELY R Docker image. In most cases — except for lattice and rpart — these are slightly different versions, which I guess could potentially be a problem.

However, I'm guessing that we're saved by the library path configuration:

> .libPaths()
[1] "/usr/local/lib/R/site-library" "/usr/local/lib/R/library"

so that the OpenSAFELY R packages copied over take precedence over the Rocker packages.

But, if we could remove the duplicates, we could also more cleanly compare the installed versions of the packages, instead of just the package names. (Or at least be guaranteed that the installed packages are installed in the correct place.)

lucyb commented 4 months ago

Testing this out in a dev container, I ran the following R code:

> packageVersion("KernSmooth")
[1] ‘2.23.17’
> packageVersion("foreign")
[1] ‘0.8.80’
> packageVersion("lattice")
[1] ‘0.20.41’
> packageVersion("survival")
[1] ‘3.2.3’

These correspond to the versions currently found in the r-docker packages.csv.

Therefore, it seems as though the correct versions of packages are being loaded and unless it causes significant problems with testing we shouldn't do anything to fix this right now. Although, we should continue to remind researchers and co-pilots to use opensafely run and opensafely exec to test their code rather than rely solely on the development environment being correct.