evansd closed this issue 10 months ago.
Here are my thoughts:
Adding a compiler is the easiest solution.
Small point: remember that up to 15th February this year the r image used a single-stage Dockerfile and included gcc and g++ (otherwise a lot of the source R packages would not have built). It's just that AFAIK no one used them. Here's a small proof of that, inspecting the legacy tag:
% docker run --platform linux/amd64 --entrypoint /bin/bash ghcr.io/opensafely-core/r:legacy -c "whereis gcc g++"
gcc: /usr/bin/gcc /usr/lib/gcc
g++: /usr/bin/g++
I don't think you should worry about image size. Simon made the image quite a lot smaller on 15th Feb: it's currently 914 MB compressed, whereas legacy was 1.64 GB compressed, so even with a compiler added it would probably remain smaller than legacy was.
A precompilation workflow: this is possible. The subtlety is that you would need to include the user's code as a function in your own correctly configured R package, which you would then install into the r image (the Rcpp code is compiled during installation).
This isn't as hard as it sounds. Rcpp is one of the most common dependencies for CRAN packages, how to include Rcpp code in an R package is very well documented, and Rcpp provides a function, Rcpp::Rcpp.package.skeleton(), which generates the skeleton of such a package; for more info see https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-package.pdf. And you already have a potential package to use for this: Will's osutils package, which he has not yet sent to CRAN nor had installed in the image.
So you could add the user's code to osutils, then either get osutils onto CRAN or skip CRAN and install it from GitHub, since renv accepts remotes-style syntax. I think renv::install("wjchulme/osutils") should work; the usual remotes syntax would be remotes::install_github("wjchulme/osutils").
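As a rough sketch of what the package route involves, the shell script below lays out by hand roughly the files that Rcpp::Rcpp.package.skeleton() would generate for you: a DESCRIPTION listing Rcpp under Imports and LinkingTo, a NAMESPACE that loads the compiled library, and a C++ source file marked with an Rcpp::export attribute. The package name mypkg and the function sum_sq are hypothetical placeholders, not part of osutils or the user's actual code. The final compile and install steps need R and a compiler (e.g. in the "build" image), so they are left as comments.

```shell
#!/bin/sh
# Hypothetical sketch: a minimal Rcpp-enabled R package laid out by hand,
# roughly what Rcpp::Rcpp.package.skeleton() generates.
# "mypkg" and "sum_sq" are placeholder names, not part of osutils.
set -eu

pkg=mypkg
mkdir -p "$pkg/R" "$pkg/src"

# DESCRIPTION: LinkingTo puts Rcpp's headers on the include path at compile
# time; Imports makes Rcpp available when the package is loaded.
cat > "$pkg/DESCRIPTION" <<'EOF'
Package: mypkg
Title: Placeholder Package Wrapping User C++ Code
Version: 0.0.1
Author: Placeholder Author
Maintainer: Placeholder Author <author@example.com>
Description: Hypothetical example package wrapping a small C++ function.
License: GPL-3
Imports: Rcpp
LinkingTo: Rcpp
EOF

# NAMESPACE: load the compiled shared library and export the wrapped function.
cat > "$pkg/NAMESPACE" <<'EOF'
useDynLib(mypkg, .registration = TRUE)
importFrom(Rcpp, evalCpp)
export(sum_sq)
EOF

# The user's C++ code; the attribute comment tells Rcpp to generate an R wrapper.
cat > "$pkg/src/sum_sq.cpp" <<'EOF'
#include <Rcpp.h>

// [[Rcpp::export]]
double sum_sq(Rcpp::NumericVector x) {
  double total = 0.0;
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    total += x[i] * x[i];
  }
  return total;
}
EOF

# With R and a compiler available, the remaining steps would be:
#   Rscript -e 'Rcpp::compileAttributes("mypkg")'  # writes RcppExports.cpp / RcppExports.R
#   R CMD INSTALL mypkg                             # compiles src/ during installation
```

Once installed into the image, the function would be called like any other R function (mypkg::sum_sq(c(1, 2, 3))), and no compiler is needed at runtime because the compiled library ships with the installed package.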
It would be pretty easy to make a binary version of this package available to users to run locally on Windows or macOS. If the package is on CRAN, CRAN provides binary packages for the current version of R; if it is not, the new-ish r-universe.dev will build binaries for the current version of R for free (I distribute binaries of TwoSampleMR and other packages this way). For R 4.0.5, though, you'd need to make your own CRAN-like repo, or renv could install the package from source, for which Windows users would only need RTools40 installed.
Compared to option 2 (adding a compiler), this would give you more control, as one of your team would review the user's code before it was merged into Will's/your package.
% docker run --platform linux/amd64 ghcr.io/opensafely-core/r:legacy -e "Rcpp::evalCpp('2 + 2')"
[1] 4
Yes, it did use to work, but no one seemed to be using it AFAICT.
However, much of the reduction in image size came from removing the compiler toolchains as part of the new image. Adding them back in will increase the size again, which may be an acceptable trade-off.
We've had a request from a user to add a C++ compiler to the R image to allow them to compile a small amount of code which will make their study run significantly faster:
The Rcpp package is already included in the Docker image as it is a dependency of quite a few other packages, so this might have suggested that we already support this. However, I think all the compilation happens at package install time (within the "build" image, which does have a compiler) and runtime use of Rcpp has never been supported.
I think our options are:
1. Reject the request
Pros
Cons
2. Add a compiler
Pros
Cons
3. Develop pre-compilation workflow
The idea is that we'd provide (waves hands) some way for users to produce a binary compiled so as to be executable within our R image, and advice on how to load it from within R. Users would then commit their compiled executable to the repo.
Pros
Cons
Rcpp doesn't, as far as I can tell, support pre-compilation like this.