opensafely-core / r-docker

Docker image for running R scripts in OpenSAFELY

Decide whether/how to support compiled user code #149

Closed evansd closed 10 months ago

evansd commented 10 months ago

We've had a request from a user to add a C++ compiler to the R image to allow them to compile a small amount of code which will make their study run significantly faster:

I’m using Rcpp (an interface between R and C++) to run code much faster than it would run in R alone. The Rcpp code runs a model of covid transmission for a given set of parameters, and will be called thousands/millions of times by R code that samples the parameters to fit the model to EHRs. The Rcpp code is called via sourceCpp() within R. It is not long (about 100 lines) and compiles almost instantaneously on my PC. ... I believe the processing time required for C++ compilation will be negligible compared to the savings in run time resulting from using Rcpp. It might not even be viable to run the same task through R alone.

The Rcpp package is already included in the Docker image as it is a dependency of quite a few other packages, so this might have suggested that we already support this. However, I think all the compilation happens at package install time (within the "build" image, which does have a compiler) and runtime use of Rcpp has never been supported.
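As a rough sketch of how one might check whether a given environment could support runtime compilation at all: sourceCpp() needs a C++ toolchain to be present at the moment it is called. The helper below reports whether each named tool is on the PATH. The function name `check_tools` is hypothetical (it is not part of any OpenSAFELY tooling), and the choice of tools to look for (g++, plus make, which R's build machinery normally relies on) is an assumption rather than a confirmed list.

```shell
# Hypothetical helper: report whether each named tool is on PATH.
# Returns 0 if all tools were found, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: not found"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (assumed tool list): check_tools gcc g++ make
```

Run inside a container started from the image in question, a non-zero return would suggest that runtime use of sourceCpp() cannot work there.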

I think our options are:

  1. Say "sorry but no" and reject the request.
  2. Add a compiler to the R image.
  3. Develop some workflow for allowing users to pre-compile code and include the build artefact in their repo.

1. Reject the request

Pros

Cons

2. Add a compiler

Pros

Cons

3. Develop pre-compilation workflow

The idea is that we'd provide (waves hands) some way for users to produce a binary compiled so as to be executable within our R image, and advice on how to load it from within R. Users would then commit their compiled executable to the repo.

Pros

Cons

remlapmot commented 10 months ago

Here are my thoughts:

  1. Adding a compiler is the easiest solution.

    Small point: remember that up to 15th February this year the r image used a single-stage Dockerfile and included gcc and g++ (otherwise a lot of the source R packages would not have built); it's just that AFAIK no-one used them. Here's a small proof of that, inspecting the legacy tag:

    % docker run --platform linux/amd64 --entrypoint /bin/bash ghcr.io/opensafely-core/r:legacy -c "whereis gcc g++"
    
    gcc: /usr/bin/gcc /usr/lib/gcc
    g++: /usr/bin/g++

    I don't think you should worry about image size. Simon made the image quite a lot smaller on 15th Feb: currently it's at 914 MB compressed, and legacy was 1.64 GB compressed, so it would probably remain smaller than legacy was in any case.

  2. A precompilation workflow - this is possible - the subtlety is that you would need to include the user's code as a function in your own correctly configured R package, which you would then install into the r image (the Rcpp code will be compiled during the installation).

    This isn't as hard as it sounds. Rcpp is one of the most common dependencies for CRAN packages, how to include Rcpp code in an R package is very well documented, and Rcpp provides a function, Rcpp::Rcpp.package.skeleton(), which gives you the skeleton of such a package. For more info see https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-package.pdf. And you already have a potential package to use for this: Will's osutils package, which he has not yet sent to CRAN nor had installed in the image.

    So you could add the user's code to osutils, then either get osutils onto CRAN, or not bother with CRAN and install from GitHub, as renv can use remotes-style syntax. E.g. I think renv::install("wjchulme/osutils") should work, or the usual remotes syntax would be remotes::install_github("wjchulme/osutils").

    It would be pretty easy to make a binary version of this package available to users to run locally on Windows or macOS. If the package is on CRAN, they provide binary packages for the current version of R; if the package is not on CRAN, the new-ish r-universe.dev will provide binaries for the current version of R for free (I distribute binaries of TwoSampleMR and other packages this way). But you'd need to make your own CRAN-like repo for R 4.0.5, or renv could install it from source, for which Windows users would only need RTools40 installed.

    Compared to option 2 in the original list (adding a compiler), this would give you more control, as one of your team would review the user's code before it was merged into Will's/your package.
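The package-based route described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `mymodel` and `model.cpp` are hypothetical placeholders for the package name and the user's C++ file, and the steps are assumed to run somewhere a compiler is available (for example the build stage of the image). By default the script only prints the commands; setting `RUN=1` would execute them, which needs R, Rcpp and a compiler.

```shell
# Hypothetical names: the package to create and the user's C++ source file.
PKG="${PKG:-mymodel}"
CPP_FILE="${CPP_FILE:-model.cpp}"

# Print each command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

precompile_steps() {
    # 1. Generate a package skeleton that includes the user's C++ file.
    run Rscript -e "Rcpp::Rcpp.package.skeleton('$PKG', cpp_files = '$CPP_FILE', example_code = FALSE)"
    # 2. Install the package; the C++ code is compiled at this point,
    #    so no compiler is needed later when the study code calls it.
    run R CMD INSTALL "$PKG"
}

precompile_steps
```

The point of the design is that compilation moves from runtime to install time, which is how the other Rcpp-dependent packages in the image are already handled.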

remlapmot commented 10 months ago

  1. Just to show that Rcpp was working with those compilers in the legacy image (and I assume images prior to that):

    % docker run --platform linux/amd64 ghcr.io/opensafely-core/r:legacy -e "Rcpp::evalCpp('2 + 2')"

    [1] 4

bloodearnest commented 10 months ago

Yes, it did use to work. But no one seemed to be using it AFAICT.

However, much of the reduction in image size came from removing the compiler toolchains as part of the new image. Adding them back in will increase the size again, which may be an acceptable trade off.