rocker-org / rocker

R configurations for Docker
https://rocker-project.org
GNU General Public License v2.0

wiki: best practices for creating dockerfiles #515

Open nick-youngblut opened 1 year ago

nick-youngblut commented 1 year ago

It would be helpful to include docs on how to create dockerfiles building off of rocker/r-base (or similar). For instance:

nick-youngblut commented 1 year ago

As an example, https://github.com/rocker-org/rocker-versioned2/pkgs/container/tidyverse#install-r-packages states:

Please install R packages from source using the install.packages() R function or the install2.r script, and use apt only to install necessary system libraries (e.g. libxml2). Do not use apt install r-cran-* to install R packages.

...but a build that involves installing Bioconductor packages from source (and the MANY dependencies required for any Bioconductor package) can take >1.5 hours. There must be a better way. For instance, how is rocker/tidyverse built in order to minimize the build time?

Also, it would be helpful if the docs include installBioc.r and not just install2.r

eddelbuettel commented 1 year ago

Briefest possible answer: start with eddelbuettel/r2u which comes in Ubuntu jammy and focal flavours with over 20k CRAN binaries and over 200 BioC binaries. See more at https://eddelbuettel.github.io/r2u/ (and this will eventually be a part of rocker once I get around to reorganising this).

Note that it is NOT a direct descendant of rocker/r-base as the latter is Debian based, and nobody has access to all of CRAN premade for Debian whereas I am able to provide it for Ubuntu; see the r2u docs for more.
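For readers who want to see what this looks like in practice, here is a minimal sketch of a Dockerfile built on r2u. It assumes the `jammy` tag of eddelbuettel/r2u and uses two example packages (`r-cran-data.table`, `r-bioc-biocgenerics`) chosen purely for illustration; check the r2u docs for the tags and binaries actually available.

```dockerfile
# Sketch only: the tag and package names below are illustrative assumptions.
FROM eddelbuettel/r2u:jammy

# On r2u, CRAN (and some BioC) packages are available as prebuilt .deb
# binaries, so apt resolves R and system dependencies in one step.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        r-cran-data.table \
        r-bioc-biocgenerics \
    && rm -rf /var/lib/apt/lists/*
```

Installing `r-cran-*` packages through apt is the intended route on r2u. The "do not use apt install r-cran-*" advice quoted earlier in this thread applies to the versioned images, not to this one.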

cboettig commented 1 year ago

Thanks for raising the issue and apologies for the confusion here. Note that there are essentially two separate stacks in rocker that serve different needs, as noted in the README in this repo. Dirk summarizes above one of the approaches, in what the README calls the un-versioned stack.

The versioned stack, which you have linked in your example, includes those images (r-ver, rstudio, tidyverse, etc) built from sources in rocker-org/rocker-versioned2, and the best practices are indeed the ones you cite -- e.g. install R packages with install.packages or the install2.r script wrapper. Please note that the versioned stack uses Ubuntu-based images configured with RSPM (RStudio Package Manager) as the default mirror, along with the appropriate headers, which means that install.packages() will install prebuilt binaries. This is how packages are installed on rocker/tidyverse. You can try building the tidyverse Dockerfiles yourself to confirm (or just look at the logs; it is a bit buried in there, but it looks like it takes about 113 seconds).
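As a rough illustration of that pattern, here is a minimal sketch of a Dockerfile extending a versioned image. The system library (`libxml2-dev`) and the R packages (`xml2`, `here`, `Biobase`) are placeholder choices, not recommendations. The Bioconductor step goes through `BiocManager::install()`, which is one common approach and not necessarily the one the Rocker maintainers would suggest, and Bioconductor packages may still compile from source there.

```dockerfile
# Sketch only: package choices are illustrative assumptions.
FROM rocker/tidyverse:4.2.1

# Use apt only for system libraries that R packages link against.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libxml2-dev \
    && rm -rf /var/lib/apt/lists/*

# CRAN packages: install2.r uses the image's default repo (RSPM), so
# prebuilt binaries are used where they exist. --error fails the build
# if a package cannot be installed.
RUN install2.r --error --skipinstalled xml2 here

# Bioconductor packages: make sure BiocManager is present, then install.
RUN install2.r --error --skipinstalled BiocManager \
    && Rscript -e 'BiocManager::install("Biobase", ask = FALSE, update = FALSE)'
```

Keeping the apt step and the R package steps in separate `RUN` layers lets Docker reuse the cached system-library layer when only the R package list changes.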

Regarding versioning, note that the Rocker versioned stack locks images based on their R version tag. Once an image is no longer the latest version (e.g. rocker/tidyverse:4.2.1, say), packages are locked by using the RSPM snapshot frozen immediately before the release of the latest version. This allows latest to act as a rolling version, always containing the latest packages up until the day the R version rolls over, at which point everything is frozen. This is done by setting the default CRAN repo, meaning that again users don't have to do anything to install a consistent version. Using rocker/tidyverse:4.2.1, or any other previous version, ensures the build will always have identical versions of all packages, and that those packages are all concurrent. Hope that description makes sense. Naturally there are cases where users want to install specific versions of packages, where a tool like renv may be appropriate.
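To see the locking for yourself, a small sketch along these lines pins the image by its R version tag and prints the default repository during the build. The `jsonlite` package is just an example, and the exact snapshot URL printed will depend on the image.

```dockerfile
# Sketch only: pin by R version tag so packages come from the frozen
# snapshot configured in the image.
FROM rocker/tidyverse:4.2.1

# Print the default repo the image was configured with; on a
# non-latest tag this should point at a dated snapshot.
RUN Rscript -e 'print(getOption("repos"))'

# Any package installed here resolves against that same snapshot.
RUN install2.r --error --skipinstalled jsonlite
```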

eitsupi commented 1 year ago

Have you seen the Rocker Project website? https://rocker-project.org/use/extending.html Although the content is not sufficiently rich, I believe we were able to describe the basic content at the time of last year's renewal. (By the way, I noticed that the link from this repository was to DockerHub, not the Rocker website, so I updated it.)

@eddelbuettel @cboettig The wiki content in this repository is outdated and I believe much of the content has been ported over to the website. So I think it would be better to make the wiki read-only and direct people to the website. What do you think?

nick-youngblut commented 1 year ago

Thank you all so much for your rapid feedback! I hope that I have not been too annoying with my list of documentation requests. I would be happy to help with PRs, if you'd like (I just need to understand best-practices myself).

eitsupi commented 1 year ago

@nick-youngblut Thanks, PRs for https://github.com/rocker-org/website are very welcome!