rocker-org / rocker

R configurations for Docker
https://rocker-project.org
GNU General Public License v2.0

R-base missing couple OS libraries #467

Closed lgorenstein closed 2 years ago

lgorenstein commented 2 years ago

Dear maintainers,

First of all, thank you for the great work you are doing! I am reporting a small deficiency: I stumbled upon a couple of missing libraries in the r-base:4.1.1 image that got in the way of installing some R packages.

Our HPC cluster does not allow users to run Docker, but we embrace Singularity - so a typical workflow is to convert a Docker source into a Singularity image, and then run that image:

singularity pull r-base:4.1.1.sif docker://r-base:4.1.1
singularity run r-base:4.1.1.sif R

This worked wonderfully, but I hit a snag with the subsequent installation of a handful of bioinformatics packages. Most installed fine, but a few (GO.db, genefilter and MetaboDiff) failed to install because some libraries and headers required to build source-based dependencies were missing. The most important misses were curl- and openssl-related (to build RCurl), libxml2 (for XML), as well as cairo and libXt (for Cairo).

Rebuilding the container image with a small handful of Debian packages added solved the problem. Here's the list of OS packages I ended up needing to add:

libcurl4-openssl-dev  libssl-dev  libxml2-dev  libcairo2-dev  libxt-dev

It would be great if these OS packages could be added directly into future images. They seem fairly small and sufficiently generic to justify such an addition.
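A minimal sketch of what such a rebuild could look like, assuming a Singularity definition file is used. The file names r-base-dev.def and r-base-dev.sif are placeholders, not names from this thread; the package list is the one given above.

```shell
#!/bin/sh
# Write a Singularity definition file that bootstraps from the Docker image
# and adds the five -dev packages listed above.
# "r-base-dev.def" is a hypothetical file name chosen for this sketch.
cat > r-base-dev.def <<'EOF'
Bootstrap: docker
From: r-base:4.1.1

%post
    apt-get update
    apt-get install -y --no-install-recommends \
        libcurl4-openssl-dev libssl-dev libxml2-dev libcairo2-dev libxt-dev
    rm -rf /var/lib/apt/lists/*
EOF

# Building the image needs root, so it is left as a comment here:
#   sudo singularity build r-base-dev.sif r-base-dev.def
```

The resulting .sif file can then be handed to users, who run it without needing any privileges.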

eddelbuettel commented 2 years ago

Hi @lgorenstein

Thanks for taking the time to write back. This is a known issue: r-base is actually meant to be, well, a "base" and does not promise to maximise the number of packages that immediately and without extra commands compile.

You may see that I, for example, have a number of extra containers building on this. You may also recognise that a number of add-on binary packages install directly via apt and do not need the -dev packages. So unconditionally installing -dev packages is not correct.

However, it may well be a good idea to create a, say, r-bioinfo-dev container that derives off r-base and adds those five. Maybe you could even look after this container?

With 18k CRAN packages and 50k Debian packages we are often asked to create additional variants. In general, we cannot, as we too are volunteers and busy with the current load, which we (sadly) cannot generally extend. I hope you understand.
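As a rough illustration of the binary route, here is a sketch of a derived image that installs prebuilt Debian packages instead of compiling from source. The package names r-cran-rcurl, r-cran-xml and r-cran-cairo are assumed to be available in the Debian release the image tracks, and the tag r-base-binpkgs is a placeholder.

```shell
#!/bin/sh
# Write a Dockerfile that layers Debian's prebuilt R packages on top of r-base.
# No -dev headers are needed because nothing is compiled.
cat > Dockerfile <<'EOF'
FROM r-base:4.1.1
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        r-cran-rcurl r-cran-xml r-cran-cairo \
    && rm -rf /var/lib/apt/lists/*
EOF

# Build step, left as a comment because it needs a Docker daemon:
#   docker build -t r-base-binpkgs .
```

This only helps for packages that Debian ships as binaries; anything else still has to be built from source.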

lgorenstein commented 2 years ago

Hi Dirk, I understand and definitely get the idea to keep the image lean and small. I also get the "you may have to do some extra commands" approach. However, source build prerequisites (especially ones as popular as libcurl/libopenssl/libxml2) seem to me like worthy candidates for inclusion in base. And I don't feel they'd justify spinning off another container branch like r-bioinfo-dev (because really, it's just five more, and they are not even truly bioinformatics-specific).

I also want to highlight another important piece: for many users (cue large HPC centers) modifying an existing docker container is really hard even if they wanted to do those extra commands or add other packages from CRAN or Debian. Docker on large clusters is very often prohibited for security reasons. Unprivileged container runtimes (like Singularity) can take Docker images to bootstrap - and are getting a lot of popularity because of this. But here's a catch: singularity pull does not require root privileges (to pull down an essentially immutable image). But a singularity build does (if I wanted to modify it)! In other words, regular users can not bootstrap from r-base and add extra -dev packages even if they wanted to. All they can do is use existing build tools inside the container.

I am HPC center staff, so I have root, and in the above example I did type those extra commands (wrote a Singularity definition file that runs apt-get install for those packages) and ran sudo singularity build. And I got a perfect Singularity container that I passed to researchers, and they are happy with it. But for them making the same container would have been a lot more difficult, because without root they'd have to rely only on what's inside r-base. Hence my thought to have as many build essentials as possible so they could then easily use this container to install source packages in $R_LIBS_USER.
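For illustration, an unprivileged install into a user library might look something like the sketch below. The library path and the CRAN mirror are assumptions, and the install step only succeeds if the image already contains the headers that RCurl needs at build time.

```shell
#!/bin/sh
# Assumed names: the image file produced by the pull earlier in this thread
# and a per-user library directory under $HOME (hypothetical path).
IMAGE="r-base:4.1.1.sif"
R_LIBS_USER="$HOME/R/library-4.1"
export R_LIBS_USER
mkdir -p "$R_LIBS_USER"

# Singularity passes the host environment into the container and binds $HOME
# by default, so R inside the container sees R_LIBS_USER and can write to it.
if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    singularity exec "$IMAGE" Rscript -e \
        'install.packages("RCurl", lib = Sys.getenv("R_LIBS_USER"), repos = "https://cloud.r-project.org")'
else
    echo "singularity or $IMAGE not found; skipping install" >&2
fi
```

No root is needed for any of this, which is why the set of build prerequisites already present in the image matters so much.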

I am not a bioinformatician (I just work with some), but I would argue that while having an r-bioinfo[-dev] is indeed a great idea, it would likely need to have a lot more than those five additional general -dev libraries. And such projects do exist in the bioinfo community already, packed full of specialized packages and the like. But my point was not about them, but rather about making r-base more build-friendly in general (not just for bioinformatics).

eddelbuettel commented 2 years ago

It is this simple: r-base is a run-time base container. That keeps the size down. I can then do a quick install of a (ideally, binary) package.

If you want a dev container, use something like r-devel, which comes in at 2.9 GB. Such is life.

I am sorry that we can not satisfy all demands all the time. We have to make some choices. I hope you will understand.