rocker-org / rocker-versioned2

Run current & prior versions of R using docker. rocker/r-ver, rocker/rstudio, rocker/shiny, rocker/tidyverse, and so on.
https://rocker-project.org
GNU General Public License v2.0

Warnings on installing bioconductor packages with install2.r #775

Closed nick-youngblut closed 3 months ago

nick-youngblut commented 3 months ago

It could be helpful to include a warning in the install2.r docs about installing Bioconductor packages with install2.r. Even with install2.r --error --repos "https://bioconductor.org/packages/release/bioc", the install can fail because dependencies are not installed correctly, so it seems that users should default to the following instead:

RUN R -e "install.packages('BiocManager', repos = 'http://cran.rstudio.com/')" \
  && R -e "BiocManager::install( ... )"

eddelbuettel commented 3 months ago

"It's complicated." That is why my money is on r2u and its ~ 400 BioC binaries.

But rocker-versioned2 of course has its many fans for the time-based installs that many like. That may make BioC harder; I honestly do not know. I find BioC installs borderline difficult on a normal system given the very specific release and upgrade schedules. It is what it is.

I have some ideas (but no time ...) for an apt installation system pinned to a timepoint. In another life, maybe ...

cboettig commented 3 months ago

@nick-youngblut by

install2.r can fail due to not installing dependencies correctly,

Are you referring to system dependencies? Of course, that's not specific to BioC packages. Or do you mean getting the wrong versions of BioC packages? install2.r is just the littler binding to install.packages(). We all know there are loads of reasons not to like install.packages(): the BioC team of course had a reason to do package management differently, in a way that doesn't always play nicely with how install.packages() is designed, and there are plenty of alternative packages offering different flavors of install handling that different users prefer (e.g. pak). Users can pick any of these.

rocker-versioned2 doesn't in any way encourage users to use install2.r / install.packages() to install BioC packages (e.g. geospatial installs from BioC this way: https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/install_geospatial.sh#L79), but as you know BioC is written in a way in which install.packages() (and thus install2.r) sometimes works and sometimes doesn't. Not to speak for Dirk, who maintains the littler scripts including install2.r, but I don't think it's helpful to think of it as a one-stop shop for installing everything; instead, just think of it as a convenience wrapper around install.packages(), which has very familiar behavior and limitations. install.packages() doesn't try to "detect" that the repo argument happens to be Bioconductor and do something different from its standard behavior either.
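
To illustrate the "convenience wrapper" point: a single --repos URL such as .../release/bioc replaces the configured repositories, so dependencies that live on CRAN or in the BioC annotation and experiment repositories cannot be resolved from it. Below is a minimal sketch of one possible workaround, not an official recommendation. It assumes BiocManager installs cleanly from the image's default CRAN repository; the image tag and DESeq2 are only examples, and any system libraries still have to be installed separately.

```dockerfile
FROM rocker/rstudio:4.3.1

# BiocManager itself lives on CRAN, so install2.r can fetch it.
RUN install2.r --error BiocManager

# BiocManager::repositories() returns the BioC software, annotation and
# experiment repositories plus CRAN, so install.packages() can resolve
# dependencies across all of them. install.packages() only warns on failure,
# hence the explicit check that makes the build stop if the package is missing.
RUN R -e "install.packages('DESeq2', repos = BiocManager::repositories()); stopifnot(requireNamespace('DESeq2', quietly = TRUE))"
```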

eddelbuettel commented 3 months ago

Indeed. littler long had installBioc.r to dispatch to the BioC install tool. It does go to BioC and does not know about versioning as r-ver2 does here.

nick-youngblut commented 3 months ago

are you referring to system dependencies

It appears that is the main reason why my attempts to use install2.r for Bioconductor packages have failed.

rocker-versioned2 doesn't in any way encourage users to try to use install2.r / install.packages() to install BioC packages

I'm just saying that it could be helpful to explicitly point this out in the docs and maybe provide users with a good alternative, such as:

RUN R -e "install.packages('BiocManager', repos = 'http://cran.rstudio.com/')" \
  && R -e "BiocManager::install( ... )"

eddelbuettel commented 3 months ago

You still haven't given a reproducible example: which BioC package? Which system dependencies?

As for your alternative, that is what installBioc.r does (once you give it the needed BiocManager).
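
A rough sketch of what that could look like in a Dockerfile. It assumes installBioc.r is on the PATH of the image (if it is not, it ships in the examples directory of the installed littler package) and that package names are passed positionally, as with install2.r. The image tag and the packages limma and DESeq2 are only examples.

```dockerfile
FROM rocker/rstudio:4.3.1

# installBioc.r dispatches to BiocManager, which has to be installed first.
RUN install2.r --error BiocManager \
  && installBioc.r limma DESeq2
```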

nick-youngblut commented 3 months ago

You still haven't given a reproducible example:

Sorry, but I didn't see any direct ask for a reprex. This issue is more about general docs than a direct issue that needs fixing via reproducing the issue and then updating the codebase. Here's the reprex:

# Use a Rocker image as the base
FROM rocker/rstudio:4.3.1

# Install system dependencies
RUN apt-get update && apt-get install -y \
    libhdf5-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    libpng-dev \
    libboost-all-dev \
    libxml2-dev \
    openjdk-8-jdk \
    python3-dev \
    python3-pip \
    wget \
    git \
    libfftw3-dev \
    libgsl-dev \
    pkg-config \
    libgeos-dev \
    libglpk-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/ \
    && rm -rf /tmp/downloaded_packages/ /tmp/*.rds

# Install python dependencies
ENV LLVM_CONFIG=/usr/lib/llvm-10/bin/llvm-config
RUN pip3 install llvmlite numpy umap-learn

# Install FIt-SNE
RUN git clone --branch v1.2.1 https://github.com/KlugerLab/FIt-SNE.git \
  && g++ -std=c++11 -O3 FIt-SNE/src/sptree.cpp FIt-SNE/src/tsne.cpp FIt-SNE/src/nbodyfft.cpp -o bin/fast_tsne -pthread -lfftw3 -lm

# Install R packages from CRAN
RUN install2.r --error --ncpus 8 --repos http://cran.rstudio.com/ \
  dplyr tidyr ggplot2 glue stringr remotes R.utils ape \
  VGAM metap Rfast2 enrichR mixtools \
  spatstat.explore spatstat.geom \
  && rm -rf /tmp/downloaded_packages /tmp/*.rds

# Install R packages from bioconductor
RUN install2.r --error --ncpus 8 --repos https://bioconductor.org/packages/release/bioc \
  multtest S4Vectors SummarizedExperiment \
  SingleCellExperiment MAST DESeq2 BiocGenerics GenomicRanges IRanges \
  rtracklayer monocle Biobase limma glmGamPoi \
  && rm -rf /tmp/downloaded_packages /tmp/*.rds

# Install R packages not in a standard repository
RUN R --no-echo --no-restore --no-save -e \
  "remotes::install_github('mojaveazure/seurat-disk'); remotes::install_github('dmcable/spacexr')"

# expose the port
EXPOSE 8787

# Set the default command to run when a container starts
CMD ["/init"]

As for your alternative, that is what installBioc.r does (once you give it the needed BiocManager).

Maybe that could be added to the install2.r docs? More generally, I can't find any mention of installBioc.r in the docs at https://rocker-project.org/.

eddelbuettel commented 3 months ago

I can only re-iterate that r2u would likely help you a lot. My day job involves a lot of CI around single-cell with Seurat and other packages, and we are covered by it just nicely in GitHub Actions and other use cases.

For illustration, I just fired up rocker/r2u and (just to facilitate) did a quick apt update -qqq; apt upgrade -y. I would then have called install.r, which dispatches to apt via a lovely R package called bspm, but to be even more explicit I did this from R. I ran (indented here for readability):

> system.time(install.packages(c("multtest", "S4Vectors", "SummarizedExperiment", 
              "SingleCellExperiment",  "MAST", "DESeq2", "BiocGenerics", "GenomicRanges",
              "IRanges", "rtracklayer", "monocle", "Biobase", "limma", "glmGamPoi")))  

It installed 118 binary packages fully resolving all dependencies. In 26 seconds.

.... earlier lines omitted...
Setting up r-bioc-glmgampoi (1.14.3-1.ca2204.1) ...
Setting up r-bioc-mast (1.28.0-1.ca2204.1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.6) ...
   user  system elapsed 
  8.691   4.219  26.618 
> 

I know people have strong feelings about r-ver and reproducibility, and all that is cool. I mostly care about fast, easy and reliable current code, and r2u helps me (and quite a few others). Bioconductor has a not fully released / announced bioc2u on top of it too. It has its charms. Again, if you must install a point-in-time snapshot, life is harder, but allow me to point out that all of that is your choice.

PS And in case that was not clear, the primary goal of r2u is CRAN: it gives you all of CRAN, plus extras, as we don't purge deleted packages should you need them, and we add ~ 400 from BioC covering the top 200+ most used ones and their dependencies. And it all comes through apt, so it is, as we like to say, "Fast. Easy. Reliable. Pick all three.", as you get all system dependencies resolved and delivered.
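
For a Dockerfile, the interactive session above might translate into something like the following sketch. The 22.04 tag is an assumption (the log above shows ca2204 builds), the package list is a subset of the one used above, and it relies on install.r dispatching to apt via bspm as described.

```dockerfile
# Assumption: the 22.04 tag of the rocker/r2u image.
FROM rocker/r2u:22.04

# install.r goes through bspm to apt, so the R packages and their system
# libraries arrive as Ubuntu binaries instead of being compiled from source.
RUN apt update -qq \
  && install.r DESeq2 limma MAST glmGamPoi \
  && rm -rf /var/lib/apt/lists/*
```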

nick-youngblut commented 3 months ago

Thanks @eddelbuettel for the suggestions to use r2u! That should help reduce the build times greatly.

I've so far overlooked r2u since it is not very prominent in the docs at https://rocker-project.org/, and it's not used in many existing bioinformatics dockerfiles that use rocker images (e.g., https://github.com/satijalab/seurat-docker/blob/master/latest/Dockerfile). Maybe it will catch on more once more bioinformaticians realize how useful it can be (e.g., rocker and versioned2 have ~2k stars in total, while r2u has ~180; I'll be sure to add 1 star).

eddelbuettel commented 3 months ago

@nick-youngblut Yes, by all means give it a look and a try, and let me know (at its repo) how it goes. Word of mouth spreads in non-linear ways: I mentioned it on the BioC Slack and some BioC users are early adopters.

I try to spread the word as I can, but it clearly only goes so far. Feel free to amplify if you like it and find it useful.

(And we've been at Rocker for a decade and it does more; r2u is coming up on its second birthday.)

nick-youngblut commented 3 months ago

@eddelbuettel is there an example of using r2u with rocker/rstudio as the base image? I'm trying to update my dockerfile (see above) with r2u, but am running into issues. The r2u setup (5 steps, if one includes all optional steps) is not trivial.

eitsupi commented 3 months ago

I think this discussion has drifted off-topic... Can I close this issue or forward it to another repository?

nick-youngblut commented 3 months ago

Can I close this issue or forward it to another repository?

Sorry for getting off-topic.

I think it could still be helpful to provide more information in the install2.r docs on installing Bioconductor packages (e.g., via installBioc.r).

eddelbuettel commented 3 months ago

I closed it. This thread is a prime example of us providing an excellent base layer. We are not able to foresee each and every use case for which this is then deployed, so we cannot document each and every facet. We do our best, and act with editorial oversight. Suggestions are always welcome even if we may not take all.

install2.r has come up before; the issues can be searched.

r2u questions belong to its repo. (And I'd argue 'it is trivial' -- we shipped 15+ million packages and all the Docker and CI setups are automated.) Using it with RStudio should be possible (I install RStudio as an Ubuntu .deb) and a contributed Dockerfile may be welcome. It has not been a use case of mine, but I think one or more users may have asked for it. See the r2u issues at its site.