showteeth / ggcoverage

Visualize and annotate genomic coverage with ggplot2
https://showteeth.github.io/ggcoverage

Docker/Apptainer for linux #36

Closed harrymatthews50 closed 2 months ago

harrymatthews50 commented 2 months ago

Has anybody ever been able to make a Docker/Apptainer image (for Linux) with ggcoverage installed inside? I have so far been unable to get all of the dependencies installed and playing nicely for either version of the software.

m-jahn commented 2 months ago

Hi Harry, it should be possible to make a container for ggcoverage. But can you be more specific about your problems installing ggcoverage? It should install smoothly on any recent R version. I recently made an effort to reduce the long list of dependencies to a more basic set and to flag the others as Suggests.

harrymatthews50 commented 2 months ago

Hi m-jahn. Yes I have your contributions. It's good that this is being worked on :) I have tried a bunch of things to get the various dependencies installed correctly, but the basic problem is that, starting from a recent r-base Docker image, the dependencies don't install correctly. Below is the Apptainer def file:

Bootstrap: docker
From: r-base:4.4.1

%post
    export DEBIAN_FRONTEND=noninteractive
    apt-get update 
    # Install R packages
    R -e "install.packages('remotes')"
    R -e "remotes::install_github('showteeth/ggcoverage@v1.3.0',dependencies=TRUE)"

%environment
    # Set environment variables
    export R_LIBS_USER=/usr/local/lib/R/site-library

which I build with

apptainer build ggcoverage.sif ggcoverage.def > ggcoverage_build.log 2>&1

The last part of the log shows that a bunch of dependencies didn't install as expected.

* building ‘ggcoverage_1.3.0.tar.gz’
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
ERROR: dependencies ‘GenomicRanges’, ‘ggbio’, ‘IRanges’, ‘Rsamtools’, ‘rtracklayer’, ‘GenomeInfoDb’, ‘S4Vectors’, ‘Biostrings’, ‘BSgenome’, ‘GenomicAlignments’, ‘ggforce’, ‘HiCBricks’, ‘ggpattern’ are not available for package ‘ggcoverage’
* removing ‘/usr/local/lib/R/site-library/ggcoverage’
There were 44 warnings (use warnings() to see them)

If you can help at all I would really appreciate it. Best, Harry

m-jahn commented 2 months ago

I see. The good news is that this should be easy to solve :-) There is a pattern in the packages that were not installed: most of them are Bioconductor packages. Try adding Bioconductor as a package source (search for the most efficient way to do this), or install these dependencies separately from the CRAN packages in your apptainer/singularity definition. Something along these lines:

# install the bioconductor package manager from CRAN
install.packages("BiocManager")

# install bioconductor packages with this manager
BiocManager::install("Biostrings")

harrymatthews50 commented 2 months ago

Thanks for the help! Using the following .def

Bootstrap: docker
From: r-base:4.4.1

%post
    export DEBIAN_FRONTEND=noninteractive
    apt-get update 
    # Install R packages
    R -e "install.packages(c('remotes','BiocManager'))"
    R -e "library('BiocManager'); BiocManager::install(c('GenomicRanges', 'ggbio', 'IRanges', 'Rsamtools', 'rtracklayer', 'GenomeInfoDb', 'S4Vectors', 'Biostrings', 'BSgenome', 'GenomicAlignments', 'ggforce', 'HiCBricks', 'ggpattern'))"
    R -e "remotes::install_github('showteeth/ggcoverage@v1.3.0',dependencies=TRUE)"

%environment
    # Set environment variables
    export R_LIBS_USER=/usr/local/lib/R/site-library

I still see similar issues in the log

Error in mydir.create(name) : 
  failed to create directory ‘BSgenome.Hsapiens.UCSC.hg19/man’
ERROR: dependencies ‘GenomeInfoDb’, ‘GenomicRanges’, ‘SummarizedExperiment’, ‘Biostrings’, ‘Rsamtools’, ‘GenomicAlignments’, ‘GenomicFeatures’, ‘AnnotationDbi’, ‘VariantAnnotation’, ‘ensembldb’, ‘AnnotationFilter’ are not available for package ‘biovizBase’
* removing ‘/usr/local/lib/R/site-library/biovizBase’
ERROR: dependencies ‘biovizBase’, ‘GenomeInfoDb’, ‘GenomicRanges’, ‘SummarizedExperiment’, ‘Biostrings’, ‘Rsamtools’, ‘GenomicAlignments’, ‘BSgenome’, ‘VariantAnnotation’, ‘rtracklayer’, ‘GenomicFeatures’, ‘OrganismDbi’, ‘ensembldb’, ‘AnnotationDbi’, ‘AnnotationFilter’ are not available for package ‘ggbio’
* removing ‘/usr/local/lib/R/site-library/ggbio’

The downloaded source packages are in
    ‘/tmp/RtmpZ7jzly/downloaded_packages’
Running `R CMD build`...
* checking for file ‘/tmp/RtmpZ7jzly/remotesf0de473c786f/showteeth-ggcoverage-d4f7f42/DESCRIPTION’ ... OK
* preparing ‘ggcoverage’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘ggcoverage_1.3.0.tar.gz’
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Error in untar2(tarfile, files, list, exdir, restore_times) : 
  incomplete block on file
There were 38 warnings (use warnings() to see them)
> 
> 
INFO:    Adding environment to container
INFO:    Creating SIF file...
INFO:    Build complete: ggcoverage.sif
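
Worth noting about the log above: it ends with "Build complete" although ggcoverage was never installed. One contributing factor is that install.packages() reports a failed package as a warning, not an error, so the R call can still exit successfully. Below is a minimal sketch of a check that could run at the end of %post to stop the build in that case. It assumes Rscript is on the PATH (it ships with R), and require_r_pkg is a made-up helper name, not part of R or Apptainer.

```shell
#!/bin/sh
# Sketch: fail if an R package cannot be found after the install step.
# `require_r_pkg` is a hypothetical helper name used for illustration only.
require_r_pkg() {
    pkg="$1"
    # requireNamespace() returns FALSE instead of raising an error when the
    # package is missing, so turn that into a non-zero exit status.
    Rscript -e "if (!requireNamespace('$pkg', quietly = TRUE)) quit(status = 1)" || {
        echo "R package '$pkg' is not installed" >&2
        return 1
    }
}

# Possible usage as the last lines of %post:
#   require_r_pkg GenomicRanges || exit 1
#   require_r_pkg ggcoverage    || exit 1
```

With a check like this the build would stop with an error message instead of producing an image that lacks the package.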

The package looks very nice but I don't have days to spend trying to install it and will use something else. If the installation is so delicate I think pushing a Docker image to DockerHub with it pre-installed would be very valuable and increase the utility of the package.

m-jahn commented 2 months ago

Sorry to hear that this didn't fix the issue. I would need to try this myself to get to the bottom of it, but I currently don't have the time. I'll leave the issue open and hopefully get back to it. Just one thought: can you install other Bioconductor packages in your apptainer image, or packages with BioC dependencies? I'm wondering whether this problem is unique to our package or a general issue.

harrymatthews50 commented 2 months ago

It is definitely not specific to your package. For example, library('BiocManager'); BiocManager::install('GenomicRanges', dependencies=TRUE) also fails inside the container. It looks like some system libraries (C/C++) are missing, e.g. libcurl. I just haven't really got the time to figure the whole dependency tree out. Hence, if you have a working installation, it would be really nice to have a snapshot of it as a container. Thanks for your time and help. Harry
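
One way to shorten the search through the dependency tree is to pull the failed package names out of the build log. The sketch below relies only on the "ERROR: ... for package" lines that R prints (the "dependencies ... are not available" form appears in the logs above; "configuration failed" is the other common form). failed_r_pkgs is a made-up helper name, and the log file name is the one used in the build command earlier in this thread.

```shell
#!/bin/sh
# Sketch: list the packages that R reported as failed in a build log.
# `failed_r_pkgs` is a hypothetical helper name used for illustration only.
failed_r_pkgs() {
    # Keep only R's "ERROR: ... for package <name>" lines, cut everything up
    # to the package name, then strip curly or straight quotes around it.
    grep -E '^ERROR: .* for package ' "$1" \
        | sed -e 's/.* for package //' -e 's/[[:space:]]*$//' \
              -e 's/‘//g' -e 's/’//g' -e "s/'//g" \
        | sort -u
}

# Possible usage:
#   failed_r_pkgs ggcoverage_build.log
```

Packages that show up with "configuration failed" are usually the ones to look at first, since the lines just above that message in the log tend to name the missing header or system library; the "dependencies ... are not available" errors then follow from those.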

m-jahn commented 2 months ago

OK I see. I think you need an image definition that goes beyond a plain vanilla linux + R. Look for example at this page: https://support.bioconductor.org/p/p134101/#9158760

People seem to install various system dependencies for these packages before they even start with R. That makes sense, given that many packages require typical libraries like libxml, libgeos, libcgal, ... In their Docker definition they have, for example:

# Base image https://hub.docker.com/u/rocker/
FROM rocker/shiny:latest

# system libraries of general use
## install debian packages
RUN apt-get update -qq && apt-get -y --no-install-recommends install \
    libxml2-dev \
    libcairo2-dev \
    libsqlite3-dev \
    libmariadbd-dev \
    libpq-dev \
    libssh2-1-dev \
    unixodbc-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    coinor-libcbc-dev coinor-libclp-dev libglpk-dev

Regardless of Docker or Singularity, installation of your desired R packages will also fail on a vanilla Linux system without these dependencies. The error log usually states what is missing, so you can get to the bottom of this. I think this is not specific to the package, hence I can close the issue.
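
Putting the two suggestions from this thread together (system libraries first, then Bioconductor packages through BiocManager), the install steps for a Debian-based image such as r-base might look like the sketch below. It is untested against ggcoverage itself. The function names are made up. libcurl4-openssl-dev, libssl-dev and libxml2-dev come from the Dockerfile quoted above, while zlib1g-dev, libbz2-dev and liblzma-dev are an assumption about what the compression-dependent packages need. The list is probably incomplete and should be extended from the build log.

```shell
#!/bin/sh
# Sketch of install steps for %post (Apptainer) or a RUN script (Docker) on a
# Debian-based R image. Function names are hypothetical; the list of -dev
# packages is an assumption and may be incomplete.

install_system_libs() {
    export DEBIAN_FRONTEND=noninteractive
    apt-get update -qq &&
    apt-get install -y --no-install-recommends \
        libcurl4-openssl-dev libssl-dev libxml2-dev \
        zlib1g-dev libbz2-dev liblzma-dev
}

install_r_pkgs() {
    # BiocManager and remotes are CRAN packages.
    Rscript -e "install.packages(c('BiocManager', 'remotes'))" &&
    # BiocManager::install() can install Bioconductor and CRAN packages alike.
    Rscript -e "BiocManager::install(c('GenomicRanges', 'Rsamtools', 'rtracklayer', 'ggbio'), ask = FALSE)" &&
    Rscript -e "remotes::install_github('showteeth/ggcoverage@v1.3.0')"
}

# Possible usage inside %post:
#   install_system_libs && install_r_pkgs || exit 1
```

Installing the system libraries before any R package matters because R compiles most packages from source on Linux, and the compile step is where the missing headers show up.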

harrymatthews50 commented 2 months ago

Thanks for the tip about the bioconductor base images. That will definitely be helpful if not now in the future :)