o2r-project / containerit

Package an R workspace and all dependencies as a Docker container
https://o2r.info/containerit/
GNU General Public License v3.0

Package a session with versioned system dependencies #46

Open nuest opened 7 years ago

nuest commented 7 years ago

As becomes clear in the discussion on geospatial libraries in Rocker, the versions of linked external libraries matter.

Can we support packaging explicit versions of linked libraries?

> extSoftVersion()
                     zlib                     bzlib                        xz 
                  "1.2.8"      "1.0.6, 6-Sept-2010"              "5.1.0alpha" 
                     PCRE                       ICU                       TRE 
        "8.38 2015-11-23"                        "" "TRE 0.8.0 R_fixes (BSD)" 
                    iconv                  readline 
             "glibc 2.23"                     "6.3"

> library(sf)
Linking to GEOS 3.5.0, GDAL 2.1.2, proj.4 4.9.2
> sf::sf_extSoftVersion()
   GEOS    GDAL  proj.4 
"3.5.1" "2.1.2" "4.9.2" 

This information could be accessed by a function <pkgname>_extSoftVersion(), see extSoftVersion() and [sf_extSoftVersion()](https://github.com/edzer/sfr/blob/5c3dfea395af81bf352b4007d16c6a7d419883c2/R/init.R#L59).
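
A minimal sketch of how this information could be gathered in one place, assuming only base R and, optionally, an installed sf package. The helper name `collect_ext_versions` is hypothetical and not part of containerit; sf is the only package in this thread known to offer such a function.

```r
# Hypothetical helper (not part of containerit): gather the versions of
# linked external libraries reported by base R and, if available, by sf.
collect_ext_versions <- function() {
  versions <- list(base = extSoftVersion())
  # sf is optional; only query it when the package is installed
  if (requireNamespace("sf", quietly = TRUE)) {
    versions$sf <- sf::sf_extSoftVersion()
  }
  versions
}

# Print a compact overview of everything that was found
str(collect_ext_versions())
```

Each list element is a named character vector, so later steps can look up a single library by name, e.g. `collect_ext_versions()$base[["zlib"]]`.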

MatthiasHinz commented 7 years ago

This would be very tough, I think. First of all, there is no generic solution we could use. We would probably have to add individual support for every single version.

Via APT, it is possible to install different versions of one package, but it is unlikely that we can make use of that. To find out which versions of a package could be installed via apt-get install «pkg»=«version», you can use a command of this form: apt-cache show libreadline6-dev | grep Version. The main problem is that the repositories mostly provide only the most recent version of a package. Even if there were repositories with historic packages, we would still have to match libraries with package names and map between version tags, which may vary depending on the platform and architecture. If you want to read a bit more, there are plenty of discussions on how to 'downgrade' a package, e.g. https://askubuntu.com/questions/138284/how-to-downgrade-a-package-via-apt-get/138327.
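
For illustration, a small sketch of how a pinned apt-get instruction could be generated, assuming the exact version strings are already known and still available in the configured repositories (which, as noted above, is often not the case). The helper name `apt_pinned_install` and the version string in the example are made up.

```r
# Hypothetical helper: build a Dockerfile RUN instruction that pins exact
# package versions. `pkgs` is a named character vector: names are Debian
# package names, values are version strings as reported by
# `apt-cache show <pkg> | grep Version`.
apt_pinned_install <- function(pkgs) {
  stopifnot(is.character(pkgs), !is.null(names(pkgs)), all(nzchar(names(pkgs))))
  pinned <- paste0(names(pkgs), "=", pkgs)
  paste("RUN apt-get update && apt-get install -y",
        paste(pinned, collapse = " "))
}

# The version string below is invented for illustration only
cat(apt_pinned_install(c("libreadline6-dev" = "6.3-8")), "\n")
```

This only formats the instruction; it does not check whether the requested version can actually be installed.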

It seems a little more doable to install versioned packages from source. This is slow, and there is no generic solution here either. To my knowledge, there is no central source code repository that we could use, but for most of the libraries we can find some repository that includes older versions. For instance:

We already have examples for installing proj and gdal from source here. We could use such commands as templates with different version numbers. This will work for some cases, but of course not for all.

# install latest GDAL and PROJ, based on https://github.com/edzer/sfr/blob/master/.travis.yml
WORKDIR /tmp/gdal
RUN wget http://download.osgeo.org/gdal/2.1.0/gdal-2.1.0.tar.gz \
    && tar zxf gdal-2.1.0.tar.gz \
    && cd gdal-2.1.0 \
    && ./configure \
    && make \
    && make install
WORKDIR /tmp/proj
RUN wget http://download.osgeo.org/proj/proj-4.9.3.tar.gz \
    && tar zxvf proj-4.9.3.tar.gz \
    && cd proj-4.9.3 \
    && ./configure \
    && make \
    && make install \
    && ldconfig
RUN rm -r /tmp/gdal /tmp/proj

MatthiasHinz commented 7 years ago

Any comments on this? I don't think that either of the solutions is easy to implement right now, but we could discuss. @nuest @edzer @MarkusKonk

nuest commented 7 years ago

If we can demonstrate that it is doable for "most" or "many" cases, that would already be quite something!

That there is no central repository for this means that the containerit package becomes central and handles the specifics, e.g. generating the install command from a specific GDAL version.
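
A rough sketch of what "generating the install command from a specific GDAL version" could look like, assuming the download URL pattern and build steps from the Dockerfile snippet earlier in this thread hold for the requested release. The function name `gdal_source_instructions` is hypothetical and this is not the containerit implementation.

```r
# Hypothetical helper: fill a source-install template with a GDAL version.
# URL pattern and build steps are copied from the Dockerfile snippet in this
# thread; they are not verified for every GDAL release.
gdal_source_instructions <- function(version) {
  # Accept only plain dotted version numbers such as "2.1.3"
  stopifnot(grepl("^[0-9]+(\\.[0-9]+)+$", version))
  tarball <- sprintf("gdal-%s.tar.gz", version)
  url <- sprintf("http://download.osgeo.org/gdal/%s/%s", version, tarball)
  steps <- c(
    paste("wget", url),
    paste("tar zxf", tarball),
    sprintf("cd gdal-%s", version),
    "./configure",
    "make",
    "make install",
    "ldconfig",
    "rm -r /tmp/gdal"
  )
  # One WORKDIR line plus a single multi-line RUN instruction
  c("WORKDIR /tmp/gdal",
    paste("RUN", paste(steps, collapse = " \\\n && ")))
}

cat(gdal_source_instructions("2.1.3"), sep = "\n")
```

Keeping the cleanup step inside the same RUN instruction avoids leaving the build directory behind in an earlier image layer.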

That image building is slow is not an issue for now.

nuest commented 7 years ago

@MatthiasHinz please put in here a few (!) sentences on the current state of the implementation.

MatthiasHinz commented 7 years ago

I implemented a parameter versioned_libs (TRUE/FALSE) that triggers dockerfile() to match the linked external libraries.

Currently, the package only reads sf::sf_extSoftVersion() and creates RUN instructions for installing PROJ and GDAL from source.

Example:

> df <- dockerfile(expression(library(sf)),versioned_libs = TRUE)
INFO [2017-03-27 12:00:13] Creating an R session with the following arguments:
     R  --silent --vanilla -e "library(sf)" -e "info <- sessionInfo()" -e "save(list = \"info\", file = \"/tmp/Rtmp69G1ZA/rdata-sessioninfo19c73ba050a\")"
> library(sf)
Linking to GEOS 3.5.0, GDAL 2.1.3, proj.4 4.9.3
> info <- sessionInfo()
> save(list = "info", file = "/tmp/Rtmp69G1ZA/rdata-sessioninfo19c73ba050a")
> 
> 
Loading required namespace: sf
WARN [2017-03-27 12:00:15] No explicit for support for the version 3.5.0 of the linked external software GEOS
WARN [2017-03-27 12:00:15] No explicit for support for the version NA of the linked external software lwgeom
INFO [2017-03-27 12:00:15] Trying to determine system requirements for the package(s) 'sf,magrittr,DBI,units,Rcpp,udunits2' from sysreq online DB
INFO [2017-03-27 12:00:16] Created Dockerfile-Object based on expression

Dockerfile:

FROM rocker/r-ver:3.3.3
LABEL maintainer="matthiashinz"
RUN export DEBIAN_FRONTEND=noninteractive; apt-get -y update \
 && apt-get install -y gdal-bin \
    libgeos-dev \
    libudunits2-dev \
    make \
    wget
WORKDIR /tmp/gdal
RUN wget http://download.osgeo.org/gdal/2.1.3/gdal-2.1.3.tar.gz \
 && tar zxf gdal-2.1.3.tar.gz \
 && cd gdal-2.1.3 \
 && ./configure \
 && make \
 && make install \
 && ldconfig \
 && rm -r /tmp/gdal
WORKDIR /tmp/proj
RUN wget http://download.osgeo.org/proj/proj-4.9.3.tar.gz \
 && tar zxf proj-4.9.3.tar.gz \
 && cd proj-4.9.3 \
 && ./configure \
 && make \
 && make install \
 && ldconfig \
 && rm -r /tmp/proj
RUN ["install2.r", "-r 'https://cloud.r-project.org'", "sf", "magrittr", "DBI", "units", "Rcpp", "udunits2"]
CMD ["R"]

MatthiasHinz commented 7 years ago

The information to install versioned libs is based on a config file that you can find here:

https://github.com/MatthiasHinz/containerit/blob/master/inst/containeRit_config.json

The following R script provides some helper methods for creating such a file, for reading/writing and initializing, and getter methods for the information of interest (I want to re-write the first part into a test later): https://github.com/MatthiasHinz/containerit/blob/master/R/containeRit-config.R

The code that makes use of the config and creates the corresponding RUN commands can be found here: https://github.com/MatthiasHinz/containerit/blob/master/R/package-installation-methods.R#L82

MatthiasHinz commented 7 years ago

In order to support versioned GEOS and lwgeom, you would only have to create corresponding entries in the json file.
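
To illustrate the idea of "one entry per library", here is a sketch using a plain R list as a stand-in for the JSON config. The real schema of containeRit_config.json is not shown in this thread, so the structure, the names `lib_templates` and `lookup_install_command`, and the warning text are all assumptions.

```r
# Hypothetical registry standing in for the JSON config: library name ->
# command template, where %1$s is replaced by the version.
lib_templates <- list(
  GDAL = "wget http://download.osgeo.org/gdal/%1$s/gdal-%1$s.tar.gz",
  PROJ = "wget http://download.osgeo.org/proj/proj-%1$s.tar.gz"
)

# Return the filled-in command, or NULL with a warning if the library
# has no entry in the registry.
lookup_install_command <- function(lib, version, templates = lib_templates) {
  template <- templates[[lib]]
  if (is.null(template)) {
    warning(sprintf("No support for version %s of %s", version, lib))
    return(NULL)
  }
  sprintf(template, version)
}

lookup_install_command("PROJ", "4.9.3")
```

Under this assumption, supporting another library is a matter of adding one entry (the URL below is a placeholder, not a real download location):

```r
lib_templates$GEOS <- "wget https://example.org/geos-%1$s.tar.bz2"
lookup_install_command("GEOS", "3.5.0")
```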

If we want to support base::extSoftVersion(), the implementation should be very similar.
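
One extra step would likely be needed for extSoftVersion(): its values are free-form strings (e.g. "8.38 2015-11-23" or "glibc 2.23"), so a plain version number has to be extracted first. A minimal sketch, assuming the first dotted number in each string is the version; the helper name `extract_version` is hypothetical.

```r
# Hypothetical helper: pull the first dotted version number out of each
# string; entries without one (e.g. "") become NA.
extract_version <- function(x) {
  pattern <- "[0-9]+(\\.[0-9]+)+"
  hits <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the matched elements, in order
  out[hits > 0] <- regmatches(x, hits)
  names(out) <- names(x)
  out
}

extract_version(extSoftVersion())
```

Suffixes such as "alpha" in "5.1.0alpha" are dropped by this pattern, which may or may not be acceptable when matching against download locations.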