o2r-project / containerit

Package an R workspace and all dependencies as a Docker container
https://o2r.info/containerit/
GNU General Public License v3.0

Allow repo specification in dockerfile() #150

Open leungi opened 4 years ago

leungi commented 4 years ago

Reproducible example below.

Proposal: dockerfile(repo = "https://cloud.r-project.org", ...), and then insert repo into RUN ["install2.r", repo, "randomForest"].
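As a rough sketch of what the proposal amounts to, the base R snippet below assembles an exec-form RUN line for install2.r with an optional repository. The helper name install2r_run is hypothetical and not part of containerit. It assumes the repository is handed to install2.r through its -r flag, as in the working Dockerfile shown later in this issue, but with the flag and the URL as separate arguments.

```r
# Hypothetical helper (not part of containerit): builds an exec-form RUN
# instruction for install2.r. When `repo` is given, it is passed via the
# -r flag ahead of the package names.
install2r_run <- function(pkgs, repo = NULL) {
  args <- c("install2.r", if (!is.null(repo)) c("-r", repo), pkgs)
  # Wrap every argument in double quotes. Embedded quotes are NOT escaped,
  # so this sketch only suits plain package names and URLs.
  quoted <- paste0('"', args, '"', collapse = ", ")
  paste0("RUN [", quoted, "]")
}

writeLines(install2r_run("randomForest", repo = "https://cloud.r-project.org"))
# RUN ["install2.r", "-r", "https://cloud.r-project.org", "randomForest"]
```

Leaving repo as NULL reproduces the instruction that is generated today, so an argument like this could stay optional.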

# this fails
Dockerfile:
FROM trestletech/plumber
LABEL maintainer="leungi"
RUN ["install2.r", "randomForest"]
> docker build -t auto-doc .
Sending build context to Docker daemon  288.8kB
Step 1/3 : FROM trestletech/plumber
 ---> f9aa6e6553fb
Step 2/3 : LABEL maintainer="leungi"
 ---> Using cache
 ---> 583999c677af
Step 3/3 : RUN ["install2.r", "randomForest"]
 ---> Running in 175175bb1bfc
Warning: unable to access index for repository https://cloud.r-project.org/src/contrib:
  cannot open URL 'https://cloud.r-project.org/src/contrib/PACKAGES'
Warning message:
package ‘randomForest’ is not available (for R version 3.6.0)
Removing intermediate container 175175bb1bfc
 ---> 46f6dfc54083
Successfully built 46f6dfc54083
Successfully tagged auto-doc:latest

# this works
Dockerfile:
FROM trestletech/plumber
LABEL maintainer="leungi"
RUN ["install2.r",  "-r https://cran.rstudio.com/", "randomForest"]
> docker build -t auto-doc .
Sending build context to Docker daemon  288.8kB
Step 1/10 : FROM trestletech/plumber
 ---> f9aa6e6553fb
Step 2/10 : LABEL maintainer="leungi"
 ---> Using cache
 ---> 583999c677af
Step 3/10 : RUN ["install2.r", "-r https://cran.rstudio.com/", "randomForest"]
 ---> Running in 14e3ee1d6688
trying URL 'https://cran.rstudio.com/src/contrib/randomForest_4.6-14.tar.gz'
Content type 'application/x-gzip' length 80074 bytes (78 KB)
==================================================
downloaded 78 KB

* installing *source* package ‘randomForest’ ...
** package ‘randomForest’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG     -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-3.6.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c classTree.c -o classTree.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG     -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-3.6.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c init.c -o init.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG     -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-3.6.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c regTree.c -o regTree.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG     -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-3.6.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c regrf.c -o regrf.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG     -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-3.6.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c rf.c -o rf.o
gfortran  -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-3.6.0=. -fstack-protector-strong  -c rfsub.f -o rfsub.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG     -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-3.6.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c rfutils.c -o rfutils.o
gcc -std=gnu99 -shared -L/usr/lib/R/lib -Wl,-z,relro -o randomForest.so classTree.o init.o regTree.o regrf.o rf.o rfsub.o rfutils.o -lgfortran -lm -lquadmath -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/00LOCK-randomForest/00new/randomForest/libs
** R
** data
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (randomForest)

The downloaded source packages are in
        ‘/tmp/downloaded_packages’
Removing intermediate container 14e3ee1d6688
 ---> 7a173cd90611
Step 4/10 : RUN mkdir /data
 ---> Running in fefc0b2c19a0
Removing intermediate container fefc0b2c19a0
 ---> d8b0ec30e93f
Step 5/10 : COPY ["bos_rf_score.R", "/data"]
 ---> d1015547e9e4
Step 6/10 : COPY ["model/bos_rf.rds", "/data"]
 ---> fd5e4b562f1f
Step 7/10 : WORKDIR /data
 ---> Running in bec74f059753
Removing intermediate container bec74f059753
 ---> e09f51b64240
Step 8/10 : EXPOSE 8000
 ---> Running in b2cd7688ee44
Removing intermediate container b2cd7688ee44
 ---> 0a7c39c8e083
Step 9/10 : ENTRYPOINT ["R", "-e", "pr <- plumber::plumb('/data/bos_rf_score.R'); pr$run(host='0.0.0.0', port=8000)"]
 ---> Running in 7480b4b7800b
Removing intermediate container 7480b4b7800b
 ---> 4b508f2fff78
Step 10/10 : CMD ["R"]
 ---> Running in 8ad13e92eaf2
Removing intermediate container 8ad13e92eaf2
 ---> 69b7a47516fa
Successfully built 69b7a47516fa
Successfully tagged auto-doc:latest

leungi commented 4 years ago

Current hacky workaround:

repo <- 'https://cran.rstudio.com/'
pkgs <- paste(setdiff(names(sessionInfo()$otherPkgs), 'containerit'),
                            collapse = ' ')

my_run_pkg <- Run("install2.r", glue::glue('-r {repo} {pkgs}'))

my_dockerfile <- dockerfile(
  from = NULL,
  image = "trestletech/plumber"
)

addInstruction(my_dockerfile) <- list(my_run_pkg)

print(my_dockerfile)

#> FROM trestletech/plumber
#> LABEL maintainer="leungi"
#> RUN ["install2.r", "-r https://cran.rstudio.com/ sp randomForest"]

nuest commented 4 years ago

Hi @leungi, thank you for the suggestion. CRAN mirrors can sometimes be a bit out of sync, so in your case the package randomForest should also be available on the cloud.r-project.org mirror by now.

Anyway, I agree that being able to define the repo used is a useful feature to expose to the user. I've added it to the next tasks.

leungi commented 4 years ago

I just tested again and confirmed that both CRAN and cloud.r-project.org contain the same source package, randomForest_4.6-14.tar.gz.

However, I still get the same error message as above when specifying the latter as the repo to download packages from.

I checked for a firewall issue, but that is not the cause, since I'm able to download via http.

Were you able to replicate?
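
One way to narrow this down might be to check, from R inside the container, whether the package index named in the warning can be read over https and over http. The snippet below is only a diagnostic sketch in base R; packages_index and can_read_index are hypothetical helper names, and the result depends on the network setup of wherever it runs.

```r
# Hypothetical helpers for diagnosis, using base R only.

# Builds the URL of the source-package index for a repository, i.e. the
# file named in the "cannot open URL" warning above.
packages_index <- function(repo) {
  paste0(sub("/+$", "", repo), "/src/contrib/PACKAGES")
}

# Returns TRUE if the first line of the index can be read, FALSE if
# opening or reading the URL raises an error or a warning.
can_read_index <- function(repo) {
  con <- url(packages_index(repo))
  on.exit(close(con))
  tryCatch(
    length(readLines(con, n = 1L, warn = FALSE)) > 0,
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
}

# Example calls (they need network access, so they are left commented out):
# can_read_index("https://cloud.r-project.org")
# can_read_index("http://cloud.r-project.org")
```

If the https call returns FALSE while the http call returns TRUE, that would point at how https is handled in the image rather than at the mirror's contents.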