Closed ff6347 closed 5 years ago
@fabianmoronzirfas, please try the following:
RUN R -e "install.packages('remotes');
remotes::install_version('curl', version = '4.0');
remotes::install_version('fs', version = '1.3.1');
remotes::install_version('httr', version = '1.4.1');
remotes::install_version('lubridate', version = '1.7.4');
remotes::install_version('raster', version = '2.8-19');
remotes::install_version('Rcpp', version = '1.0.2');
remotes::install_version('rstanarm', version = '2.18.2');
remotes::install_version('sf', version = '0.7-4');
remotes::install_version('sp', version = '1.3-1');"
I used the following R commands to create the main part of the above code:
pkgs <- c('rstanarm', 'sf', 'fs', 'raster', 'sp', 'lubridate', 'httr', 'Rcpp', 'curl')
installed <- installed.packages()
columns <- c("Package", "Version")
versions <- installed[rownames(installed) %in% pkgs, columns]
cat(paste(collapse = "\n", sprintf(
"remotes::install_version('%s', version = '%s');",
versions[, "Package"], versions[, "Version"]
)))
fs is missing from the list
Interestingly, the package was not installed on my computer. I updated the list (see above). Maybe you should run the code that I provided from within the container to get the version numbers that were actually installed so far (as my packages may not all be in their most current version).
Build with the pinned version passes in 30 minutes. https://github.com/technologiestiftung/flusshygiene-opencpu-base/pull/6/checks?check_run_id=212976872#step:6:11566
@hsonne After changing the install to the versioned install you suggested, the build of https://github.com/technologiestiftung/flusshygiene-opencpu-fhpredict-api takes pretty long again. Are you sure this is the right way to get the packages by version?
Do we need to provide the build = TRUE flag?
@fabianmoronzirfas I assume that it takes so long because we do one install_version() call per package. Each call installs all dependencies of its package, so dependencies shared by more than one package may be installed multiple times. I found another solution that I would like you to test:
RUN R -e "tarballs <- c(
'Rcpp_1.0.2.tar.gz',
'curl_4.0.tar.gz',
'fs_1.3.1.tar.gz',
'httr_1.4.1.tar.gz',
'lubridate_1.7.4.tar.gz',
'raster_3.0-2.tar.gz',
'remotes_2.1.0.tar.gz',
'rstanarm_2.18.2.tar.gz',
'sf_0.7-7.tar.gz',
'sp_1.3-1.tar.gz'
);
urls <- paste0('https://cran.r-project.org/src/contrib/', tarballs);
install.packages(urls, repos = NULL, type = 'source')"
The listed versions are the most current versions found at "https://cran.r-project.org/src/contrib/". Unfortunately, the files are moved to "https://cran.r-project.org/src/contrib/Archive/<package>" once there is a newer version of <package>. So maybe we should always use the most recent archived version. For that case, the installation instructions are:
RUN R -e "paths <- c(
'Rcpp/Rcpp_1.0.1.tar.gz',
'curl/curl_3.3.tar.gz',
'fs/fs_1.3.0.tar.gz',
'httr/httr_1.4.0.tar.gz',
'lubridate/lubridate_1.7.3.tar.gz',
'raster/raster_2.9-23.tar.gz',
'remotes/remotes_2.0.4.tar.gz',
'rstanarm/rstanarm_2.18.1.tar.gz',
'sf/sf_0.7-6.tar.gz',
'sp/sp_1.2-7.tar.gz'
);
urls <- paste0('https://cran.r-project.org/src/contrib/Archive/', paths);
install.packages(urls, repos = NULL, type = 'source')"
For the moment, I prefer to go with the most recent versions of the first solution above, otherwise I have to "downgrade" the version requirements of our packages kwb.dwd and fhpredict...
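To illustrate the two locations described above, here is a minimal sketch of a helper that builds both candidate URLs for one package version. The function name cran_tarball_urls is hypothetical (it is not part of remotes or any other package), and the sketch only constructs strings; it does not check which URL actually exists.

```r
# Hypothetical helper: build both candidate URLs for the source tarball
# of a single package version on CRAN.
cran_tarball_urls <- function(package, version,
                              base = "https://cran.r-project.org/src/contrib/") {
  stopifnot(length(package) == 1, length(version) == 1)
  tarball <- sprintf("%s_%s.tar.gz", package, version)
  c(
    # location while this is the newest version on CRAN
    current = paste0(base, tarball),
    # location after a newer version has been published
    archive = paste0(base, "Archive/", package, "/", tarball)
  )
}

urls <- cran_tarball_urls("sp", "1.3-1")
print(urls)
```

A Dockerfile step could try the "current" URL first and fall back to the "archive" URL, so that the pinned version keeps installing after CRAN moves the file.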
I think you misunderstood me. We do the install on this image here in the repo, which serves as the base.
Then we use this image as the base for the API image.
I do all the installs here, and when I build the image for the API it seems that the installs are running again, as if the packages were not present.
@hsonne ☝️
Maybe checking the workflow used in the R package containerit is helpful: https://github.com/o2r-project/containerit
@mrustl thanks for the hint. containerit uses rocker images and rocker uses these scripts from littler. It does not look like there is version management in there.
I think rocker is more mature than opencpu when it comes to the Docker setup, but for the time being we need to work with the opencpu image as the base.
No version management, but a function for installing specific versions from CRAN, similar to @hsonne's proposal:
https://o2r.info/containerit/reference/versioned_install_instructions.html
Running the following code...
containerit:::versioned_install_instructions(
pkgs = data.frame(name = "sf", version = "0.7-7")
)
... results in:
[[1]]
An object of class "Run"
Slot "exec":
[1] "Rscript"
Slot "params":
[1] "-e" "versions::install.versions('sf', '0.7-7')"
So, the Dockerfile generated by the containerit package will contain calls to install.versions() from the versions package to install packages in a certain version. According to the documentation of that function, it can be given all package names and version strings at once, so we could try the following:
RUN R -e "install.packages('versions'); versions::install.versions(
pkgs = c('curl', 'fs', 'httr', 'lubridate', 'raster', 'remotes', 'Rcpp', 'rstanarm', 'sf', 'sp'),
versions = c('4.0', '1.3.1', '1.4.1', '1.7.4', '3.0-2', '2.1.0', '1.0.2', '2.18.2', '0.7-7', '1.3-1')
);"
However, I do not see why this should be different from using remotes::install_version()...
I would recommend using the CRAN snapshot timemachine (MRAN) maintained by Microsoft:
In total, the flusshygiene R packages are KWB's top 3 R packages by dependency count, each with 100 or more recursive R package dependencies (see: https://github.com/KWB-R/pkgmeta/issues/3):
| package | n_dependencies | n_recursive_dependencies |
|---|---|---|
| kwb.flusshygiene.app | 12 | 109 |
| fhpredict | 13 | 107 |
| kwb.flusshygiene | 9 | 100 |
Dependencies are invitations for other people to break your package. -- Josh Ulrich, private communication
http://dirk.eddelbuettel.com/blog/2018/02/28/ http://www.tinyverse.org/
Rocker also uses MRAN so CRAN R package versions are fixed by date (based on the container build date!)
https://github.com/rocker-org/rocker-versioned/blob/master/r-ver/3.6.0/Dockerfile#L113
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
(from: https://hub.docker.com/r/rocker/r-ver/dockerfile)
Thanks. I'm testing this right now in PR #8
I guess doing a BUILD_DATE=$(TZ="Europe/Berlin" date -I) is not what we want. This sets the date to the current date of the build if the env variable does not exist. I added it as a --build-arg.
But… the safeguard for a non-existing env variable is actually smart. The install fails, but without throwing an error. It just ends without installing the packages.
hm @mrustl @hsonne Any idea why the install is failing?
> install.packages(c("remotes", "rstanarm", "sf", "fs", "raster", "sp", "lubridate", "httr", "Rcpp", "curl"), repo = 'https://mran.microsoft.com/snapshot/' );
Installing packages into '/usr/local/lib/R/site-library'
(as 'lib' is unspecified)
Warning: unable to access index for repository https://mran.microsoft.com/snapshot/src/contrib:
cannot open URL 'https://mran.microsoft.com/snapshot/src/contrib/PACKAGES'
>
>
Warning message:
packages 'remotes', 'rstanarm', 'sf', 'fs', 'raster', 'sp', 'lubridate', 'httr', 'Rcpp', 'curl' are not available (for R version 3.6.1)
The MRAN url is wrong (missing DATE)!
https://mran.microsoft.com/snapshot/src/contrib/PACKAGES
(https://github.com/technologiestiftung/flusshygiene-opencpu-base/pull/8/checks?check_run_id=228084450#step:6:783)
Instead of e.g.:
https://mran.microsoft.com/snapshot/2019-09-19/src/contrib/PACKAGES
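Because install.packages() only warns when the repository index cannot be read, an empty date goes unnoticed until the packages turn out to be missing. A minimal sketch of a guard that fails early could look like this. The function name mran_snapshot_url is hypothetical, and the URL layout is assumed to be the one shown above.

```r
# Hypothetical helper: build the snapshot repository URL and stop with an
# error if the date is missing or not in YYYY-MM-DD form.
mran_snapshot_url <- function(date,
                              base = "https://mran.microsoft.com/snapshot/") {
  ok <- length(date) == 1 && !is.na(date) &&
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", date)
  if (!ok) {
    stop("snapshot date must look like 'YYYY-MM-DD', got: '", date, "'")
  }
  paste0(base, date)
}

repo <- mran_snapshot_url("2019-09-19")
print(repo)

# The result would then be passed on, for example:
# install.packages(c("remotes", "sf"), repos = repo)
```

With an empty date the call stops with an error, so the Docker build step fails visibly instead of finishing without the packages.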
Okay. So it's just an issue with adding the date
Yep
Cool. Seems to work now, but @fabianmoronzirfas there is no need to "secure" the date:
install.packages(c("remotes", "rstanarm", "sf", "fs", "raster", "sp", "lubridate", "httr", "Rcpp", "curl"), repo = 'https://mran.microsoft.com/snapshot/***' )
https://github.com/technologiestiftung/flusshygiene-opencpu-base/commit/ace1c21010be1a92f88b2a629e1cde239314a5d5/checks#step:6:771
Yeah. I actually hardcoded it into the Dockerfile for now, because everything else was failing… 😭
I didn't want to secure it. I wanted to pass it in from the outside as a variable so we can easily update it.
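One way to pass the date in from the outside is to read it from an environment variable inside R. The following is a minimal sketch, assuming the Dockerfile declares ARG BUILD_DATE and the build is started with --build-arg BUILD_DATE=2019-09-19, so that the value is visible as an environment variable during the RUN step. The function name read_build_date is hypothetical.

```r
# Hypothetical helper: read the snapshot date from an environment variable
# and stop with an error if it is unset or empty.
read_build_date <- function(var = "BUILD_DATE") {
  value <- Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop("environment variable ", var, " is not set or empty")
  }
  value
}

# Possible use inside the RUN step (commented out, needs BUILD_DATE to be set):
# repo <- paste0("https://mran.microsoft.com/snapshot/", read_build_date())
# install.packages(c("remotes", "sf"), repos = repo)
```

Updating the pinned date then only requires changing the build argument, not the Dockerfile.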
The build is now faster again. Takes 3min 40sec here on GH. I'll close this one.
@hsonne As the title says. Currently we take the latest version on every build. IMHO we should pin these down to a specific version. If you agree, can you provide the syntax and the versions, or create a pull request / new branch for it?
https://github.com/technologiestiftung/flusshygiene-opencpu-base/blob/09c87948bcaa2a22f4486fe9dbb9038b5689bcac/Dockerfile#L18