jsta / r-docker-tutorial

A docker tutorial for reproducible research
http://jsta.github.io/r-docker-tutorial

gapminder link #9

Closed lmguzman closed 8 years ago

lmguzman commented 8 years ago

In the Dockerfile lesson, the gapminder link https://cran.r-project.org/src/contrib/gapminder_0.1.0.tar.gz is not working for me :(. It gives me a "not found" error.

bkatiemills commented 8 years ago

Hmmm - I see gapminder has been bumped to 0.2.0, but does that mean the 0.1.0 link dies? Weird. Anyway, no reason not to update to 0.2.0 for now, but maybe look into a more stable solution - suggestions welcome.

HeidiSeibold commented 8 years ago

How about

R -e "install.packages('gapminder', repos = 'http://cran.us.r-project.org')"

?

lmguzman commented 8 years ago

Yeah, that works!

ttimbers commented 8 years ago

I am not sure what the best solution for installing R packages with a dockerfile is.

Essentially if you use:

RUN wget https://cran.r-project.org/src/contrib/gapminder_0.2.0.tar.gz
RUN R CMD INSTALL gapminder_0.2.0.tar.gz

you can specify versions, but you are HOOPED if the package has dependencies! They will not be installed with this method. If you use the other method shown above:

R -e "install.packages('gapminder', repos = 'http://cran.us.r-project.org')"

you cannot pick the version, it will just automatically grab the most recent version. This really goes against my philosophy for using Docker... But if there is not a better solution, I suggest we go with the R -e "install.packages('gapminder', repos = 'http://cran.us.r-project.org')" option as at least dependencies are installed that way. @cboettig do you know a better solution?

cboettig commented 8 years ago

You can install from the mran snapshots of cran instead, eg: set repo to "https://mran.revolutionanalytics.com/snapshot/2015-10-07" to install from that date.

Also take a look at the 'checkpoint' and 'packrat' packages to lock versions of your library in a more fine-grained way (rather than pinning to a particular date), more like gemfile.lock in Ruby.

HeidiSeibold commented 8 years ago

'packrat' is already installed on the hadleyverse image, so this would be a good option. I also like the snapshot solution. I don't know which is easier / more valuable for students.

eddelbuettel commented 8 years ago

As I just got pulled in here:

RUN wget https://cran.r-project.org/src/contrib/gapminder_0.2.0.tar.gz
RUN R CMD INSTALL gapminder_0.2.0.tar.gz

That is wrong on two or more counts: a) you want to combine RUN statements and b) we have install.r to actually fetch a package by name (and current version) from CRAN (see below) and c) you generally do NOT want to hardwire a version number as R resolves that for you.

Quick demo:

$ install.r gapminder
trying URL 'https://cran.rstudio.com/src/contrib/gapminder_0.2.0.tar.gz'
Content type 'application/x-gzip' length 243216 bytes (237 KB)
==================================================
downloaded 237 KB

* installing *source* package ‘gapminder’ ...
** package ‘gapminder’ successfully unpacked and MD5 sums checked
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (gapminder)

The downloaded source packages are in
        ‘/tmp/downloaded_packages’
$ 

That idiom is used all over the Rocker project Dockerfiles. We do not aim to snapshot particular dates, versions, vintages, releases, but @gmbecker can advertise his snapshotting solution. As @cboettig mentions above, there are others too.
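
A minimal sketch of the install.r idiom as it might appear in a Dockerfile. The base image `rocker/r-base` is an assumption made for illustration (the thread does not fix one); it is chosen because it ships `install.r`.

```dockerfile
# Assumed base image that provides install.r; not specified in the thread.
FROM rocker/r-base

# install.r looks the package up by name on CRAN and installs the
# current version together with its dependencies.
RUN install.r gapminder
```

If several shell commands are needed in one step, joining them with `&&` inside a single RUN keeps them in one image layer, which is the point made in (a) above.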

ttimbers commented 8 years ago

@cboettig I do like the idea of installing from the mran snapshots of cran for a dockerfile. That would cover both version specificity, as well as package dependencies. Great suggestion. @BillMills @HeidiSeibold what do you think? Should we go with that for installing R packages for the dockerfile lesson?

bkatiemills commented 8 years ago

@eddelbuettel I agree that pinning versions is trouble for package management - but it's exactly what I do want when making a docker container - otherwise the same Dockerfile can generate two different containers depending on when you run it which means the Dockerfile is no longer a good document of what's in the container.

The ideal solution IMO resolves dependencies correctly, remains stable in time, and produces a clear document of what is in the container. @ttimbers @cboettig, your date-pinned solution seems to do the first two, but how do you recommend documenting the actual package versions that end up getting installed by this method? There's probably a convenient way to do this that R superheroes such as yourselves know about - I would want that included in the lesson, since I care way way more about what your version numbers are than what mran was doing on this date in history.

eddelbuettel commented 8 years ago

The official 'R on Docker' container (i.e. rocker/r-base as well as just r-base) pins as well.

There is a time and place for it. It just so happens that it is not the default use for Carl or myself. Our aim is not frozen-in-time configs. If you want to freeze a setup, keep the container.

This may sound flippant but doing otherwise engages a steep uphill battle against both R's distribution model (CRAN == always current) and the Linux distros (ditto).

cboettig commented 8 years ago

@BillMills If you just want a list of the specific versions of everything that is installed, see the R function installed.packages() (from utils).

The checkpoint and packrat packages try to provide a more portable way to share this information, e.g. if you want collaborators in a non-dockerized environment to just replicate your package suite.

bkatiemills commented 8 years ago

@cboettig ok, I'm sold - checkpoint/mran + installed.packages() is a good solution for reproducibility and clear documentation. So, just to be excruciatingly pedantic, the recommendation is:

RUN R -e "install.packages('gapminder', repos = 'https://mran.revolutionanalytics.com/snapshot/2015-10-07')"

or whatever date, in the Dockerfile, with the understanding that installed.packages() inside the container will prevent the need for too much dep spelunking. Looks good to me!
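
The recommendation above could be sketched as the following Dockerfile fragment. The base image and the output path `/installed-packages.csv` are hypothetical choices made for illustration. The snapshot URL is copied from the thread and may no longer resolve, so treat this as a pattern rather than a tested recipe.

```dockerfile
# Assumed base image; the thread does not fix one.
FROM rocker/r-base

# Install gapminder and its dependencies from the date-pinned snapshot.
RUN R -e "install.packages('gapminder', repos = 'https://mran.revolutionanalytics.com/snapshot/2015-10-07')"

# Record the name and version of every installed package inside the image.
# /installed-packages.csv is a hypothetical path chosen for this sketch.
RUN R -e "write.csv(installed.packages()[, c('Package', 'Version')], file = '/installed-packages.csv', row.names = FALSE)"
```

Anyone with the image can then read that file to see the exact versions, without having to work out what the snapshot date implied.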

cboettig commented 8 years ago

@BillMills sounds good to me.

Though for any given R script / Rmd file it is usually more practical to recommend that users report the output of a call to sessionInfo() at the end of their script, rather than installed.packages().

As you may know, sessionInfo() will not only name the versions of packages that were actually loaded in that analysis, but also list other relevant information for debugging, such as platform architecture and locale info.
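
To keep with the Dockerfile setting of this thread, here is a small sketch that prints sessionInfo() to the build log after loading the package. The base image and the use of install.r are assumptions for illustration. In an actual analysis the sessionInfo() call would sit at the end of the script or Rmd file, as described above.

```dockerfile
# Assumed base image; not specified in the thread.
FROM rocker/r-base

# Install the package by name (current CRAN version).
RUN install.r gapminder

# Load the package, then print R version, platform, locale and the
# versions of loaded packages into the build log.
RUN R -e "library(gapminder); sessionInfo()"
```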

bkatiemills commented 8 years ago

I think we've settled on RUN R -e "install.packages... per 06 and #28.

Nasiru001 commented 6 years ago

I want to use the gapminder package, but it keeps showing me "there is no package called 'gapminder'". Please, what is the way out?

eddelbuettel commented 6 years ago

There is: https://cloud.r-project.org/web/packages/gapminder/index.html But it has Depends: R (≥ 3.1.0). Is your R older than that?

Nasiru001 commented 6 years ago

Yes, I am using 3.4.4.

eddelbuettel commented 6 years ago

Well:

R> R.version.string
[1] "R version 3.4.4 (2018-03-15)"
R> install.packages("gapminder")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/gapminder_0.3.0.tar.gz'
Content type 'application/x-gzip' length 2110951 bytes (2.0 MB)
==================================================
downloaded 2.0 MB

* installing *source* package ‘gapminder’ ...
** package ‘gapminder’ successfully unpacked and MD5 sums checked
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** testing if installed package can be loaded
* DONE (gapminder)

The downloaded source packages are in
    ‘/tmp/RtmpvGUPAg/downloaded_packages’
R>

Nasiru001 commented 6 years ago

Thanks for the tips.