rocker-org / rocker-versioned

Run current & prior versions of R using docker
https://hub.docker.com/r/rocker/r-ver
GNU General Public License v2.0
297 stars 169 forks source link

Question: r-ver has two options(repos) calls? #116

Closed krlmlr closed 2 years ago

krlmlr commented 5 years ago

The r-ver images have two calls that set options(repos) in Rprofile.site, the first seems redundant. See 13eddaa22f11bce8d43390b90b9f6729a11601ec, I can open a PR if desired.

Would you be supportive of moving the build logic to a separate script, so that the Dockerfile looks like this:

FROM debian:stretch

LABEL org.label-schema.license="GPL-2.0" \
      org.label-schema.vcs-url="https://github.com/rocker-org/rocker-versioned" \
      org.label-schema.vendor="Rocker Project" \
      maintainer="Carl Boettiger <cboettig@ropensci.org>"

ARG R_VERSION
ARG BUILD_DATE
ENV R_VERSION=${R_VERSION:-3.5.2} \
    LC_ALL=en_US.UTF-8 \
    LANG=en_US.UTF-8 \
    TERM=xterm

COPY build.sh /
RUN bash /build.sh && rm /build.sh

CMD ["R"]

I'm aware that this adds one layer to some images, but the code would be much easier to read. But perhaps you already have code that generates the Dockerfiles?

cboettig commented 5 years ago

Thanks, right, these are redundant. As you guessed, the second one, setting MRAN to the default instead of CRAN rstudio mirror, was historically added by script to each Dockerfile except latest; but now even latest includes these, and it would make sense to drop the first one.

I have debated about moving the installation to a separate script. The use of COPY is discouraged by Docker's Best Practices though in cases like these not so much for introducing a layer, (which isn't such a limitation these days as it was when Rocker started) but because it prevents caching of the layers.

Also, with the migration/integration of DockerHub and DockerCloud now, I'm planning to move the versioned stack from this current structure that has all versions together to a tag-based model that ought to play more nicely with the automated builds (though I haven't gotten the regexes for the tags on Docker Hub to actually work). See https://github.com/rocker-org/rocker-versioned/tree/tag-as-ver. This does make it harder (but not impossible) to change the Dockerfiles of older images; but now that this stack is mostly quite stable that's possibly not a bad thing. I think this approach creates a cleaner repo overall, and an easier model for others writing extensions to follow without having to maintain simultaneous separate Dockerfiles for each version. Feedback on this welcome before I pull the trigger -- https://github.com/rocker-org/geospatial has already switched over to this model. Also mentioning this since it would make the PR a bit of a moot point.

krlmlr commented 5 years ago

Thanks for the detailed feedback.

In my experiments with a recent-ish docker, the COPY operations didn't disrupt caching: a COPY operation was redone iff the underlying file(s) changed. This seems consistent with the best practices. Am I missing anything else about the caching?

By tag-based model you mean that the code for older versions is organized in Git tags instead of subdirectories? In what cases would you want to change the definition of an old version?

The following development model worked well for me, maybe we could try adopting it for the devel version of geospatial first?

cboettig commented 5 years ago

@krlmlr Thanks for the follow-up, this is very helpful to discuss. Yes, my info on caching looks to be out of date now; (and likely has been for some time!) COPY didn't use to compare checksums first, and it looks like external build scripts are now encouraged instead of discouraged. Looks like modern Docker is also more flexible about build context vs Dockerfile location, making it easier for multiple Dockerfiles to share a given build script. These shifts is significant -- for instance, I've often thought it would be nice to have the external build-script approach you describe since it could potentially facilitate building the stack in different orders -- e.g. r-ver + tidyverse without rstudio -- though some care would need to be taken in implementing this.

Re the tags, yes, the code for older versions would be organized as git tags instead of subdirectories, and thus somewhat more immutable. Does that pattern make more sense than subdirectories? (Another alternative is to use git branches rather than sub-directories -- I do this in rocker/geospatial only for the :devel tag, which lives perpetually on a separate devel branch, while all others are tagged versions along master branch.)

Regarding what to do about the old tags, I think I could be talked into either alternative -- first migrating all the old images to use script-based builds and then re-creating tags for each version, or just leaving them as they are while adopting the scripted approach going forward. Certainly the latter is less effort and avoids the unlikely but not impossible chance of making a breaking change on a supposedly-stable configuration. What do you think?

I'm happy to experiment more on geospatial first where implementing an external build env would be pretty trivial, or starting with a PR to r-ver if you just want to jump in there. Again, thanks for engaging on this and really appreciate your insight here.

eitsupi commented 2 years ago

I think this issue has been resolved, so I will close it.