Closed eitsupi closed 3 years ago
thanks, yeah, automating the updates for a new release would be brilliant. Interested in prepping a PR for this?
Fixed date to 2021-05-17
Yes, of course I would like to contribute with a PR. I think I can provide the following files at this time.
However, I don't yet know how to use that file to generate the stacks JSON files. So the CRAN URLs etc. would have to be copied manually from the generated file into the stacks files; is that OK?
@eitsupi very nice. I don't think it would be all that tricky to go from your R script generating the versions and CRAN URL to one that updates the stack files?
Still, our current stack file pattern, which requires a new .json file to be created at each release, is rather cumbersome, with a lot of duplication. It would be preferable for a stack file to simply have an array for the different R versions, instead of a new stack file for each version. This merely needs a good way of distinguishing between env vars etc. that remain fixed over each new release vs those, like the CRAN URL, that need to be updated. Of course the R script that generates the Dockerfiles from the stacks would need updating to the new syntax. In that manner, a new release would be fully automated while also streamlining the config file situation a little more.
I think that's quite do-able but haven't had a chance to carve out the time!
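A rough sketch of that array-based idea (the schema, the file contents, and the vars_for helper here are all illustrative, not the actual rocker stack format): one file holds the values that stay fixed across releases plus an array of per-release variables, and the generator merges the two for a given R version.

```python
import json

# Hypothetical single stack file: fixed values plus a "versions" array
# (schema and contents are illustrative only).
stack = json.loads("""
{
  "fixed": {"LANG": "en_US.UTF-8"},
  "versions": [
    {"R_VERSION": "4.0.0",
     "CRAN": "https://packagemanager.rstudio.com/cran/__linux__/focal/291"},
    {"R_VERSION": "4.0.5",
     "CRAN": "https://packagemanager.rstudio.com/cran/__linux__/focal/2021-05-17"}
  ]
}
""")

def vars_for(stack, r_version):
    """Merge the fixed variables with the entry for one R release."""
    entry = next(v for v in stack["versions"] if v["R_VERSION"] == r_version)
    return {**stack["fixed"], **entry}

print(vars_for(stack, "4.0.5")["CRAN"])
```

With this layout, a new release only appends one entry to the array instead of duplicating a whole stack file.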
@cboettig I agree that it is best to separate the variable part of the stack file from the fixed part, and when upgrading, update only the one file that contains the variable.
Probably the easiest way is to pass build args to the ARG instructions in the Dockerfile at docker build time, but that is different from the current build system, which has a separate Dockerfile for each image. I also don't know whether DockerHub supports such a build system; we could use GitHub Actions to build it (if it doesn't time out...).
First of all, we need to be able to generate the variables automatically, so let's focus on core-4.0.0.json; the variable parts are the following:
"TAG": "4.0.0"
"FROM": "ubuntu:20.04"
"R_VERSION": "4.0.0"
"CRAN": "https://packagemanager.rstudio.com/cran/__linux__/focal/291"
"FROM": "rocker/r-ver:4.0.0"
"S6_VERSION": "v2.0.0.1"
"RSTUDIO_VERSION": "1.3.959"
"FROM": "rocker/rstudio:4.0.0"
"FROM": "rocker/tidyverse:4.0.0"
"CTAN_REPO": "http://www.texlive.info/tlnet-archive/2020/06/05/tlnet"
Currently, the things I can't generate using the above procedure and have to set manually are S6_VERSION, RSTUDIO_VERSION and CTAN_REPO.
(Also, ubuntu:20.04 should be set to ubuntu:focal, and CRAN should use a date-based format.)
It seems that CTAN_REPO is a date-based URL and is generated daily, so it can be generated automatically.
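As a rough illustration, the date-based snapshot URL can be built mechanically (the URL pattern is copied from the CTAN_REPO value quoted above; the helper name is made up):

```python
from datetime import date

def ctan_snapshot_url(snapshot: date) -> str:
    """Build the texlive.info daily snapshot URL for a given date.

    Pattern taken from the CTAN_REPO value in core-4.0.0.json.
    """
    return f"http://www.texlive.info/tlnet-archive/{snapshot:%Y/%m/%d}/tlnet"

print(ctan_snapshot_url(date(2020, 6, 5)))
# → http://www.texlive.info/tlnet-archive/2020/06/05/tlnet
```

A real pipeline would then verify that the URL actually exists (e.g. with an HTTP HEAD request) before writing it into the stack file.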
I think S6_VERSION and RSTUDIO_VERSION can be easily generated automatically by getting the release information from GitHub, so I'll check how to do it.
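For reference, a minimal sketch of getting a release tag from the GitHub REST API (the repository name and canned payload below are examples; the real script's details may differ). Note that anonymous API requests are rate-limited, which comes up later in this thread.

```python
import json
from urllib.request import urlopen

def latest_release_tag(repo: str) -> str:
    """Fetch the latest release tag for a repo, e.g. 'just-containers/s6-overlay'.

    Network call; not executed in the demonstration below. Anonymous
    requests are subject to the GitHub API rate limit.
    """
    url = f"https://api.github.com/repos/{repo}/releases/latest"
    with urlopen(url) as resp:
        return tag_from_payload(json.load(resp))

def tag_from_payload(payload: dict) -> str:
    """Extract the tag name from a GitHub release API response."""
    return payload["tag_name"]

# Offline demonstration with a canned payload:
print(tag_from_payload({"tag_name": "v2.0.0.1"}))
# → v2.0.0.1
```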
Update: RStudio 1.4.1106 has a release date of 2021-02-11 on GitHub, but it seems that the binary release date was 2021-03-02, so the date available on GitHub cannot be used.
I don't think it would be all that tricky to go from your R script generating the versions and CRAN URL to one that updates the stack files?
I thought there would be more parts that I would need to configure manually, but it certainly looks like the core stack file can be generated automatically.
Looking at other definitions, the only other variables I could find were CUDA_VERSION and NCCL_VERSION, which are used in ml-cuda. I do not know where these two come from...
Thanks.
I also don't know if DockerHub supports such a build system; we can use GitHubActions to build it (if it doesn't time out...).
Starting with 4.x / versioned2 we stopped using the DockerHub automated builds in the versioned stack anyway, since they do not support large numbers of tags; we literally could not add more versions to the rocker/r-ver automated build. So our builds are already local and/or GitHub Actions, where build timeout is not usually an issue (particularly with RSPM binaries), but network failures on deploy to DockerHub and image sizes are a real issue; runners are too small to build the ml stack and some others.
Currently, the things I can't generate using the above procedure and have to set manually are S6_VERSION, RSTUDIO_VERSION and CTAN_REPO.
We've actually kept S6_VERSION mostly locked, upgrading only as needed -- it hasn't always been safe to upgrade S6_VERSION without sufficient testing, since changes there can change the way the init config files work. Automating the CTAN repo to the date-based archive snapshots should be fine; note the archive should only be used for frozen images and not the current release, since it has limited capacity. There are scripts in littler and in install_rstudio.sh that I think show getting the latest R version (not from GitHub release info).
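One possible approach along those lines, not necessarily the exact one used in littler or install_rstudio.sh: parse CRAN's source directory listing (e.g. https://cran.r-project.org/src/base/R-4/), where releases appear as R-x.y.z.tar.gz, and take the highest version. A sketch on a canned fragment of such a listing:

```python
import re

def latest_r_version(listing_html: str) -> str:
    """Pick the newest R release from a CRAN src/base directory listing."""
    versions = re.findall(r"R-(\d+\.\d+\.\d+)\.tar\.gz", listing_html)
    # Compare numerically, not lexically (so 4.0.10 > 4.0.9).
    return max(set(versions), key=lambda v: tuple(map(int, v.split("."))))

# Offline demonstration on a canned fragment of the listing:
html = '<a href="R-4.0.0.tar.gz">..</a> <a href="R-4.0.5.tar.gz">..</a>'
print(latest_r_version(html))
# → 4.0.5
```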
I thought there would be more parts that I would need to configure manually, but it certainly looks like the core stack file can be generated automatically [...] only other variables I could find were CUDA_VERSION and NCCL_VERSION,
Yeah, happy to start with the core stack. The ML stack variables are fixed based on the corresponding locks in nvidia/cuda, but all of that is still a bit of a work in progress and it looks like we can do some more syncing up there.
Starting with 4.x / versioned2 we stopped using the DockerHub automated builds in the versioned stack anyway, since they do not support large numbers of tags; we literally could not add more versions to the rocker/r-ver automated build. So our builds are already local and/or GitHub Actions, where build timeout is not usually an issue (particularly with RSPM binaries), but network failures on deploy to DockerHub and image sizes are a real issue; runners are too small to build the ml stack and some others.
I was wondering why the images were being updated when GitHub Actions was not running; does that mean they were being built locally? Thank you for all your hard work.
If we only consider building with GitHub Actions, I think we can use a multi-stage build and a build matrix to build all images from a single Dockerfile and a JSON file with variables. (You may need to split the workflow into multiple workflows to get around the GitHub Actions limits, but in any case you would have very few files to maintain.)
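A minimal sketch of that build-matrix idea (the variable values are illustrative): one workflow step serializes the external JSON file to a compact string, which a later job then consumes via fromJSON() in its strategy.matrix.

```python
import json

# Illustrative contents of an external variables file, one entry per image
# to build (values are examples only).
variables = [
    {"r_version": "4.0.5",
     "cran": "https://packagemanager.rstudio.com/cran/__linux__/focal/2021-05-17"},
    {"r_version": "4.1.0",
     "cran": "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"},
]

# In a workflow this compact string would be written to the step's output
# (e.g. matrix={"include": [...]}) and read back with fromJSON().
matrix = json.dumps({"include": variables}, separators=(",", ":"))
print(matrix)
```

Each matrix entry then becomes one docker build job, with the entry's fields passed in as build args.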
We've actually kept S6_VERSION mostly locked, upgrading as needed only -- it hasn't always been safe to upgrade S6_VERSION without sufficient testing since changes there can change the way the init config files work.
OK, it looks like S6_VERSION should be hard-coded and updated manually.
Automating the CTAN repo to the date-based archive snapshots should be fine; note the archive should only be used for frozen images and not the current release, since it has limited capacity.
As with CRAN, I will make sure that the latest version is set to a dedicated value.
There are scripts in littler and in install_rstudio.sh that I think show getting the latest R version (not from GitHub release info).
Since the date used for the variable is fixed once the next version of R is released, I think we need to generate the variable for at least the last two versions. In other words, we need to get the RStudio version that was latest as of each past R release date. (It would be possible to record the latest RStudio version on the release day by checking only when the R release date matches the date the script runs, but it is not desirable, because missing that date would break things.)
I would like to try to determine the RStudio version from the GitHub release information and use the previous version if the binary has not been released yet.
The disadvantage of this method is that the RSTUDIO_VERSION of the old images may be rewritten when the RStudio binary is released, but I don't think this will be a problem, since it is hard to imagine that RStudio would not support the last two versions of R.
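The fallback could be sketched like this (pick_rstudio_version and the injected is_published check are hypothetical; in practice the check would be an HTTP request against the binary's download URL):

```python
def pick_rstudio_version(candidates, is_published):
    """Return the newest RStudio version whose binary is actually published.

    candidates: version strings, newest first, e.g. ["1.4.1106", "1.3.1093"].
    is_published: predicate; in a real pipeline, an HTTP existence check on
    the server binary's download URL.
    """
    for version in candidates:
        if is_published(version):
            return version
    raise RuntimeError("no published RStudio binary found")

# Demonstration: the newest tag exists on GitHub, but its binary is not
# yet published, so we fall back to the previous release.
published = {"1.3.1093"}
print(pick_rstudio_version(["1.4.1106", "1.3.1093"], published.__contains__))
# → 1.3.1093
```

Once the 1.4.1106 binary appears, the same run would pick it up and rewrite the older image's RSTUDIO_VERSION, which is the behavior discussed above.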
Since the date used for the variable is fixed after the next version of R is released, I think we need to generate the variable for at least the last two versions
right, good call. yeah this sounds fine, I don't think overwriting the old RSTUDIO_VERSION should be an issue, though hopefully most of the time it will be the same version as previously(?)
does that mean it was being built locally?
Correct. Runs on one of my servers so could be automated by a CRON job easily, but is not at the moment. Ideally we'd still build from GH-Actions cron jobs as much as possible (i.e. at least for the smaller images), so it would be great for an automated pipeline to bump the versions in the gh-actions config files too (or maybe better, switch those files over to using methods from the Makefile so they can just call something like make core-latest instead).
Correct. Runs on one of my servers so could be automated by a CRON job easily, but is not at the moment. Ideally we'd still build from GH-Actions cron jobs as much as possible (i.e. at least for the smaller images)
Thank you very much for using your own server to build images! It's very easy to set up scheduled execution for GitHub Actions, so it seems like a good idea to do so right away, especially for the daily "devel" build. Just write something like this. https://github.com/eitsupi/r-ver/blob/cc7f15dbcb644c5b73e7534159f1a6c309576db3/.github/workflows/docker-build-push.yml#L1-L8
it would be great for an automated pipeline to bump the versions on the gh-actions config files too
Updating the workflow definition file itself is good, but I think an easier way is to generate a matrix based on an external json file from the workflow and reference the variables in the workflow. This can be achieved as follows. https://github.com/eitsupi/r-ver/blob/cc7f15dbcb644c5b73e7534159f1a6c309576db3/.github/workflows/docker-build-push.yml#L19-L30
The dynamic matrix generated from json is described in the following post. https://github.blog/changelog/2020-04-15-github-actions-new-workflow-features/
By the way, I've never used GNU Make, so I may need to study it...
Yup, cron triggers have been on the to-do list for a while, https://github.com/rocker-org/rocker-versioned2/issues/13, but never pulled the trigger since most of those gh-actions builds haven't been all that stable. Starting with the devel build on CRON makes :100: sense though!
Your build-matrix looks great, very clever! (will take me a while to quite wrap my head around it though). Definitely seems the way to go, would streamline the gh-actions setup a lot.
My new script is now able to generate RSTUDIO_VERSION automatically.
(Due to the GitHub API rate limit, it can't be used anonymously multiple times in a row.)
I'll try to convert it to stack files this weekend.
I'm closing this issue because we've achieved the original goal of automatically updating the CRAN URL for the rocker/r-ver Dockerfile.
I have created a new issue #181 that covers automation more broadly.
The disadvantage of this method is that the RSTUDIO_VERSION of the old images may be rewritten when the RStudio binary is released, but I don't think this will be a problem since it is hard to imagine that RStudio does not support the last two versions of R.
Note: This actually occurred with the RStudio version 2022.02.2+485 released today. (#433)
Currently, the configuration of the CRAN URL is done manually, and there have been several misconfigurations in the past. #127 #141 (By the way, the CRAN URL for 4.0.5 is currently set to May 19, 2021, but since 4.1.0 was released on May 18, wouldn't it be better to re-set it to May 17?)
I think the following steps to determine the CRAN URL can be automated, and GitHub Actions may be used to automatically update the URL.
- We can check the release date of R by referring directly to the R SVN repository with the rversions::r_versions function.
- The release dates and codenames of Ubuntu can be found in the /usr/share/distro-info/ubuntu.csv file included with Ubuntu.
- The existence of the CRAN URL can be checked by using the pak::repo_ping function included in the development version of the pak package.

I was able to use GitHub Actions to automatically update the CRAN URLs in my repository, which consolidated the management of the URLs into a single file. https://github.com/eitsupi/r-ver/pull/57/files
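Assuming the snapshot is frozen to the day before the next R release (the URL pattern comes from the stack files themselves; the helper name is made up), the steps above could be sketched as:

```python
from datetime import date, timedelta

def cran_snapshot_url(codename: str, next_release: date) -> str:
    """Build a date-based RSPM CRAN URL for a superseded R version.

    Freeze the snapshot to the day before the *next* R release, so the
    packages match the state of CRAN while that R version was current.
    codename is the Ubuntu release codename (from ubuntu.csv), next_release
    is the date the following R version came out (from rversions::r_versions).
    """
    freeze = next_release - timedelta(days=1)
    return ("https://packagemanager.rstudio.com/cran/__linux__/"
            f"{codename}/{freeze.isoformat()}")

# R 4.1.0 was released on 2021-05-18, so 4.0.5 images freeze at 2021-05-17:
print(cran_snapshot_url("focal", date(2021, 5, 18)))
# → https://packagemanager.rstudio.com/cran/__linux__/focal/2021-05-17
```

This matches the May 17 date suggested above for the 4.0.5 image; a real pipeline would still ping the URL (e.g. with pak::repo_ping) before committing it.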