opensafely-core / r-docker

Docker image for running R scripts in OpenSAFELY

Add butcher package #128

Open inglesp opened 1 year ago

inglesp commented 1 year ago

See #127 for details.

bloodearnest commented 1 year ago

I think the Posit repo is just a frozen CRAN mirror, so maybe it's OK?

I am wary of date-based pinning of older versions. I actually think the first suggestion of adding dependencies manually is cleaner from a maintenance point of view.

remlapmot commented 1 year ago

Using snapshot repos is the easiest way to add a new package without updating its dependencies that I know of. If a new package had a lot of dependencies, working out the order and version in which to include those could take some figuring out.

So for me this looks good.

That's right, the RStudio/Posit CRAN snapshots are CRAN as of a certain date, like the Microsoft CRAN snapshot repos we used to add dtwclust and randomForest.

https://github.com/opensafely-core/r-docker/blob/2f2498a46bc5bdb2e5c8aa59a6844794dd2c4439/packages.txt#L106-L107

Sadly, Microsoft are shutting down their CRAN snapshotting service soon, so the RStudio/Posit snapshot repos are the only public large-scale snapshotting service left that I know of. I gave the URL to the Posit source package snapshot repo rather than the URL to their Focal binary package snapshot repo because there's a chance the binaries were built against different system libraries.
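For illustration, here is a minimal sketch of how a date-pinned snapshot URL is put together. The URL layout (`https://packagemanager.posit.co/cran/<date>`, with a `__linux__/<distro>` segment for binaries) is an assumption about Posit's public package manager, the snapshot date is made up, and `snapshot_repo` and `install_call` are hypothetical helper names, not part of r-docker.

```python
from datetime import date
from typing import Optional

# Assumed URL layout of Posit's public CRAN snapshots (not taken from r-docker).
PPM_BASE = "https://packagemanager.posit.co/cran"


def snapshot_repo(snapshot: date, binary_distro: Optional[str] = None) -> str:
    """Return a date-pinned CRAN snapshot URL.

    With binary_distro=None the URL points at the source package repo;
    a distro codename such as "focal" selects the Linux binary repo.
    """
    stamp = snapshot.isoformat()
    if binary_distro is None:
        return f"{PPM_BASE}/{stamp}"
    return f"{PPM_BASE}/__linux__/{binary_distro}/{stamp}"


def install_call(package: str, repo_url: str) -> str:
    """Format the R call that installs `package` from the pinned repo."""
    return f'install.packages("{package}", repos = "{repo_url}")'


# Hypothetical snapshot date, chosen only for illustration.
source_repo = snapshot_repo(date(2023, 1, 2))
print(install_call("butcher", source_repo))
```

Because every package in a snapshot is frozen at the same date, the new package and any dependencies it pulls in resolve to one consistent set of versions without working out the install order by hand.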

As we chatted about before, the most recent syntax additions to R came in R 4.1.0, so if you were to do a new image, say with R 4.1.3, that could include updated versions of the packages. My guess is that for probably all the packages, their current CRAN version will install into R 4.1.3. If there are any that require, say, R 4.2.0, then you'd have to go back a version or two for those packages.
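The compatibility check described above could be sketched roughly as follows. This assumes the minimum R version is declared in the `Depends` field of a package's DESCRIPTION file as `R (>= x.y.z)`; `min_r_version` and `installs_on` are hypothetical helpers, the sample field values are invented, and only the `>=` operator is handled.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]


def min_r_version(depends: str) -> Optional[Version]:
    """Extract the minimum R version from a `Depends` field, if one is declared."""
    match = re.search(r"\bR\s*\(\s*>=\s*([0-9]+(?:[.-][0-9]+)*)\s*\)", depends)
    if match is None:
        return None
    parts = [int(p) for p in re.split(r"[.-]", match.group(1))]
    # Pad short versions such as "4.1" to three components so tuples compare cleanly.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def installs_on(depends: str, r_version: Version) -> bool:
    """True if the declared minimum R version (if any) is met by r_version."""
    required = min_r_version(depends)
    return required is None or required <= r_version


TARGET = (4, 1, 3)  # the R version suggested for a new image

# Invented Depends values, for illustration only.
print(installs_on("R (>= 3.5.0), methods", TARGET))
print(installs_on("R (>= 4.2.0)", TARGET))
```

A package that fails this check is one where you would step back through older releases until the declared minimum fits the target R version.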

bloodearnest commented 1 year ago

> Using snapshot repos is the easiest way to add a new package without updating its dependencies that I know of. If a new package had a lot of dependencies, working out the order and version in which to include those could take some figuring out.
>
> So for me this looks good.
>
> That's right, the RStudio/Posit CRAN snapshots are CRAN as of a certain date, like the Microsoft CRAN snapshot repos we used to add dtwclust and randomForest.
>
> https://github.com/opensafely-core/r-docker/blob/2f2498a46bc5bdb2e5c8aa59a6844794dd2c4439/packages.txt#L106-L107
>
> Sadly, Microsoft are shutting down their CRAN snapshotting service soon, so the RStudio/Posit snapshot repos are the only public large-scale snapshotting service left that I know of. I gave the URL to the Posit source package snapshot repo rather than the URL to their Focal binary package snapshot repo because there's a chance the binaries were built against different system libraries.

OK, I yield to your in-depth R ecosystem knowledge :)

> As we chatted about before, the most recent syntax additions to R came in R 4.1.0, so if you were to do a new image, say with R 4.1.3, that could include updated versions of the packages. My guess is that for probably all the packages, their current CRAN version will install into R 4.1.3. If there are any that require, say, R 4.2.0, then you'd have to go back a version or two for those packages.

Yep, my vague plan had been to install all the same packages as this image, but at whatever version is current when we initially publish the new image. Then we'd sunset this image, freeze package additions to it, and only support new packages on the newer image.