r-multiverse / help

Discussions, issues, and feedback for R-multiverse
https://r-multiverse.org
MIT License

Package version best practices #21

Closed wlandau closed 3 months ago

wlandau commented 3 months ago

In https://github.com/r-universe-org/help/issues/363, @gmbecker mentioned it is important that users be able to trust the version number of a package. If a new release of a package is published, then its version number should always increment.

It is straightforward to list all the version numbers and MD5 hashes of all the packages hosted at https://r-releases.r-universe.dev.

```r
suppressPackageStartupMessages(library(dplyr))
library(tibble)
available.packages(repos = "https://r-releases.r-universe.dev") %>%
  as_tibble() %>%
  rename(hash = MD5sum) %>%
  rename_with(tolower) %>%
  select(all_of(c("package", "version", "hash")))
#> # A tibble: 113 × 3
#>    package   version hash
#>    <chr>     <chr>   <chr>
#>  1 BaseSet   0.9.0   027bcf49db2279a0f13170f9149a2a1c
#>  2 BioCor    1.7.0   a17637da3164d60c1df3563386cc8d6c
#>  3 Matrix    1.6-5   ab11095f57536c212af541bb81321105
#>  4 R6        2.5.0   d03c26e0f56c0406976a9b0d4744b11a
#>  5 Rcpp      1.0.12  31c18d8690068e80f36738a59b6f3a38
#>  6 askpass   1.2.0   2a75324b61f52f7dabd6ce939ce22af7
#>  7 base64enc 0.1-3   c5ff8d23d40bcd542cd5dc2548d5d756
#>  8 bit       4.0.5   ef05f46d4c2a1edfa35ef3d78c6eab74
#>  9 bit64     4.0.5   e48a9678044d06337be9d10534ef273a
#> 10 brew      1.0-10  c2282974f7d74ce3100494b619cd023f
#> # ℹ 103 more rows
```

Created on 2024-03-04 with reprex v2.1.0

As part of https://github.com/r-releases/r-releases.r-universe.dev/blob/main/.github/workflows/build_universe.yaml, we could pull this information and cache it as a new file in https://github.com/r-releases/r-releases.r-universe.dev. During the caching process, we could compare the current versions/hashes to the previous versions/hashes and make a judgement about version compliance. Then in #6 or #10, we could use this information to recommend which packages are safe to install.
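
A minimal sketch of the comparison step, assuming the previous and current listings are both available as data frames with `package`, `version`, and `hash` columns. The function name `compare_snapshots()`, the package names, and the hashes are made up for illustration and are not part of any r-releases package.

```r
# Hypothetical helper: compare a cached snapshot to the current one.
# Both inputs are data frames with character columns
# `package`, `version`, and `hash`.
compare_snapshots <- function(previous, current) {
  both <- merge(
    previous,
    current,
    by = "package",
    suffixes = c("_old", "_new")
  )
  # package_version() understands versions such as "1.6-5".
  old <- package_version(both$version_old)
  new <- package_version(both$version_new)
  # The version went backwards.
  both$decremented <- new < old
  # The contents changed but the version stayed the same.
  both$changed_without_increment <-
    (new == old) & (both$hash_old != both$hash_new)
  both[both$decremented | both$changed_without_increment, , drop = FALSE]
}

# Made-up data for illustration only.
previous <- data.frame(
  package = c("pkgA", "pkgB", "pkgC"),
  version = c("1.0.0", "0.2.0", "1.6-5"),
  hash = c("aaa", "bbb", "ccc"),
  stringsAsFactors = FALSE
)
current <- data.frame(
  package = c("pkgA", "pkgB", "pkgC"),
  version = c("1.0.1", "0.1.9", "1.6-5"),
  hash = c("ddd", "eee", "fff"),
  stringsAsFactors = FALSE
)
issues <- compare_snapshots(previous, current)
print(issues[, c("package", "decremented", "changed_without_increment")])
```

Here `pkgA` incremented normally and is not reported, `pkgB` went from 0.2.0 back to 0.1.9, and `pkgC` changed its hash while keeping version 1.6-5.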

wlandau commented 3 months ago

I will implement this, hopefully this week or next week, as I think the r.releases.utils package is the place for it.

wlandau commented 3 months ago

Actually, maybe this exists on its own as another function in the same package as #6, but not as part of install_safe().

wlandau commented 3 months ago

The manifest should include:

  1. The current release version.
  2. The highest version ever released.
  3. The current release MD5 sum.
  4. The MD5 sum of the highest version ever released.

Then a separate function in the package from #6 could pull the manifest file and decide everything it needs to know from there. For good versioning practices, (1) and (2) should agree, and (3) and (4) should agree.
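
One way the manifest and the agreement checks could look, as a sketch only. The column names (`version_current`, `version_highest`, `hash_current`, `hash_highest`) and the functions `update_manifest()` and `check_manifest()` are assumptions made for this example, and the data is made up.

```r
# Hypothetical update step. `manifest` is the previous manifest with columns
# package, version_current, version_highest, hash_current, hash_highest.
# `current` is the latest listing with columns package, version, hash.
update_manifest <- function(manifest, current) {
  history <- manifest[, c("package", "version_highest", "hash_highest")]
  out <- merge(current, history, by = "package", all.x = TRUE)
  # Packages seen for the first time start their own history.
  is_new <- is.na(out$version_highest)
  out$version_highest[is_new] <- out$version[is_new]
  out$hash_highest[is_new] <- out$hash[is_new]
  # A strictly higher version replaces the recorded highest version and hash.
  higher <- package_version(out$version) > package_version(out$version_highest)
  out$version_highest[higher] <- out$version[higher]
  out$hash_highest[higher] <- out$hash[higher]
  data.frame(
    package = out$package,
    version_current = out$version,      # (1)
    version_highest = out$version_highest, # (2)
    hash_current = out$hash,            # (3)
    hash_highest = out$hash_highest,    # (4)
    stringsAsFactors = FALSE
  )
}

# Hypothetical check: return rows where (1) != (2) or (3) != (4).
check_manifest <- function(manifest) {
  versions_agree <- package_version(manifest$version_current) ==
    package_version(manifest$version_highest)
  hashes_agree <- manifest$hash_current == manifest$hash_highest
  manifest[!(versions_agree & hashes_agree), , drop = FALSE]
}

# Made-up data for illustration only.
old_manifest <- data.frame(
  package = c("pkgA", "pkgB", "pkgC"),
  version_current = c("1.0.0", "0.2.0", "1.6-5"),
  version_highest = c("1.0.0", "0.2.0", "1.6-5"),
  hash_current = c("aaa", "bbb", "ccc"),
  hash_highest = c("aaa", "bbb", "ccc"),
  stringsAsFactors = FALSE
)
listing <- data.frame(
  package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  version = c("1.0.1", "0.1.9", "1.6-5", "0.0.1"),
  hash = c("ddd", "eee", "fff", "ggg"),
  stringsAsFactors = FALSE
)
new_manifest <- update_manifest(old_manifest, listing)
flagged <- check_manifest(new_manifest)
print(flagged)
```

In this example `pkgB` fails the version check because its current version is below the highest recorded one, and `pkgC` fails the hash check because its contents changed at the same version.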

wlandau commented 3 months ago

(3) and (4) could be obtained from the previous manifest on each iteration.

wlandau commented 3 months ago

To make #6 easier, I will write a separate JSON with just the package listings with version issues. This should be a small enough list for releases::check_releases() (or whatever we call that function) to download the whole thing quickly.
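
A small sketch of writing and reading such a file with the jsonlite package. The column names and the data are assumptions for illustration, and a temporary file stands in for the file that would live in the repository.

```r
library(jsonlite)

# Made-up listing of packages with version issues.
issues <- data.frame(
  package = c("pkgB", "pkgC"),
  version_current = c("0.1.9", "1.6-5"),
  version_highest = c("0.2.0", "1.6-5"),
  stringsAsFactors = FALSE
)

# Write only the problem packages, one JSON object per package.
path <- tempfile(fileext = ".json")
write_json(issues, path, pretty = TRUE)

# A client could read the whole file back in one call.
# With a hosted file, the path would be replaced by its URL.
downloaded <- read_json(path, simplifyVector = TRUE)
print(downloaded)
```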

shikokuchuo commented 3 months ago

Btw @wlandau, just a note that you might need to use the remote SHA of the GitHub commit (also returned by the R-universe API). If R-universe re-builds packages on a periodic basis, then the MD5 sum of each package will presumably differ between builds because of the date in the metadata contained in the package.

wlandau commented 3 months ago

Do you know how to get those GitHub SHAs from the R-universe API? Unfortunately available.packages() always gives me NAs for the RemoteSha field, which is the only reason I first used MD5s.

shikokuchuo commented 3 months ago

Oh I see, yes that would be the MD5 sum of the built package as I understand it. So each build could return a different one.

The R-universe API is the one for each package like: https://r-releases.r-universe.dev/api/packages/mirai It returns one combined json payload, which would have to be parsed for the 'RemoteSha'.
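
A hedged sketch of the parsing step using jsonlite. The payload below is a tiny made-up stand-in (placeholder version and SHA), and treating `RemoteSha` as a top-level key is an assumption about the response layout rather than something confirmed here. `get_remote_sha()` is a hypothetical helper.

```r
library(jsonlite)

# Made-up stand-in for the per-package JSON payload. The real payload is
# much larger, and the position of RemoteSha inside it is an assumption.
payload <- '{
  "Package": "mirai",
  "Version": "0.0.0",
  "RemoteSha": "0123456789abcdef0123456789abcdef01234567"
}'

# Hypothetical helper: extract RemoteSha, or NA if the field is absent.
# fromJSON() accepts a JSON string, a file path, or a URL.
get_remote_sha <- function(json) {
  parsed <- fromJSON(json)
  sha <- parsed[["RemoteSha"]]
  if (is.null(sha)) NA_character_ else sha
}

sha <- get_remote_sha(payload)
print(sha)

# Against the live API (not run here):
# get_remote_sha("https://r-releases.r-universe.dev/api/packages/mirai")
```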

wlandau commented 3 months ago

I was afraid these would have to be pulled one-by-one. nanonext::ncurl("https://cran.r-universe.dev/api/packages") gets multiple packages, but only a small subset of 315. I wonder if some kind of pagination or other workaround is possible here.

wlandau commented 3 months ago

I tried hitting the API for each package, but I stopped it at 20 packages because it was clear the speed would not scale for our purposes.

So for the development of r-releases, I propose that for now, we only flag versions that decrement, as opposed to ones that release without incrementing. We don't need the hash for that part. This way, we can at least provide something that #6 can build on.

I will also see if RemoteSha can be added to the DESCRIPTION of built packages in R-universe.

wlandau commented 3 months ago

And with that, https://github.com/r-releases/r.releases.utils/pull/9 and https://github.com/r-releases/r-releases.r-universe.dev/pull/6 are now ready for review.

shikokuchuo commented 3 months ago

ncurl_aio() might be a better choice here whereby you could try getting say 100 concurrently. I'm not sure if that would help - the individual json payloads would still be quite large. But sequential downloads would be too slow due to the network latency - this overrides any other factor.
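
A rough sketch of the batching idea, assuming `nanonext::ncurl_aio()` to start each request and `nanonext::call_aio()` to wait for it. The helpers `batch_urls()` and `fetch_batch()` are hypothetical, `pkgA` and `pkgB` are placeholder package names, and the network step is left commented out.

```r
# Hypothetical helper: split URLs into batches of at most `size`.
batch_urls <- function(urls, size = 100L) {
  split(urls, ceiling(seq_along(urls) / size))
}

# Hypothetical helper: start every request in a batch without blocking,
# then wait for each one and collect the response bodies.
fetch_batch <- function(urls) {
  requests <- lapply(urls, nanonext::ncurl_aio)
  lapply(requests, function(request) nanonext::call_aio(request)$data)
}

packages <- c("mirai", "pkgA", "pkgB")
urls <- sprintf("https://r-releases.r-universe.dev/api/packages/%s", packages)
batches <- batch_urls(urls, size = 2L)
print(lengths(batches))

# Not run here, since it needs network access:
# payloads <- unlist(lapply(batches, fetch_batch), recursive = FALSE)
```

Processing one batch at a time caps the number of simultaneous requests at the batch size, which is the only throttling this sketch attempts.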

wlandau commented 3 months ago

We might get results faster, but I worry this may overburden the API. I will ask Jeroen.

wlandau commented 3 months ago

Just submitted https://github.com/r-universe-org/help/issues/377. I would prefer to decide on https://github.com/r-releases/r.releases.utils/pull/9 and https://github.com/r-releases/r-releases.r-universe.dev/pull/6 based on the discussion in that thread.

wlandau commented 3 months ago

I think RemoteSha would have to be included in https://r-releases.r-universe.dev/src/contrib/PACKAGES and/or https://r-releases.r-universe.dev/src/contrib, and then available.packages() should work.

wlandau commented 3 months ago

Thanks to @jeroen's work on https://github.com/r-universe-org/help/issues/377, we are now able to reliably get the RemoteSha field. I opened a pull request at https://github.com/r-releases/r.releases.utils/pull/11.
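
For illustration, a sketch of how an extra `RemoteSha` field can be read from a PACKAGES-style index. The index text below is made up, and `read.dcf()` on a text connection stands in for the repository so the example runs offline.

```r
# Made-up two-entry index in the same DCF format as src/contrib/PACKAGES.
# The second entry deliberately lacks a RemoteSha field.
index <- c(
  "Package: pkgA",
  "Version: 1.0.0",
  "RemoteSha: 0123456789abcdef0123456789abcdef01234567",
  "",
  "Package: pkgB",
  "Version: 0.2.0"
)

con <- textConnection(index)
fields <- read.dcf(con, fields = c("Package", "Version", "RemoteSha"))
close(con)
print(fields)

# Against a live repository (not run here), the extra field is requested
# through the `fields` argument of available.packages():
# available.packages(
#   repos = "https://r-releases.r-universe.dev",
#   fields = "RemoteSha"
# )
```

Entries without the field come back as `NA`, which matches the behavior described earlier in the thread for an index that did not yet contain `RemoteSha`.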

wlandau commented 3 months ago

Solved by https://github.com/r-releases/r.releases.internals/pull/11 and especially https://github.com/r-universe-org/help/issues/377.