Closed wlandau closed 3 months ago
I will implement this, hopefully this week or next week, as I think the r.releases.utils package is the place for it.
Actually, maybe this exists on its own as another function in the same package as #6, but not as part of install_safe().
The manifest should include:
Then a separate function in the package from #6 could pull the manifest file and decide everything it needs to know from there. For good versioning practices, (1) and (2) should agree, and (3) and (4) should agree.
(3) and (4) could be obtained from the previous manifest on each iteration.
To make #6 easier, I will write a separate JSON file listing only the packages with version issues. This should be a small enough list for releases::check_releases() (or whatever we call that function) to download the whole thing quickly.
Btw, @wlandau, just a note that you might need to use the remote SHA of the GitHub commit (also returned by the R-universe API). If R-universe is constantly re-building on a periodic basis, then the MD5 sum of each package will presumably differ between builds because of the date (metadata contained in the package).
Do you know how to get those GitHub SHAs from the R-universe API? Unfortunately, available.packages() always returns NAs for the RemoteSha field, which is the only reason I first used MD5s.
Oh I see, yes that would be the MD5 sum of the built package as I understand it. So each build could return a different one.
The R-universe API is the one for each package like: https://r-releases.r-universe.dev/api/packages/mirai It returns one combined json payload, which would have to be parsed for the 'RemoteSha'.
I was afraid these would have to be pulled one by one. nanonext::ncurl("https://cran.r-universe.dev/api/packages") gets multiple packages, but only a small subset of 315. I wonder if some kind of pagination or other workaround is possible here.
I tried hitting the API for each package, but I stopped it at 20 packages because it was clear the speed would not scale for our purposes.
So for the development of r-releases, I propose that for now, we only flag versions that decrement, as opposed to ones that release without incrementing. We don't need the hash for that part. This way, we can at least provide something that #6 can build on.
I will also see if RemoteSha can be added to the DESCRIPTION of built packages in R-universe.
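As a rough illustration of the "flag versions that decrement" idea, here is a minimal base R sketch. The function name `flag_decremented()`, the column names `package` and `version`, and the sample version numbers are all hypothetical and not taken from r.releases.utils; the only assumption is that a previous and a current snapshot of package versions are available as data frames.

```r
# Hypothetical helper: return the packages whose version went down
# between two snapshots. Column names `package` and `version` are assumptions.
flag_decremented <- function(previous, current) {
  merged <- merge(
    previous,
    current,
    by = "package",
    suffixes = c("_previous", "_current")
  )
  # package_version() compares versions component by component,
  # so "0.9.0" < "0.10.0" is handled correctly (unlike string comparison).
  old <- package_version(merged$version_previous)
  new <- package_version(merged$version_current)
  merged[new < old, c("package", "version_previous", "version_current")]
}

# Made-up snapshots for illustration only.
previous <- data.frame(
  package = c("mirai", "nanonext", "targets"),
  version = c("0.12.1", "0.13.2", "1.5.1"),
  stringsAsFactors = FALSE
)
current <- data.frame(
  package = c("mirai", "nanonext", "targets"),
  version = c("0.13.0", "0.13.0", "1.5.1"),
  stringsAsFactors = FALSE
)

flagged <- flag_decremented(previous, current)
print(flagged)
```

No hash is needed here: only the two version columns are compared, which matches the reduced scope proposed above.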
And with that, https://github.com/r-releases/r.releases.utils/pull/9 and https://github.com/r-releases/r-releases.r-universe.dev/pull/6 are now ready for review.
ncurl_aio() might be a better choice here, since you could try fetching, say, 100 concurrently. I'm not sure if that would help, as the individual JSON payloads would still be quite large. But sequential downloads would be too slow due to network latency, which outweighs any other factor.
We might get results faster, but I worry this may overburden the API. I will ask Jeroen.
Just submitted https://github.com/r-universe-org/help/issues/377. I would prefer to decide on https://github.com/r-releases/r.releases.utils/pull/9 and https://github.com/r-releases/r-releases.r-universe.dev/pull/6 based on the discussion in that thread.
I think RemoteSha would have to be included in https://r-releases.r-universe.dev/src/contrib/PACKAGES and/or https://r-releases.r-universe.dev/src/contrib, and then available.packages() should work.
Thanks to @jeroen's work on https://github.com/r-universe-org/help/issues/377, we are now able to reliably get the RemoteSha field. I opened a pull request at https://github.com/r-releases/r.releases.utils/pull/11.
Solved by https://github.com/r-releases/r.releases.internals/pull/11 and especially https://github.com/r-universe-org/help/issues/377.
In https://github.com/r-universe-org/help/issues/363, @gmbecker mentioned it is important that users be able to trust the version number of a package. If a new release of a package is published, then its version number should always increment.
It is straightforward to list all the version numbers and MD5 hashes of all the packages hosted at https://r-releases.r-universe.dev.
Created on 2024-03-04 with reprex v2.1.0
As part of https://github.com/r-releases/r-releases.r-universe.dev/blob/main/.github/workflows/build_universe.yaml, we could pull this information and cache it as a new file in https://github.com/r-releases/r-releases.r-universe.dev. During the caching process, we could compare the current versions/hashes to the previous versions/hashes and make a judgement about version compliance. Then in #6 or #10, we could use this information to recommend which packages are safe to install.
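The comparison step described above could look something like the following base R sketch. Everything here is an assumption for illustration: the function name `check_versions()`, the column names `package`, `version`, and `hash`, the status labels, and the sample data. The `hash` column stands for whatever identifier ends up being used (MD5 sum or RemoteSha).

```r
# Hypothetical sketch: compare a cached snapshot against the current one
# and classify each package. Column names and labels are assumptions.
check_versions <- function(previous, current) {
  merged <- merge(
    previous,
    current,
    by = "package",
    suffixes = c("_previous", "_current")
  )
  old <- package_version(merged$version_previous)
  new <- package_version(merged$version_current)
  hash_changed <- merged$hash_previous != merged$hash_current
  merged$status <- ifelse(
    new < old,
    "version decremented",
    ifelse(hash_changed & new == old, "released without increment", "ok")
  )
  merged[, c("package", "version_previous", "version_current", "status")]
}

# Made-up snapshots for illustration only.
previous <- data.frame(
  package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  version = c("1.0.0", "2.1.0", "0.9.0", "3.0.0"),
  hash = c("a1", "b1", "c1", "d1"),
  stringsAsFactors = FALSE
)
current <- data.frame(
  package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  version = c("1.0.1", "2.1.0", "0.8.0", "3.0.0"),
  hash = c("a2", "b2", "c2", "d1"),
  stringsAsFactors = FALSE
)

report <- check_versions(previous, current)
print(report)
```

In this sketch, packages that are new in the current snapshot are dropped by `merge()` and would need separate handling. Whether a changed hash really indicates a new release depends on the hash being stable across rebuilds of the same source.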