Closed by wlandau 3 months ago
Maybe it is in the description but not included in the PACKAGES.tar.gz index of the repository.
I'll have a look tonight.
On Tue, 5 Mar 2024 at 14:10, Will Landau wrote:
Actually, I do see the hashes are already in the DESCRIPTION:
install.packages("gh", repos = "https://r-releases.r-universe.dev")
packageDescription("gh")$RemoteSha
#> [1] "ab056d6322064295432d4e9c08143c2c99c028e4"
So it's odd that available.packages() does not show it.
So these are fields that are included in the individual DESCRIPTION: https://jeroen.r-universe.dev/jsonlite/DESCRIPTION
But only a few of them are included in the index (to save space): https://jeroen.r-universe.dev/src/contrib/PACKAGES
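A minimal offline sketch of why a field that exists in DESCRIPTION can still come back as NA when it is requested from the index: base R's read.dcf() fills any requested field that is absent from a record with NA. The two-package index text below is made up for illustration and is not real R-universe output.

```r
# Made-up PACKAGES-style index with only Package and Version
# (an assumption for illustration, not real R-universe content).
index_text <- c(
  "Package: pkgA",
  "Version: 1.0.0",
  "",
  "Package: pkgB",
  "Version: 0.2.1"
)

# Parse the DCF text and ask for a field the index does not carry.
con <- textConnection(index_text)
index <- read.dcf(con, fields = c("Package", "Version", "RemoteSha"))
close(con)

# RemoteSha is NA for every package because the index never listed it.
index[, "RemoteSha"]
```

So an NA here only says the field is missing from the index, not that the package's DESCRIPTION lacks it.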
I see. Would it be feasible to add RemoteSha to PACKAGES to support https://github.com/r-releases/help/issues/21 and #149, or is a light PACKAGES file more of a priority for R-universe? In the latter case, what would you recommend for https://github.com/r-releases/help/issues/21?
Perhaps we can make it opt-in via a parameter. Does it really need to work with base R available.packages(), or are you more flexible? We could also include it just for the JSONLD index only, e.g.: https://jeroen.r-universe.dev/src/contrib/
So then instead of base available.packages() you would need to use e.g.
df <- jsonlite::stream_in(url("https://jeroen.r-universe.dev/src/contrib/"), verbose = F)
Does it really need to work with base R available.packages() or are you more flexible?
I am flexible. I am good with anything that pulls the package names, version numbers, and RemoteShas of all packages quickly.
We could also include it just for the JSONLD index only e.g.: https://jeroen.r-universe.dev/src/contrib/. So then instead of base available.packages() you would need to use e.g...
Perfect!
Not sure if it's a good idea to deploy from my flight but here is something you can test now:
https://jeroen.r-universe.dev/src/contrib/PACKAGES?fields=RemoteSha,RemoteUrl
https://jeroen.r-universe.dev/src/contrib/PACKAGES.json?fields=RemoteSha,RemoteUrl
So using this fields parameter you can request any additional fields (comma-separated and case-sensitive) from the DESCRIPTION files in the PACKAGES index.
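As a small sketch of how such a request URL could be assembled, the helper below pastes a comma-separated field list onto the index URL. The function name universe_index_url() and its arguments are hypothetical and not part of R-universe or any package. Only the URL pattern itself comes from the two links above.

```r
# Hypothetical helper (not an R-universe API): build the URL of a universe's
# PACKAGES or PACKAGES.json index, optionally requesting extra DESCRIPTION
# fields through the `fields` query parameter.
universe_index_url <- function(universe, fields = character(0), json = FALSE) {
  index_file <- if (json) "PACKAGES.json" else "PACKAGES"
  base <- sprintf("https://%s.r-universe.dev/src/contrib/%s", universe, index_file)
  if (length(fields) == 0L) {
    return(base)
  }
  # Field names are comma-separated and case-sensitive, so pass them verbatim.
  paste0(base, "?fields=", paste(fields, collapse = ","))
}

universe_index_url("jeroen", c("RemoteSha", "RemoteUrl"))
universe_index_url("jeroen", c("RemoteSha", "RemoteUrl"), json = TRUE)
```

The two calls reproduce the test URLs above. The helper does no validation, so a misspelled or wrongly cased field name is simply passed through to the server.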
Cool! Your query parameter idea looks like an elegant way to handle this, and it works for me in both cases:
system.time(
  packages_file <- utils::available.packages(
    contriburl = paste0(
      contrib.url("https://jeroen.r-universe.dev", type = "source"),
      "/PACKAGES?fields=RemoteSha,RemoteUrl"
    ),
    fields = "RemoteSha"
  )
)
#> user system elapsed
#> 0.033 0.010 2.086
head(packages_file[, "RemoteSha"])
#> RAppArmor
#> "f437c1a926e7f5c225003738bca46584ee1a1f51"
#> V8
#> "8adfc4c5ffc1f2da45206a53927d14046dfaa141"
#> badgen
#> "57af6a1eab06369730a9ca520375ed6b78a0e5d6"
#> base64
#> "0b8294d5d2ea1f1d1d069ef5ff681d90bdbc38ab"
#> bcrypt
#> "49eb9da001cc6d3f118521d6e5221fb8909cfa6e"
#> brotli
#> "00a9aa6a84cfcf2da6184a32a0ce7a7f1b9a8211"
system.time(
  json <- jsonlite::stream_in(
    url("https://jeroen.r-universe.dev/src/contrib/PACKAGES.json?fields=RemoteSha,RemoteUrl"),
    verbose = FALSE
  )
)
#> user system elapsed
#> 0.036 0.003 0.858
head(json$RemoteSha)
#> [1] "f437c1a926e7f5c225003738bca46584ee1a1f51"
#> [2] "8adfc4c5ffc1f2da45206a53927d14046dfaa141"
#> [3] "57af6a1eab06369730a9ca520375ed6b78a0e5d6"
#> [4] "0b8294d5d2ea1f1d1d069ef5ff681d90bdbc38ab"
#> [5] "49eb9da001cc6d3f118521d6e5221fb8909cfa6e"
#> [6] "00a9aa6a84cfcf2da6184a32a0ce7a7f1b9a8211"
Created on 2024-03-05 with reprex v2.1.0
I noticed the query also works in the R-releases universe: https://r-releases.r-universe.dev/src/contrib/PACKAGES?fields=RemoteSha,RemoteUrl. Okay if I use it in R-releases? Would you still rather I use the JSON route, or is PACKAGES/available.packages() okay too?
Yes, go for it. I was only mentioning mine as an example. The API is the same for any universe of course.
Can I close this as solved?
Certainly! Thank you for your help.
Motivation
@gmbecker mentioned how important it is for users to be able to trust the version numbers of packages. For R-releases, we will not impose any pre-release gatekeeping, but @shikokuchuo and I are working on a service that checks all the versions and hashes and reports which packages are not in compliance. We are having trouble building this service given what we currently know about R-universe. Cf. https://github.com/r-releases/help/issues/21.
Implementation in R-releases
In https://github.com/r-releases/r.releases.utils/pull/9 and https://github.com/r-releases/r-releases.r-universe.dev/pull/6, I wrote a service that runs once a day and gets the version and hash of every package in the universe. Every time the service runs, it keeps track of the highest version number ever released, as well as the hash of that release. We want it to flag a package for non-compliance if:
These non-compliant packages are written to a small file, version_issues.json, which either Gabe's "safe" repo or "install_safe()" could leverage for choosing which packages are safe to install.
Challenge
We are having trouble getting reliable hashes. utils::available.packages(repos = "https://r-releases.r-universe.dev", fields = "RemoteSha") is fast, but it returns NAs for RemoteSha. And as @shikokuchuo mentioned, MD5s are brittle because R-universe rebuilds the current version periodically with potentially different metadata.
The API for https://r-releases.r-universe.dev/api/packages/ returns information for multiple packages, but the payload is large, and not all packages may be returned. (https://cran.r-universe.dev/api/packages/ shows only a few hundred.) Hitting the API for each package individually is slow, and I am concerned it may overburden R-universe.
Proposal
Would it be possible to include the GitHub SHA in the RemoteSha field of ~the DESCRIPTION file for packages built on R-universe~ the PACKAGES file of each universe, such as https://r-releases.r-universe.dev/src/contrib/PACKAGES? That way, unless I am missing something, available.packages() should work with https://github.com/r-releases/help/issues/21, and it may even make the end product of #149 more trustworthy.
(I'm not sure whether https://r-releases.r-universe.dev/src/contrib would have to include that field too.)