r-multiverse / help

Discussions, issues, and feedback for R-multiverse
https://r-multiverse.org
MIT License

A potential compromise: automated submissions with revdep checks #3

Closed wlandau closed 4 months ago

wlandau commented 4 months ago

Given https://github.com/r-universe-org/help/issues/363#issuecomment-1971854389 and @gmbecker's comments above, I would like to sketch a potential change in direction to help r-releases have the best of both worlds. I think it has major advantages:

  1. It won't accept packages until all checks and revdep checks pass.
  2. It won't accept packages unless the package version is incremented.
  3. It does not require R-universe to host its own revdep checks.
  4. It is protected against contributors deleting their releases.
  5. This would not add extra manual review.

Initiating pull requests

Scanning pull requests

During manual review, if the PR looks good, do not merge it. Instead, add a special label that allows the bot to run check_package.yaml.

Checking packages

scan_pull_requests.yaml triggers check_package.yaml. Both are in https://github.com/r-releases.r-universe.dev, and both run as the current r-releases-app bot, so the former can send an informative payload to the latter. Steps in check_package.yaml:

  1. Check that the JSON entry in the pull request is well-formed.
  2. Check that the branch field in the JSON increments the package version.
  3. Check that the release actually exists.
  4. Run R CMD check on all platforms that R-universe uses.
  5. Run reverse dependency checks on all platforms that R-universe uses.
  6. If any of the above checks fail, make an informative comment in the PR and automatically close it.
  7. If all the above pass, push the source code of the release to a special r-releases archive repo, then call the API to merge the pull request and make a comment.
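
Steps 1 and 2 are the easiest to pin down, so here is a minimal Python sketch of them. It is only an illustration under assumptions: the required field names (`url`, `branch`) are a guess at the listing format, and `current_version` and `new_version` are assumed to be read elsewhere, for example from the DESCRIPTION file of the listed release and of the proposed one. `parse_version` and `check_entry` are hypothetical helper names.

```python
import json
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Split an R-style version such as '1.2.3' or '1.2-3' into integers."""
    if not re.fullmatch(r"\d+([.-]\d+)+", version):
        raise ValueError(f"not a valid package version: {version!r}")
    return tuple(int(part) for part in re.split(r"[.-]", version))


def check_entry(raw: str, current_version: str, new_version: str) -> list[str]:
    """Return a list of problems; an empty list means steps 1 and 2 pass."""
    try:
        entry = json.loads(raw)
    except json.JSONDecodeError as error:
        return [f"entry is not valid JSON: {error.msg}"]
    if not isinstance(entry, dict):
        return ["entry must be a JSON object"]
    problems = []
    # Hypothetical required fields; the real listing format may differ.
    for field in ("url", "branch"):
        if not isinstance(entry.get(field), str) or not entry[field]:
            problems.append(f"missing or empty field: {field}")
    try:
        # Tuples compare component by component, so (2, 0, 0) > (1, 9, 0).
        if parse_version(new_version) <= parse_version(current_version):
            problems.append(
                f"version {new_version} does not increment {current_version}"
            )
    except ValueError as error:
        problems.append(str(error))
    return problems
```

A non-empty result would feed step 6: the bot posts the problems as a comment and closes the PR.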

The archive repo

We can't entirely trust the original repo to host the user's original release permanently. The user has full control, so they could delete any releases they want over there. So before the pull request is merged, I propose that the release be copied to a special archive repo, possibly in a different GitHub org such as r-releases-archive. A PR review would look like this, in this order:

  1. Check https://github.com/USER/REPO/releases/tag/v2.0.0.
  2. Copy https://github.com/USER/REPO/releases/tag/v2.0.0 to the more trusted https://github.com/r-releases-archive/REPO/releases/tag/v2.0.0.
  3. After the release is safely archived, actually merge the pull request to update the listing in https://github.com/r-releases/r-releases.
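
The ordering matters more than the individual calls, so here is a rough Python sketch of just the control flow. The three callables are hypothetical stand-ins for whatever the bot would actually do (query the release, copy it to the archive org, call the merge API); nothing here talks to GitHub.

```python
from typing import Callable


def archive_then_merge(
    release_exists: Callable[[], bool],
    archive_release: Callable[[], None],
    merge_pull_request: Callable[[], None],
) -> str:
    """Run steps 1 to 3 in order and report which state was reached."""
    if not release_exists():
        return "rejected: release not found"
    try:
        archive_release()
    except Exception as error:
        # Nothing was archived, so the listing is untouched and still consistent.
        return f"rejected: archiving failed ({error})"
    try:
        merge_pull_request()
    except Exception as error:
        # The archive now holds a release that the listing does not mention.
        return f"archived but not merged ({error})"
    return "merged"
```

Returning a distinct "archived but not merged" state at least makes the awkward case visible, so the bot could retry the merge or flag the PR instead of silently leaving an orphaned archive.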

Where R-universe comes in

The text files in https://github.com/r-releases/r-releases can build the universe just like in the current prototype.

Deletions

A user may request to delete a package in a PR. If the package has reverse dependencies, the bot should close the PR and ask the maintainer to contact the maintainers of the revdeps to have those revdeps taken down first. If the package can be safely removed, then the text file listing at https://github.com/r-releases/r-releases is deleted, but the archived repo still remains under https://github.com/r-releases-archive.
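
As a sketch of the deletion rule, assuming the bot can build a map from each listed package to the set of listed packages it depends on (the `dependencies` argument below is hypothetical, as are the function names and the message strings):

```python
def reverse_dependencies(package: str, dependencies: dict[str, set[str]]) -> set[str]:
    """Return the listed packages that directly depend on `package`."""
    return {name for name, deps in dependencies.items() if package in deps}


def review_deletion(package: str, dependencies: dict[str, set[str]]) -> tuple[bool, str]:
    """Decide whether a deletion PR can go ahead, with a message for the PR."""
    revdeps = sorted(reverse_dependencies(package, dependencies))
    if revdeps:
        return False, "close PR: remove reverse dependencies first: " + ", ".join(revdeps)
    # Only the listing goes away; the archived releases are left alone.
    return True, "delete listing; keep archived releases"
```

For example, with `{"pkgA": set(), "pkgB": {"pkgA"}}`, a request to delete `pkgA` would be closed with a pointer to `pkgB`, while a request to delete `pkgB` would go through.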

Challenges

  1. I have never triggered a GitHub Actions job from another GitHub Actions job. I think it's possible, I just don't know how at the moment.
  2. In the "The archive repo" section above, it is possible that (2) succeeds and (3) fails. It is rare, but we would still need to think about how to handle that.
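
On challenge 1, one option that might work is the GitHub REST API's "create a workflow dispatch event" endpoint, which starts a workflow that declares the `workflow_dispatch` trigger. The Python sketch below only builds the request and does not send it. `OWNER`, `REPO`, the `pull_request` input name, and the token are placeholders, and it assumes check_package.yaml would declare `workflow_dispatch` with a matching input.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, inputs, token):
    """Build (but do not send) a workflow_dispatch request for one workflow."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # `ref` is the branch or tag whose copy of the workflow file should run;
    # `inputs` is the payload passed to the triggered workflow.
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values; the token would be whatever credential the bot already uses.
request = build_dispatch_request(
    "OWNER", "REPO", "check_package.yaml", "main", {"pull_request": "12"}, "TOKEN"
)
```

Passing `request` to `urllib.request.urlopen` would send it. The `inputs` object is where scan_pull_requests.yaml could put its payload for check_package.yaml.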

wlandau commented 4 months ago

More challenges:

  1. What if a new version of R breaks a bunch of packages? Is the requirement of clean checks on submission good enough for this situation to take care of itself?
  2. What about the catch-22 of trying to roll out breaking changes to a dependency and its revdep together? I have been blocked in the past and have had to temporarily suppress all testing to get on CRAN. Can we resolve this if both packages are updated in the same PR?

wlandau commented 4 months ago

For the first prototype of r-releases, we rushed to get the implementation working. For this part, let's pause and think for a while, and if we pursue it, work on it slowly. The current r-releases is good enough to demonstrate the concept and make it concrete. Even just knowing how to use the bot helps tremendously.

wlandau commented 4 months ago

Using an archive in the backend might even solve subdirectory issues like https://github.com/r-universe-org/help/issues/367.

wlandau commented 4 months ago

On second thought: if we go this route, it is different enough that it should be in its own org, maybe r-production instead of r-releases. The current r-releases has value in the way it works right now: it strikes a middle ground between development and production. I definitely don't want to see this rendition of r-releases go away.

wlandau commented 4 months ago

We will be even more informed after r-releases officially rolls out for what it is and we observe the impact of r-releases over potentially several months. Baby steps.

wlandau commented 4 months ago

We will be even more informed after r-releases officially rolls out for what it is and we observe the consequences of our actions over potentially several months. Baby steps.

What I meant to say was that the combination of r-releases and r-universe has the potential to give us data that we never had before, especially if r-universe gets revdep checks and a cron-like schedule for all the checks it supports. And because r-universe is all about discoverability, we can get a sense of how often package compatibility issues really happen, along with potentially new ways to mitigate them.

wlandau commented 4 months ago

And as for my original post in https://github.com/r-releases/help/issues/3#issue-2162110870: I think a lot can be simplified in that route. Maybe a package maintainer registers a commit URL, which triggers some checks on GitHub Actions, which then archives the source code of that commit in a read-only repo in https://github.com/revdep-checked-package-releases-or-something, which can then be deployed in an r-universe. The combination of GitHub Actions, GitHub apps, and r-universe seems like enough automation to replicate the best parts of CRAN if we really wanted to. But I would like to wait and see how much of it we really need.

wlandau commented 4 months ago

Hmm... from https://github.com/r-universe-org/help/issues/363#issuecomment-1972105292 and https://github.com/r-universe-org/help/issues/363#issuecomment-1972346670, maybe none of the fancy Rube Goldberg-level automation I proposed in https://github.com/r-releases/help/issues/3#issue-2162110870 will ever become necessary.

wlandau commented 4 months ago

#6 is a dramatically simpler and uniformly superior approach to ensuring package quality and end-user confidence in r-releases. Closing in favor of #6.