wlandau commented 1 month ago

For when https://production.r-multiverse.org goes live.

wlandau commented 1 month ago

My comment from another thread:

I think the biggest potential problem with the dual-repo option is that a package could suddenly get archived because of a spurious or intermittent test result, which would could disrupt and surprise users and developers. What if we implemented a grace period? Whether it's 2 weeks or 4 weeks or something else, this could add stability to https://r-strict.r-universe.dev/. And I do not think it would be hard to implement.

@shikokuchuo's reply:

Agree. Otherwise it would be too noisy. What if we started off mimicking current CRAN practice, but not setting these grace periods in stone. We could then move in either direction and see what kind of feedback we get.

wlandau commented 1 month ago

With respect to production, a package has many failure modes:

The package is new and not yet in production.
The new version in QA is broken, but the older production version is still okay.
The new version in QA passes its own tests, but it breaks reverse dependencies. Again, let's say the old version is fine.
Whether because base R changed or a reverse dependency changed, a previously healthy package breaks.

On first glance at scenarios (2) and (3), my first thought was to keep the old version in production indefinitely even if the new QA version has problems. But that is a bad idea because:

If we keep an old version in production but the new one in QA, it puts prd and QA out of sync. We want prd to be a simple subset of QA, not some strange old stale version of QA.
We want to rely exclusively on the checks in QA without having to check the checks in prd. Version mismatches would cause all sorts of problems there.
If the old version stays in QA and the new version is in production, then the maintainer incentives are all wrong. It's easy to get too relaxed in prd still succeeds. Maintainers should have incentives to keep the most current versions of their packages working correctly.

Other options:

A. Keep the old version in production for a grace period of 2-4 weeks, just to protect users against breaking changes while problems get fixed. Then remove the package from production if those problems have not been fixed in those 2-4 weeks. B. Same as (A), but put the new version in production during the grace period to keep QA and prd synchronized. This could help us avoid subtle disagreements between QA and prd.

shikokuchuo commented 1 month ago

On a theoretical level I agree with all your analysis above.

I'd be in favour of B, however we could now have a package in production that breaks other packages in production.

So I think in practice, if we're to avoid an even more complicated system with a 'staging' repo, we have to accept the new version in QA, and then it only migrates into the prod universe when all checks pass.

If things don't get sorted then there is no need to remove the old version from the prod universe, as by definition everything works with everything else in that universe. Things will just persist. This is the only way to prevent the whole auto-archival mess.

I think this means we'd have to entertain requests for breakage - which would be a job for moderators.

wlandau commented 3 weeks ago

Reflecting on this more, I vote for option B.

I'd be in favour of B, however we could now have a package in production that breaks other packages in production.

I think this is the lesser of two evils. Not only does it keep the universes synchronized, it also incentivizes maintainers to favor the new version and upgrade the revdeps. By contrast, if we have the new version in QA and the old version in prod, then revdeps have to be compatible with both, which is not always possible. This is exactly the sort of thing that incentivizes maintainers to give up and skip all their tests (c.f. https://github.com/rstudio/rmarkdown/blob/ee69d59f8011ad7b717a409fcbf8060d6ffc4139/tests/testthat.R#L2).

When I shared R-multiverse (back when we called it "R-releases") in that R Consortium meeting a few months ago, Jeroen made a similar point. He said we should favor incentives that create change, improvement, and innovation. I think moving to production but then taking it and revdeps down after a 3-week or 4-week delay makes sense for this.

Also, option B is so much simpler to understand and implement because it does not rely on individual package versions. I feel like there are fewer unknown unknowns with option B.

In addition, https://github.com/r-universe-org/help/issues/149 may help mitigate the tradeoffs of choosing option B.

shikokuchuo commented 3 weeks ago

"fewer unknown unknowns" I'd say that's incredibly important as a governing principle!

I guess the decision here affects how users perceive the Community and Production repos, and the choice of one versus the other.

Community always contains the latest release made by the maintainer, irrespective of revdep checks. Our decision here doesn't affect this.

Option B basically tries to replicate this in Production immediately and takes everything down if this is non-compliance in [3-4 weeks] time.

Alternatively, Production just doesn't upgrade until such a time everything complies (like CRAN does now).

You're right option B is cleaner. But if we take a snapshot at any one time of the Production repo, not everything will pass revdep checks. And this doesn't provide the iron-glad guarantees for the user - they can install a package which happens to be broken from Production, unless I've misunderstood something.

wlandau commented 3 weeks ago

But if we take a snapshot at any one time of the Production repo, not everything will pass revdep checks.

I don't think option A (sticking with the old version) would entirely prevent this. Suppose package Y depends on package X 1.0.0. If X 2.0.0 breaks Y, then revdep checks will still fail regardless of A vs B.

And this doesn't provide the iron-glad guarantees for the user - they can install a package which happens to be broken from Production

Whether we go with A or B, we still have a 3-4-week delay before the package gets kicked out of production. That itself means broken packages will be up there some of the time. Except for the nominal length of the delay, it's no different with CRAN.

wlandau commented 3 weeks ago

Suppose package Y depends on package X 1.0.0. If X 2.0.0 breaks Y, then revdep checks will still fail regardless of A vs B.

Instead of "then revdep checks will still fail regardless of A vs B", I should have said "there still exist probable enough failure modes in each of A and B." It just depends on the version of Y that X 2.0.0 breaks. I would prefer to use the latest Y for this because of the incentives.

shikokuchuo commented 3 weeks ago

It just depends on the version of Y that X 2.0.0 breaks.

In (A), the versions of X and Y in Production are compatible, so neither are broken as version X 1.0.0 remains in that repo. Sure both get taken down after the grace period, but I thought the Production repo is meant to provide the strongest guarantees to the user i.e. installing broken packages should not be an option.

Except for the nominal length of the delay, it's no different with CRAN.

CRAN implements a mixture really. It does the revdep checks first before the new version is allowed to enter CRAN (A), and if this is manually overidden by a CRAN admin, then we get the possibility that a new build may break downstream packages, which is then (B).

I would prefer to use the latest Y for this because of the incentives.

Would you mind repeating the logic here? I am seeing that in both A and B, the package gets removed from production after a grace period, providing the incentive to correct the error.

shikokuchuo commented 2 weeks ago

Just to be clear, at the moment I'm not in favour of one way or another or possibly a third way. I'm just trying to state the facts as I understand them.

shikokuchuo commented 2 weeks ago

I think that technically (A) would be feasible either by pinning the tags at Production, or by taking snapshots (of the affected packages?) whenever an issue occurs.

I should think that it's cleaner to use pinning. The infrastructure would then generate the latest 'passing' tag for each package in Community, and use this to write the packages.json for Production.

wlandau commented 2 weeks ago

Would you mind repeating the logic here? I am seeing that in both A and B, the package gets removed from production after a grace period, providing the incentive to correct the error.

Suppose CMTY and PRD can host different versions of the same packages, even on a temporary basis like in option (A).

Our current plan is to make decisions solely based on the checks in CMTY. Thus, maintainers are incentivized to fix problems in CMTY, but ignore what ultimately happens in PRD. That's fine if PRD is a true subset of CMTY. But if there is any possibility that CMTY and PRD host different versions of dependencies, then packages that pass in CMTY could silently and mysteriously break in PRD. And even if a package breaks in PRD, it will stay in PRD because it still works in CMTY. And because it stays in PRD, the maintainer sees no reason to take action.

So, if there is any possibility that CMTY and PRD host different versions, we need to use the checks from the PRD universe somehow.

Can we make decisions based on just PRD and ignore the checks in CMTY? Probably not. We are relying on R-universe to run the checks, and R-universe always runs those checks within the universe where the package is actually hosted. We are not equipped to check the package as if it were in PRD before we send it there. We can only check in CMTY and then move to PRD afterwards. Repurposing R-universe check infrastructure to accomplish this would be another big and niche request to Jeroen.

In this scenario where PRD is not a subset of CMTY (because of versions), I think the best we can hope to do is:

Observe that checks pass on CMTY.
Accept the package into PRD.
Confirm/hope that tests still pass in PRD after (2).

But this creates the same frustrating loop that you recently experienced while trying to keep nanonext published: R-hub/WinBuilder tests passed, then the package was accepted, then CRAN ran checks in production which are extremely difficult, and then nanonext got flagged for new problems you could not have prevented.

In fact, if we use checks from both CMTY and PRD, the incentives could be mutually exclusive. Suppose package Y depends on package X, version 2.0.0 of X is in CMTY, version 1.0.0 of X is in PRD, and it is impossible for Y to be compatible with both versions of X simultaneously. This situation happens all the time, and it causes maintainers to just give up and suppress all their tests (certainly not the incentive we want to create).

wlandau commented 2 weeks ago

So, if PRD which is not a subset of CMTY, I think we would have to either ask for huge changes in R-universe, or drop R-universe in favor of completely custom infrastructure. I am not in favor of either option.

wlandau commented 2 weeks ago

Now, let's consider option (B). In that option, we can trust that for each package in PRD,

Its version in CMTY is the same.
Its entire dependency graph in CMTY is the same (because of https://github.com/r-multiverse/multiverse.internals/pull/26)

Thus, it is enough to rely entirely on the checks from the CMTY universe only. Check results are guaranteed to be exactly the same between CMTY and PRD (except for intermittent non-reproducible failures that don't matter anyway). Package maintainers are incentivized to fix one and only one set of checks, which is completely achievable from their point of view.

Of course, the downside of (B) is that a careless maintainer has the power to temporarily put a botched version of an existing package in PRD. I would argue this is not as bad as it seems. If I understood @llrs correctly when the three of us last spoke, Bioconductor already operates under this model. There is an initial review process that establishes trust, and then that trust is extended to future versions of the package. In our case, passing CMTY grants PRD privileges, and failing CMTY revokes PRD privileges. That's how I would look at it. This seems much more desirable and much more transparent than how things work today (accept the package initially, then reject it for surprising reasons you could not have prevented).

shikokuchuo commented 2 weeks ago

So, if there is any possibility that CMTY and PRD host different versions, we need to use the checks from the PRD universe somehow.

Yes I'm with you to here. There is the possibility of divergence if checks only happen in CMTY.

wlandau commented 2 weeks ago

I'm with you to here.

I'll try to be a bit clearer. To think through option (A), it may help to think in terms of "promotion" and "demotion":

Promotion

Let's say "promotion" is when a new version of a community package goes to production. I guess I can summarize https://github.com/r-multiverse/help/issues/57#issuecomment-2161025664 by saying this: for promotion in option (A) to work properly, we would need to check each community release against the dependencies from production. This might be feasible from the R-universe side, but we would need to ask Jeroen.

Demotion

Let's call "demotion" the process of removing a package from production if its checks persistently fail. For this, we would need to run record_issues() on the production repo as well as the QA one. This is actually doable. We would need a clone of https://github.com/r-multiverse/checks for production, but that's straightforward. (We may even want to combine https://github.com/r-multiverse/checks and https://github.com/r-multiverse/multiverse back together for community and take a similar approach for production.)

And coming back to https://github.com/r-multiverse/help/issues/57#issuecomment-2127770925:

my first thought was to keep the old version in production indefinitely even if the new QA community version has problems.

If checks always use dependencies from the production universe, this actually becomes feasible. No need to interfere with a locked production version if the community version fails.

wlandau commented 2 weeks ago

To summarize: it all depends on whether community packages can be checked using production dependencies. If this is possible, we could end up with something like (A), but better.

wlandau commented 2 weeks ago

Our discussion has reached a point where we can amend our definitions of options A and B:

Option A

The production universe tracks specific versions of packages using branch: "SOME_GITHUB_SHA" in packages.json. Both community-level checks and production-level checks are important. Both sets of checks should agree because they both use dependencies from production. If a new package or new version passes in community, the fixed version is promoted. Failure at the community level blocks promotion for both new packages and new versions. Demotion is based entirely on production-level checks.

Option B

The production universe uses branch: "*release" in packages.json just like the community universe. Only community-level checks are used, on the grounds that production is a true subset of community. If the package is not already in production and community-level checks fail, then promotion is blocked. If an existing version is already in production, then new versions are always promoted. Demotion uses community-level checks.

wlandau commented 2 weeks ago

What about reverse dependency checks? For the revised Option B, https://github.com/r-universe-org/help/issues/369 is equivalent to revdep checks, and there is nothing more to do on the R-multiverse side. For the revised Option A, we have at least these obstacles:

To keep community and production from diverging, we would need an additional request to R-universe: rerun a package's community checks whenever a production dependency in updates. This may or may not be feasible from the R-universe side.
Can we run a reverse dependency check during the promotion phase, just like CRAN does? Maybe, but we would need to run that check ourselves, which is a lot to ask from an R-multiverse standpoint. Ideally we would want to rely completely on R-universe-side checks.
Should we run a reverse dependency check during the promotion phase? This may not be necessary or desirable. Dropping revdep checks during promotion might create the incentives we want (upstream packages should be free to improve, downstream packages should modernize and adjust).

shikokuchuo commented 2 weeks ago

I propose something a little different. Just to explain where I'm coming from: I see the biggest risk being the user experience. We are going to be maintainer-friendly however we structure the rules, even if we try on purpose not to be so!

But if, as a user, downloading packages from our shiny new Production repo (and everything we claim that to be) we kind of expect it to be 'production'. It may be OK, if one time by chance some hiccup occurs and we download a cohort of packages that are somehow broken (think something like install.packages("devtools")). But if this should happen twice or three times, then I think we've lost all credibility. As a user I'm not going to know (or want to know) how the system operates, or how this impacts maintainer incentives etc. I just know that Production guarantees everything to work, and Community to have additional packages if I can't find them in Production.

Option C

The production universe tracks specific versions of packages using branch: "SOME_GITHUB_SHA" in packages.json.

Checks are performed only on the Community repository.

A new package or new package version enters into Community to be checked. If it passes, it remains in Community and the fixed SHA is promoted to Production. If it fails, it is removed from Community immediately, and Community rolls back to the fixed SHA from Production.

We would need to create a third ‘staging’ repo to then track the branch: release of these packages with issues - but this can be kept as an internal implementation detail. Then when the next version is released, this again enters Community for checking.

This mechanism ensures that Community and Production versions are aligned, and more importantly that you can’t download a broken package from Production.

Revdeps checks:

If the new version in Community and Production causes revdep check issues in Community, then take down the affected packages from Production immediately.

Community moves to the fixed SHA for these packages and they are added for monitoring in the staging repo.

Welcome your thoughts / modifications on this.

wlandau commented 2 weeks ago

To keep community and production from diverging, we would need an additional request to R-universe: rerun a package's community checks whenever a production dependency in updates. This may or may not be feasible from the R-universe side.

I actually don’t think we need this because https://github.com/r-universe-org/help/issues/369 is really about demotion. By contrast, CRAN-like revdep checks are all about promotion. When a new version reaches community, it is checked against the deps in production, and then the promotion decision is simply made.

wlandau commented 2 weeks ago

Option C seems like an interesting way to check candidates before they go to production, and to make sure production is a subset of community most of the time. But a few things give me pause:

We would need to create a third ‘staging’ repo to then track the branch: release

Each universe will have a lot of packages when scaled up, and that places demand on R-universe. Three universes are a lot to ask.

If it fails, it is removed from Community immediately, and Community rolls back to the fixed SHA from Production.

What if you release two packages together and the downstream one is rejected because of a timing issue? Or, what if the failure is spurious and you need to try again? Would the maintainer need to create another release version to trigger this, or does community keep pulling from staging until it works? The former seems like a lot of manual work for the maintainer, and in the latter case, the community package could loop between staging and production versions indefinitely, which seems like a lot of churn.

This mechanism ensures that Community and Production versions are aligned, and more importantly that you can’t download a broken package from Production.

What if a previously healthy version starts to break production for reasons unrelated to the package itself? E.g. updates to dependencies, the version of R, the GitHub Actions runner OS, etc.

I see the biggest risk being the user experience.

Unless I am missing something about option C, I think the revised option A from https://github.com/r-multiverse/help/issues/57#issuecomment-2161401793 achieves this equally well (as long as community checks can happen with production dependencies). In the latter, new versions are checked against production dependencies before being promoted to production, which ensures the same quality from a user perspective. This makes me prefer A at this point because it needs fewer universes.

shikokuchuo commented 2 weeks ago

Each universe will have a lot of packages when scaled up, and that places demand on R-universe. Three universes are a lot to ask.

The third repo is small as it just tracks the packages with issues. There may be a clever way to not need one, but having a 3rd repo seems to fit the bill.

What if you release two packages together and the downstream one is rejected because of a timing issue? Or, what if the failure is spurious and you need to try again?

It's on the maintainer to correct things (which he can do immediately if he chooses). It's lowest risk as if the package is rejected, Community and Production still retain the previous version. A new version is only accepted when checks pass.

Would the maintainer need to create another release version to trigger this, or does community keep pulling from staging until it works?

Create a new release, which gets picked up by Staging. Then when Community sees the new version in Staging, pulls it in for checking.

What if a previously healthy version starts to break production for reasons unrelated to the package itself? E.g. updates to dependencies, the version of R, the GitHub Actions runner OS, etc.

The check would fail in Community. IIRC they are being re-run periodically. Then it gets taken down from Production and placed on watch in staging.

I see the biggest risk being the user experience.

Unless I am missing something about option C, I think the revised option A from #57 (comment) achieves this equally well (as long as community checks can happen with production dependencies). In the latter, new versions are checked against production dependencies before being promoted to production, which ensures the same quality from a user perspective. This makes me prefer A at this point.

You identified at least 3 obstacles to A in https://github.com/r-multiverse/help/issues/57#issuecomment-2161419398. This option C is an attempt to overcome them. We still need to ask features from R-Universe to make option A work it seems?

shikokuchuo commented 2 weeks ago

The third repo is just an implementation detail for want of a better solution - it doesn't even have to be public and no one will be downloading from there. We just need a way to (i) track latest release, (ii) pin the existing working release at Community and Production, (iii) when the latest release SHA changes (i.e. maintainer has made a new release), push it back to Community (or Community pulls it in).

shikokuchuo commented 2 weeks ago

Just to make sure I've addressed your questions:

Unless I am missing something about option C, I think the revised option A from https://github.com/r-multiverse/help/issues/57#issuecomment-2161401793 achieves this equally well (as long as community checks can happen with production dependencies)

Community checks with Production dependencies would be an ask of R-Universe, perhaps difficult right?

In the latter, new versions are checked against production dependencies before being promoted to production, which ensures the same quality from a user perspective.

If not promoted, this new version that fails checks remains in Community and causes divergence.

Option C is just my attempt at making option A work, and also tries to incorporate the most appealing feature of option B, which is that there is no divergence between the 2 repos.

llrs commented 2 weeks ago

I was tagged in relation to Bioconductor. Bioconductor trust upon first review, then maintainers can vandalize the packages (as long as they pass a minimal check), so old packages do not need to comply with newer requirements. I wouldn't recommend to take this path, as it has lead to "vignettes" being only a link to a book that it is out of sync with the package (Even if Bioconductor allows to publish books) and other issues.

I think you are reaching now the key issues that CRAN handled so far very well. I am not sure I followed the whole discussion well but some thoughts related to the points discussed:

To track if a Community package promotes to Production a simple file in a repository could be used. I don't see the need of a "full" r-universe R repository.
I think so far the assumption have been made that package maintainers are on top of things and following "the events of the multiverse" (sounds like a film). But that won't be the case in the long term. Unless all decisions are made automatically notifying maintainers will be required at some point.
I thought the Production repository was all about doing the reverse dependencies and only those that pass the checks are included there. If a machine has all the packages installed it should be relatively easy to check if they work with the new package (I'm thinking in symlinks and a specific library for each reverse dependency check or using the tools::check_packages_in_dir).
I am not sure how solving the discrepancy when a package pass checks for Community but not for Production will be resolved. In Bioconductor, discrepancy between Production and Community is allowed and even encouraged: new features shouldn't come to the production repository, aka. the released version, they should only appear on devel so that maintainers can adapt. In CRAN, there shouldn't be, at the cost of accepting for a window of time, 2 weeks, broken packages.

I think Gabe Becker presented these problems more organized in a meeting a year ago in the R repository working group (I am not sure if it was recorded or written down).

On a related note, I'm not sure if you are aware that CRAN now displays the deadline they set to a maintainer to fix a package. So now this is public and other maintainers can start preparing themselves if something goes amiss. See for example this post from Dirk Eddelbuettel. I think with time this will help to measure in how much of a discrepancy state is CRAN, but also maintainers and users.

shikokuchuo commented 2 weeks ago

Thanks @llrs. I think the following points are great takewaways:

It's important not to lend too much 'trust' and let things get stale in Production
Notifying maintainers is going to be important (the role of the RSS feed etc.), and frequency of moving in and out of Production

I think it's also important to note that we're not still having the same discussions as the RCWG (although it may look like it). I think either of the existing plans already deliver at least a CRAN-like level of service to users. We're really looking to see how far we can push things to go above and beyond.

For example CRAN allows a 2 week grace period, extended by another 2 weeks if there are strong-dependencies. During this period, any user can download 'broken' packages. Similarly, only a couple of checks are run when an update is accepted into CRAN, and the rest run afterwards. We're playing with whether we can improve on this - so although it may sound alarming when we say 'broken' packages above, we're really already within these parameters and it is not worse than this.

The option C I proposed above has the benefit that Production remains stable 100% of the time, but perhaps this is too ambitious. I think we can get close to it with a modified version of option B. I'll post something later on.

shikokuchuo commented 2 weeks ago

@wlandau to summarize where we now are, I think C is the logical conclusion of A so that package versions between Community and Production remain aligned 100% of the time.

I am also onboard with your comment here: https://github.com/r-multiverse/help/issues/57#issuecomment-2161538055

So if we focus on B and C:

Option B

Package maintainers are incentivised to fix problems are they affect Production (+)
Grace period allows packages not passing checks to enter into and stay in Production, but not beyond current CRAN practice (-)

Option C

Packages in Production are 100% safe (+)
Packages move in and out of Production with no notice when checks fail (-)

So Option C is user-friendly, but probably impossible actually for maintainers (surprising result!).

Option B seems a good compromise in comparison, and all else being equal is preferred due to its simplicity.

wlandau commented 2 weeks ago

Community checks with Production dependencies would be an ask of R-Universe, perhaps difficult right?

Yes, and potentially difficult. Haven't asked Jeroen yet what he thinks.

wlandau commented 2 weeks ago

I think C is the logical conclusion of A so that package versions between Community and Production remain aligned 100% of the time.

On reflection, we don't necessarily need versions to align. We just need the promotion checks in community to be identical to the demotion checks in production. Otherwise, maintainers will get blindsided by production-level failures they cannot reproduce. That's what this is really all about. Version alignment is one way to reach check alignment, e.g. with Option B, but it's not the only way. Another way is to make both sets of checks use production dependencies. That's why I prefer A over C.

wlandau commented 2 weeks ago

On reflection, B is no longer my favorite. Even the most diligent maintainer can make mistakes that cause checks to fail. If I am a maintainer and I make a mistake, I don't want that mistake to automatically go to production. It's extra pressure to get the release right on the first attempt.

wlandau commented 2 weeks ago

To summarize: if we go with Option A, my questions are:

Can we check community packages with production dependencies? I will ask Jeroen.
Do we need reverse dependency checks?
Do we need a 3-4-week grace period when a package breaks in production?

For (3), it could be enough that the package stays available in community. After all, that's exactly what the community universe is for: releases that could be broken. On the other hand, we might want the ability to account for intermittent test failures that happen through no fault of the package.

For (2), revdep checks have already been proposed for R-universe (https://github.com/r-universe-org/help/issues/362, https://github.com/r-universe-org/help/issues/364), and we might consider running a revdep check on a new community version before promoting it to production. Advantages:

Prevent breaking changes to downstream packages.
Ensure the same guarantees as CRAN.

Disadvantages:

Revdep checks can cause deadlock. If package X 2.0 breaks package Y 1.0, then it is impossible to promote X 2.0 and Y 2.0 together. This is exactly the sort of impossible situation that de-incentivizes testing in packages. Without revdep checks, you can promote X and then afterwards promote Y.
Revdep checks are computationally demanding and might only become available on a manual on-demand basis in R-universe.
https://github.com/r-universe-org/help/issues/369 is already a less heavy-handed way to catch revdep issues early, albeit in production.

shikokuchuo commented 1 week ago

I've been thinking about a solution that works like this:

Community runs checks on Community packages
Production runs checks on Production packages

Production is just a subset of Community (same package versions at all times). Hence a new package that is pushed to Community gets reflected in Production straight away.

Only checks in Production matter. If a package in Production starts failing checks in Production - due to an update to itself, or a dependency - it gets put on notice.

If on notice from Production:

Remove in 2 weeks if no revdeps
Remove in 3 weeks if revdeps (revdeps get notified after 1 week)

The notice period is cancelled as soon as a new version passes checks in Production.

Community checks would be for information, but as we allow all packages in the repo, it leads to no action.

I'm very keen to keep the mechanism as simple as possible for it to gain easy acceptance by the community.

wlandau commented 1 week ago

Your idea keeps the simplicity of Option B while actually using checks from production. It does keep things simple, and unlike B, it does not make assumptions about how checks translate from one universe to another. That part is appealing. However, I don't think it will work because checking in production may manual work.

Consider the isoband incident from last year, when thousands of packages were on the brink of removal. Suppose isoband breaks in R-multiverse, takes down all its revdeps, and then gets fixed. How do we get those revdeps back in production? https://github.com/r-multiverse/help/issues/57#issuecomment-2176039354 has no way to promote packages based on what happens in community, so every maintainer of every revdep would need to create a new release.

Even if there is a special exception which helps the ecosystem bounce back, there's still the fact that the only way to run checks in production is to already be there in the first place. To me, that makes those checks less accessible than the other options.

wlandau commented 1 week ago

The more I think about it, the more I like revised Option A from https://github.com/r-multiverse/help/issues/57#issuecomment-2161401793. That's the one where we use checks from both community and production, but make them both use production dependencies. It has compelling advantages:

It's the most CRAN-like out of all the options we have considered so far, and that's a good thing. CRAN is a model that the R community already understands, and it is generally good at keeping broken packages out of production. I think borrowing the best parts of a tried and true model will help R-multiverse keep the ecosystem healthy, gain acceptance, and avoid criticism.
An isoband incident would fix itself automatically. After the patched version of isoband reaches production, the community checks on the revdeps will start passing again, and those thousands of packages would migrate back to production automatically.
As a maintainer, I want some kind of filter to keep my mistakes out of production. I am never sure if a package will work until the important checks run (or at least the approximation that Option A provides). If I create a broken GitHub release and then head out for a holiday, it will be hard to rest knowing that I just broke one of my packages in production and can't fix it until I come back. In other words, I can't be confident in releasing a package unless I give myself time to debug and re-release to production. That does not seem like a very good maintainer experience.

llrs commented 1 week ago

Here is a quick visualization, to check if I understood correctly the different proposals to include packages. In blue the repositories of r-multiverse. Charlie's last proposal:

flowchart LR

repo[maintainer repository] --> |initial setup| A
A(Community) -->|automatic| B(Production)
B --> C{checks}
C -->|pass| F[Remains on Production]
C -->|checks fail with revdeps| D[Removed from Production in 3 weeks]
C -->|checks fail with no revdeps| E[Removed from Production in 2 weeks]

style repo fill:#34eb4f
style A fill:#34d5eb
style B fill:#34d5eb
style C fill:#eba834

Will's last proposal:

flowchart LR

repo[maintainer repository] --> |initial setup| A
A(Community) --> C{checks in production}
C -->|pass| B[Production]
C -->|checks fail| D[not included in Production]

style repo fill:#34eb4f
style A fill:#34d5eb
style B fill:#34d5eb
style C fill:#eba834

What I still don't understand is which checks are performed in each case:

For Charlie's approach will they be performed in several R versions, or only on released branch? How will R or other (system) dependencies be managed?
For Will's approach in addition to the previous questions, I am missing when packages on that scenario leave Production (besides when being replaced by newer versions). Will checks be on inclusion and on schedule (each day/week, ...) ?

The second approach is easier to understand as it is more similar to CRAN and Bioconductor inclusion criteria.

shikokuchuo commented 1 week ago

Right let's move to convergence.

The key insight from the system I proposed is that only Production checks matter. I've confirmed for Option A that this is also the case. The checks between Community and Production (= new version as if in Production) always agree, except for when a dependency has different versions between Community and Production. As soon as a passing version of that dependency is uploaded and promoted then the check discrepancy will sort itself out. But no matter what the Community checks are, only the Production ones matter for promotion.
I agree with fixing the SHA for Production as proposed in Option A (think this is good practice). This allows temporary divergence of versions, but keeps bad builds out of Production and helps with stability of the overall system.
It then becomes clear that Production checks for strong dependencies should be run whenever a new package is promoted - this is https://github.com/r-universe-org/help/issues/369#issue-2164948647 This is the answer to @llrs question above.
This system has the advantage that packages with no dependencies passing checks automatically promote, forcing dependencies to update. This is a cleaner and more transparent practice than the current manual intervention / decision for CRAN.
Open question: how to handle when checks fail in Production. The package should be put on notice for removal from Production. This necessitates also notifying strong depends. We could follow the CRAN 2 weeks / 2 weeks policy or something different. But structurally there should just be a notice period, otherwise packages will move in and out too frequently.

This system requires running a check on a package in Community as if in Production. This is the same as actually putting it in Production, run the checks, and then roll back immediately if it fails. I think this then makes it technically feasible (in case it is not possible for R-universe to implement this feature) and the time window is short enough not to likely cause issues for users (and they can always download again).

I also do not see a role for the CRAN-style revdeps checks if the above is put into place.

Anything important I've missed out?

wlandau commented 1 week ago

Thank you both for pondering this. @shikokuchuo, I am glad we are aligning.

The key insight from the system I proposed is that only Production checks matter. I've confirmed for Option A that this is also the case.

Yes, exactly. Production checks are the gold standard.

The checks between Community and Production (= new version as if in Production) always agree, except for when a dependency has different versions between Community and Production.

Yes, and the potential for that discrepancy was the major flaw in the first rendition of Option A.

This system requires running a check on a package in Community as if in Production.

100%. Revised Option A has two kinds of checks:

Entry into production: checking Community packages with Production dependencies.
Staying in production: checking Production packages with Production dependencies.

For (2), we use ordinary production checks. For (1), we use community checks on the grounds that they will agree with production.

This is the same as actually putting it in Production, run the checks, and then roll back immediately if it fails. I think this then makes it technically feasible (in case it is not possible for R-universe to implement this feature) and the time window is short enough not to likely cause issues for users (and they can always download again).

Yes, if R-universe can't use production dependencies for community checks, then this workaround becomes the next best option.

Open question: how to handle when checks fail in Production. The package should be put on notice for removal from Production...We could follow the CRAN 2 weeks / 2 weeks policy or something different. But structurally there should just be a notice period, otherwise packages will move in and out too frequently.

Yes, some number of weeks seems ideal. Except possibly in the above case above when immediate fallback is appropriate.

This necessitates also notifying strong depends.

The hard part is already done in https://github.com/r-multiverse/multiverse.internals/pull/26.

wlandau commented 1 week ago

@jeroen, thanks for your willingness to use production (+ CRAN) dependencies for community-level checks. With that on the agenda, I think are good to go with revised Option A as described above. I will begin implementing this soon, hopefully next week.

wlandau commented 1 week ago

I have some prework at https://github.com/r-multiverse/community/pull/10.

shikokuchuo commented 1 week ago

@wlandau you may already have an idea in mind, but I thought we could tackle the Production universe implementation in broadly 3 steps:

GitHub action to write the fixed SHA of the Community repo packages (to populate the Production universe)
Implement the logic in terms of check results used as precondition for the above
Implement the logic for checks triggered when a package gets promoted to Production (for recursive revdeps) and actions to put failing packages on notice for removal from Production

In terms of actual checks to use, I am in favour of using R-release and R-oldrel (which fortuitously matches with R-universe's current build system), and not use R-devel (until perhaps [2] months from the next annual release date for a transition period). The current debacle with constant changing of the API including forcing errors on packages is neither desirable nor necessary. Production by definition should not be running pre-release software.

wlandau commented 1 week ago

Yeah, I think that makes sense in terms of 3 layers for the implementation of a new multiverse.internals::write_production() function (or whatever we name it). And I agree to use R-release, and if available, R-oldrel (avoiding R-devel )

FYI, to update this thread: @jeroen mentioned it might not be feasible to use production dependencies for community checks after all. This is because R-universe combines the "check" step with the "build" step. So if we want install.packages("tidypolars", repos = "https://community.r-multiverse.org") to work, then polars needs to be in production, which it's not. So we're thinking about a third "Staging" universe, where packages are checked with production dependencies, but where the builds are not necessarily available to users.

wlandau commented 1 week ago

An alternative is to add extra checks to Community and have those checks use production deps.

wlandau commented 6 days ago

In an offline discussion, @jeroen said it may be possible to filter based on checks at the R-universe level: if a check fails in the production universe, do not publish the build. I would like to know if the following details around this are also feasible:

Beforehand, we make a list of "important" checks for the purpose of filtering. For the purposes of getting started, I it seems reasonable to include all R-release and R-oldrel checks that R-universe provides. But creating a good list of checks is highly nontrivial, and a serious discussion belongs in a separate thread like https://github.com/r-multiverse/help/issues/58.
For the production universe only: if an "important" check fails, then no build is published for that round of checks. For example, if package X version 2.0 fails on Windows but succeeds on Linux, then the Linux build is still not published.
https://production.r-multiverse.org/api/packages shows the failed results from X 2.0.
The most recently successful build of X remains installable. For example, if X version 1.0 passed on all platforms, then install.packages("X", repos = "https://production.r-multiverse.org") installs version 1.0, and https://production.r-multiverse.org/src/contrib/PACKAGES.json lists version 1.0.
None of this applies to the community universe. In Community, every existing build gets published, regardless of checks.

If all the above is feasible, this opens new possibilities for R-multiverse.

wlandau commented 6 days ago

Given the above, I would like to propose a new model for production:

Option D

Universes

Community: every supported build of the latest release is always published, regardless of checks. The R-universe packages.json has "branch": "*release" so that each new GitHub/GitLab release automatically gets published in the Community universe.
Production: a build is only published if checks succeed, as described in https://github.com/r-multiverse/help/issues/57#issuecomment-2188995766. The R-universe packages.json has "branch": "SPECIFIC_GITHUB_SHA", where SPECIFIC_GITHUB_SHA is the GitHub SHA of a specific release version.

Check types

multiverse.internals can get results from any of the following:

R CMD check checks: R CMD check results from the R-universe API.
Version checks: check that the version number monotonically increases with each release.
Description checks: check issues discoverable from the repo's src/contrib/PACKAGES.json (basically available.packages() metadata). For example, packages in production should not have Remotes:.

Promotion

Promotion is the process of putting a specific release version of a package in production. In Option D, the process is happens as follows:

Check the version checks and description checks in Community. If they pass, move to the next step. If either fails, the package version is withheld from production.
Include the new package version in the Production packages.json. If the important R CMD check checks succeed, then R-universe published the build to https://production.r-multiverse.org. If any important check fails, then the new build is not published to any platform in Production, but the most recent existing build is still available from install.pacakges() and src/contrib/PACKAGES.json.

Demotion

Demotion is the process of removing a build from Production if a check starts failing. This is accomplished with a "reset" mechanism (step 3 below).

Consider the "important" R CMD checks, version checks, and description checks that happen in Production. If any of these checks fail, put the package on notice for 2-4 weeks.
If the package is fixed on time, then the notice is removed, and any new notice starts from day 0.
If the package is not fixed on time, then we remove the build from the universe so cannot be installed with install.packages(repos = "https://production.r-multiverse.org"). This protects users from broken packages. Removal of the build is accomplished with a "reset" mechanism: the JSON entry is removed and blocked from Production's packages.json until all builds disappear from src/contrib/PACKAGES.json. After the build is gone, then it can be unblocked so the automation for promotion can add it back whenever appropriate. This will help Production bounce back from an "isoband" incident, etc.

wlandau commented 6 days ago

EDIT: I don't think Option E (below) will work

A much simpler version of Option D (let's call it Option E) is to

Completely remove the notice period.
Do not keep the old build available in Production if checks fail.
By extension, dispense with the complicated "reset" mechanism for demotion.

This makes the production model a whole lot simpler to maintain and explain. And it may not actually be that bad for maintainers. Suppose my package crew version 0.9.4 was previously succeeding in production, but then I release broken version 0.9.5. In Option E, crew immediately vanishes production, and install.packages() fails. If that happens, I can simply go to https://github.com/wlandau/crew/releases/edit/0.9.4 and set version 0.9.4 to be the latest release, then have all the time in the world to troubleshoot:

Capture

wlandau commented 6 days ago

On the other hand, downgrading doesn't always solve problems. If a reverse dependency is the cause of the trouble, it can take everything down with no notice. So Option E probably won't work.

shikokuchuo commented 6 days ago

Agree option E doesn't work - as a dependency can suddenly disappear at any time.

I see you've fleshed things out in option D, but fundamentally it seems to be the same as what we had agreed upon before? I can see that it technically incorporates the build/check step in Production, and only gets published if it passes. This is similar to the backup plan we had before - put in Production, check then roll back if it fails. Now we don't need to roll back, right?

wlandau commented 6 days ago

I see you've fleshed things out in option D, but fundamentally it seems to be the same as what we had agreed upon before?

Yes, just a couple implementation details are different.

I can see that it technically incorporates the build/check step in Production, and only gets published if it passes. This is similar to the backup plan we had before - put in Production, check then roll back if it fails. Now we don't need to roll back, right?

Exactly, we don't need a rollback because the "roll-forward" is filtered. The only downside is that now we need the extra reset mechanism for demotion, and that seems complicated: remove and block from packages.json, then wait for the build to disappear, then unblock. Willing to go with that, but I wish it were simpler.

shikokuchuo commented 6 days ago

So if I understand correctly, if a package v2 gets submitted and fails, v2 stays in Community and v1 stays in Production indefinitely? That means that there must be a record somewhere that v2 was checked so that it is not constantly re-checked for promotion. Then only when v3 is submitted, is the new version checked.

Conversely, if we have this, that means that if a package is actually taken down e.g. V2 was in production, fails as a dependency is no longer in Production, is removed from packages.json - it eventually disappears from Production. As V2 was recorded checked, it shouldn't somehow end up in Production again.

With the exception being - we can require that every time a package is promoted, all its dependencies in Community get rechecked, and promoted if necessary. This implements the auto-healing isoband situation.

wlandau commented 6 days ago

So if I understand correctly, if a package v2 gets submitted and fails, v2 stays in Community and v1 stays in Production indefinitely?

That's what I assumed from @jeroen's Slack comments yesterday. R-multiverse seems like a natural place to implement a notice/expiry period.

That means that there must be a record somewhere that v2 was checked so that it is not constantly re-checked for promotion. Then only when v3 is submitted, is the new version checked.

AFAIK re-checks on the same version don't happen very much in R-universe. Once a month, IIRC. Not much waste. And actually, for https://github.com/r-universe-org/help/issues/369, re-checks are part of the isoband auto-healing process.

Conversely, if we have this, that means that if a package is actually taken down e.g. V2 was in production, fails as a dependency is no longer in Production, is removed from packages.json - it eventually disappears from Production. As V2 was recorded checked, it shouldn't somehow end up in Production again.

With the exception being - we can require that every time a package is promoted, all its dependencies in Community get rechecked, and promoted if necessary. This implements the auto-healing isoband situation.

It seems easiest to just leave failing package versions in Production's packages.json, except for explicit demotion. That way, R-multiverse does not need to know if it's a failed attempt at promotion vs a likely demotion, and it makes the logic simpler (e.g. auto-healing).

r-multiverse / help

A formal policy to govern how packages enter and leave the production universe #57

Promotion

Demotion

Option A

Option B

Option C

Revdeps checks:

Option B

Option C

Option D

Universes

Check types

Promotion

Demotion