runtimeverification / k

K Framework Tools 7.0
BSD 3-Clause "New" or "Revised" License
452 stars 149 forks source link

K Monorepo Planning #4063

Open tothtamas28 opened 8 months ago

tothtamas28 commented 8 months ago

There is an opinion within the K Team that the well-known issues (#3871) regarding the release and integration process of K components would be much more manageable in a monorepo setup where

This issue is for discussing and planning the actual steps we need to take in that direction.

(Sketch follows, to be extended and organized.)

Repo layout

k
├── booster-backend
├── haskell-backend
├── k-frontend
├── llvm-backend
└── pyk

Project dependencies

k-frontend ---+---  llvm-backend   ---+--- booster-backend --- pyk
               \                     /
                +- haskell-backend -+

Integration steps

Integration should happen one repo at a time.

Testing

When making a change in one subproject, the test suite of all dependent subprojects must be run. This makes releases slower but less likely to cause issues downstream.

Since pyk is a dependent of all other subprojects, it can serve as a universal test harness.

Issues

ehildenb commented 8 months ago

I disagree pretty strongly with making things a monorepo for several reasons:

Either way, if we do it, we should not invert the pyk <- K dependency. Pyk is the new frontend for all intents and purposes, and K is moving towards becoming just parser + kompile, which is one tool for managing Kast and pretty formats.

We already have the benefit of "monorepo", by having a single repository that all updates go through, the pyk repo. We can have better discipline (or even enforcement) about making it so that when we have a change upstream (say to booster), that we have branches (referenced in the PR, and tested there in downstream repo) in all downstream repos that need a change before it gets merged. That doesn't make a monorepo, instead makes a separate k-testing repo that runs a battery of integration tests against all the specified branches that the PR pulls in (which in most cases, will be a bunch of master branches).

ehildenb commented 8 months ago

Basically, I have yet to see an argument for monorepo that actually solves the "my change breaks things elsewhere" problem, while not increasing CI burden way too much.

tothtamas28 commented 8 months ago

It will increase CI times pretty dramatically. A change to the LLVM backend, which does not touch anything on Haskell, will run all of Haskell CI.

Workflows should be set up so that they are triggered only on changes to certain files: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore

So since haskell-backend is not a dependency of llvm-backend, those tests should not run. On the other hand, tests in booster-backend should run, since llvm-backend depends on it. One advantage of the mono-repo setup is that this testing is automatic.

And if everything is in the same repo, subprojects can start to migrate their tests to the pyk test harness. For example, instead of storing definition.kore + request.json files that are impossible to comprehend and update, the symbolic backends could use the Kompiler and KoreClientTest classes to make requests to the server.

ehildenb commented 8 months ago

I still don't see the advantage. If we want to migrate towards a pyk-based testing harness for everything, then we can simplify have a separate pyk-based testing repository with integration tests, and have each PR on the separate repos call into that one to get test results back. Then on the pull request (or committed to the repository), you can specify the branches of different repositories that must be used for your PR into this particular repository.

If we're limiting test runs based on changesets anyway, we are not solving the fundamental problem of unexpected breakages. The issue arises when a change to a upstream repo causes a breakage downstream to a seemingly unrelated component. Unless we run all the tests on every change, we cannot avoid this.

tothtamas28 commented 8 months ago

it's to be expected that massive changes to dependency chains in projects will cause us headaches, I don't think a monorepo will solve that

Where it helps is that changes become atomic, so for example the following scenario will not happen:

Instead:

The time it takes to propagate haskell-backend -> hs-backend-booster -> k -> pyk is significant, even if nothing comes up. From detecting the issue from PR1 to propagating PR4 downstreams it can take weeks and a lot of coordination between teams. In a monorepo, this coordination is driven by the tests that break, and puts the burden of handling the issue on the person that breaks the test, not the person that tries to integrate the new version two repos down the chain.

tothtamas28 commented 8 months ago

If we want to migrate towards a pyk-based testing harness for everything, then we can simplify have a separate pyk-based testing repository with integration tests, and have each PR on the separate repos call into that one to get test results back.

If this exists as an alternative, it should be planned and compared to the monorepo proposal before we make a decision.

tothtamas28 commented 8 months ago

If we're limiting test runs based on changesets anyway, we are not solving the fundamental problem of unexpected breakages.

The triggering changeset should be coarse enough, e.g. we know that llvm-backend depends on k-frontend, so any change in the k-frontend folder should trigger the test.

Obviously, there's a tradeoff between efficiency and thoroughness of testing, but that tradeoff exists regardless of the repo setup. Right now, we just let developers make the call, and it doesn't work well in practice.

ehildenb commented 8 months ago

The downstreaming time HB -> Kontrol doesn't take weeks anymore in the happy path, it takes days at most, and usually is same-day. In the unhappy path, I don't see it being any better, we still need to sort out depenedency issues. The unhappy path is also not solved by monorepo because KEVM will not eb included there, and neither will Kontrol.

Baltoli commented 8 months ago

After some thought, I think my primary motivations for wanting something like this are:

ehildenb commented 8 months ago

I'm all for a dedicated separate repo that is able to run KEVM/Kontrol tests, for example, and KMIR/KWasm, and that repo must be passing some smoke tests for upstream codebases to be able to merge. That would actually solve this problem. But a monorepo won't we'll still see the same breakage at KEVM level. Also, from KEVM perspective, things are atomic, since it only has dependency on pyk anyway, not directly on any other component.

Baltoli commented 8 months ago

I'm all for a dedicated separate repo that is able to run KEVM/Kontrol tests, for example, and KMIR/KWasm, and that repo must be passing some smoke tests for upstream codebases to be able to merge.

Yep, this is the conclusion I think I converge on as well

ehildenb commented 8 months ago

We can implement that separate smoke test repo with current setup without moving towards monorepo (and thus having to analyze changesets or increase CI pressure), and later move to monorepo if we still see an argument for it. They are orthogonal solutions that solve different problems. Still not convinced monorepo really solves any problems, but maybe if smoke test-suite doesn't work then it we'll see something else as argument for monorepo.

Already, the HB team has been making sure to run KEVM/Kontrol CI on their codebases and post performance numbers to the PR to ensure there isn't regression. That has dramatically improved the downstreaming story when other breakages aren't involved. So enforcing that and rolling it out to other repos I think is the next logical step.

Baltoli commented 8 months ago

Perhaps let's discuss tomorrow how we might start to build that smoke testing repo from a practical perspective. In an ideal world, there would be a button somewhere on GitHub that - when pressed and given an argument - assembles a K toolchain, runs downstream tests, then provides some sort of easily inspectable certificate in the logs that can be pasted into a PR.

ehildenb commented 8 months ago

I'll reference this issue: https://github.com/runtimeverification/k/issues/2869

ehildenb commented 8 months ago

I think the PR volume on a monorepo will be insanely high, and not only will increase CI build pressure, but will increase frustration as people's PRs will be delayed from being merged because some other PR gets merged first. Already this can happen in each individual repo. Every time one person's PR is merged, your branch needs to be updated, any CI time already spent on it is wasted, and then you restart CI only to potentially have it happen again. Then everyone is watching their PRs to hit the Update Branch button first, and it repeats.

Having separate codebases gives us reduced CI burden and faster lower-friction merge times, which is not to be discarded. I'm all for increasing our testing capabalities, but I don't tihnk monorepo is the solution.

ehildenb commented 8 months ago

The reality is that 80% (or more) of PRs do not have the downstream issues, as evidenced by the weeks of time that went by where downstream merges were going smoothly and quickly. We should not optimize for the less common case (when we have a huge breaking change that has to go through), though we can try and catch it better.

Another option is that when we have a huge breakage going through, we can block merging on other repos in the system, to get that change pushed through in isolation. That could be a pretty straightforward way to stop change pile-ups like we just had.

tothtamas28 commented 8 months ago

I'm all for a dedicated separate repo that is able to run KEVM/Kontrol tests, for example, and KMIR/KWasm, and that repo must be passing some smoke tests for upstream codebases to be able to merge.

K Monorepo vs semantics test repo are not mutually exclusive proposals since they address different system boundaries (between K components vs between K and semantics).

tothtamas28 commented 8 months ago

I think the PR volume on a monorepo will be insanely high, and not only will increase CI build pressure, but will increase frustration as people's PRs will be delayed from being merged because some other PR gets merged first.

That's a good point. I added a bullet point for this in the issue description.

ehildenb commented 8 months ago

We don't have issues with K-internal updates, they don't end up blocking merges to downstream projects where the updates are needed, and are usually resolved quickly in comparison. The main issue from user side we need to fix is not breaking KEVM/Kontrol with updates to K tools.

For that, I suggest a separate integration testing workflow, that allows each repo to specify the verions of downstream software it's expected to work with. That information can also be used to automatically pull in the downstream branches with the needed fixes as the update-deps job gets pushed through, while maintaining lightweight individual CI testing and merging.

We have looked into merge queues, and tried it out. It does not meet our needs, as it does not allow different batches of tests to run to enter the merge queue vs once you're in the merge queue, so it doesn't actually reduce CI burden. This has been known a while: https://github.com/orgs/community/discussions/46757?sort=new. Maybe things have changed since we last looked, but AFAICT they don't meet our needs.

ehildenb commented 8 months ago

Another option, is to auto-rollback updates which cause breakages. Basically automatically revert all software to the latest known working version (and automate this workflow) when there is a breaking change, with an automated revert PR to the relevant repos. This at least can be pretty automated, and then allows us to optimistically merge quickly, but also quickly get back to working state.

But once an automatic revert is applied, then we are once again having to just test the change with downstream software before merging again. So having automated way to test downstream software with current version upstream is needed anyway.

tothtamas28 commented 8 months ago

We don't have issues with K-internal updates, they don't end up blocking merges to downstream projects where the updates are needed, and are usually resolved quickly in comparison.

Here's a recent K update from pyk: runtimeverification/pyk#913. It took five days to merge. During that time it was not possible to ship any critical fixes to downstream projects. It also bumped K version 12 times, meaning that all those versions had to be integrated into downstream semantics in one step.

ehildenb commented 8 months ago

Monorepo without running all of pyks tests in addition doesn't fix that though, testing downstream will.

ehildenb commented 8 months ago

I would definitely be in favor of a change where all the dependent repos fed directly into pyk (instead of into K repo), then we can easily revert just one repos changes. That also would address that particular issue.

{LLVM, HB, HBBooster, K} -> pyk

Then if an update to one of them breaks pyk, the others still can update. But each repo can still maintain its own merge cadence.

ehildenb commented 8 months ago

Here is the list of updates from K -> pyk https://github.com/runtimeverification/pyk/pulls?q=is%3Apr+is%3Aclosed+Update+dependency%3A+deps%2Fk_release

Of those listed on the first page, all of them merged without incident except the one linked above. The same is true going to the next page, and back for quite a while.

Fixing this one issue, where we had a major changes to upstream projects dependencies (GHC upgrade on Haskell backend, glibc upgrade on LLVM backend), is not what we should prioritize for smoothness (this type of upgrade is not common). We should prioritize the common case being smooth and low CI burden, and that's exactly what the current setup does.

I'm all for adding extra checks to catch these issues earlier, but am very against increasing friction in the common case.

tothtamas28 commented 8 months ago

Here is the list of updates from K -> pyk https://github.com/runtimeverification/pyk/pulls?q=is%3Apr+is%3Aclosed+Update+dependency%3A+deps%2Fk_release

Of those listed on the first page, all of them merged without incident except the one linked above.

Here's another one from the first page (merged two weeks ago) that didn't: runtimeverification/pyk#860. Took 11 days to merge, bumped 23 versions.

The number of non-problematic PRs is not indicative of the problem, the number of days spent without an incident is. And just by looking at these two PRs, we can say that we've spent half of February without being able to make a stable K release.

We agree that this is an issue, and that the solution is to run tests from dependents (within the K ecosystem) before merging a PR in a dependency. In a monorepo, this is automatic. In the current situation, it is either manual, or requires additional moving parts in our devops setup, which have to be maintained, can be down, etc. Also, in a monorepo, you test the actual release, and not something that approximates it (since by the time you finish testing, some other subproject might make a release).

I would definitely be in favor of a change where all the dependent repos fed directly into pyk (instead of into K repo), then we can easily revert just one repos changes.

We should consider this alternative. But the dependency between other K components will be still there, and reverting the changes, coordinating the versions, etc. can still have a significant overhead.

tothtamas28 commented 8 months ago

I think the PR volume on a monorepo will be insanely high, and not only will increase CI build pressure, but will increase frustration as people's PRs will be delayed from being merged because some other PR gets merged first.

Whatever the setup, ensuring it is not a pain to work with will be absolutely necessary, so we should analyze this. Here are my thoughts.

Testing should not be considered increased CI build pressure

We can decide not to run tests in a monorepo. Conversely, we can establish processes in the current setup that require developers to run tests from other repos.

Moreover, we run a lot of tests in the multi-repo setup too. By the time a change from haskell-backend is merged in pyk, we run tests from haskell-backend, hs-backend-booster, k, pyk. We just run these tests sequentially.

Difficulty merging things

I'll assume a merge queue for the monorepo (this got discarded quickly, but I do not have a better alternative at this point). There are two kinds of changes.

Single-repo changes

In multi-repo setup, you can merge a pull request in the corresponding repo without a problem, as long as the change is up-to-date with master. W.r.t. commit frequency, this is the ideal case for the multi-repo setup.

In a monorepo, this corresponds to a change that's confined to a single subproject. Suppose there are no merge conflicts (i.e. the change is up-to-date with its own subproject folder), and branch protection checks are passing. You put the PR in the queue, the changes are merged with the last item of the queue, and tests are run again. There are two cases:

  1. The change goes through. Tests got run twice though, so compared to the many-repo setup, this delay is pure overhead (though after queuing up the PR it didn't require time spent by the dev).
  2. Some test fails. This is due to a semantic conflict, which in the many-repo setup you catch either 1) before merging, by somehow luckily testing against a recent enough version of some of the other projects 2) too late, resulting in reverts, hotfixes, congestion, etc.

Multi-repo changes

In a multi-repo setup, this requires multiple PRs, each ideally compatible with current versions of its dependents. Even simple changes require careful coordination (like renaming an internal attribute, see e.g. #3988). When looking at the change as a whole, delivering this is not simpler than making a single PR in the monorepo, and letting the merge queue test the change against the (prospective) most recent version.

ehildenb commented 8 months ago

Ok, if there aren't other voices, let's try it. automerge does what I think you're hoping merge-queue will do, which is to reduce CI build pressure to minimum while running all the needed tests, so we can just keep using that system.

I think we should just run all the tests on every PR, running only a subset of tests will not catch bugs like the one above, where a change in Haskell backend resulted in a test in pyk breaking, and the current setup will catch that.

We will have to have discipline among the team to avoid pressing teh "Update Branch" button without really good reason to avoid unnecessary build pressure. Build pressure due to testing PRs that aren't ready or aren't close to merge is unnecessary, and we already spend quite a bit of money on compute.

ehildenb commented 8 months ago

The increased test pressure here will come from a change to pyk codebase also running HB tests, for example. Maybe we can still specify that dependencies in that direction don't cause tests to re-run?

tothtamas28 commented 8 months ago

Ok, if there aren't other voices, let's try it.

Let's wait for other's to formulate an opinion on the issue. This is a pervasive change that's hard to back out from and that affects the whole team. There should be a common understanding of the problems related to K releases, how a monorepo addresses them, what the tradeoffs are, and what the alternatives are. Maybe we can also do a small trial in a test repo.

Until then, maybe we can implement one of the more lightweight alternatives, and see how the team feels about them.

Maybe we can still specify that dependencies in that direction don't cause tests to re-run?

Yes, we should absolutely do that (see the Testing section in the issue description), at least on a subproject granularity. But with some additional disclipine we can even have smaller units.

ehildenb commented 8 months ago

We can start by merging the HB and HB Booster repos. And maybe K and pyk. Maybe merging those will prove enough? Merging LLVM and Haskell could also be good.

Baltoli commented 8 months ago

We can start by merging the HB and HB Booster repos. And maybe K and pyk. Maybe merging those will prove enough?

I think this will go a long way; as will removing the reverse dependency of LLVM on K via the matching compiler rewrite. Not so sure about merging Haskell + LLVM but I'm open to being convinced.

ehildenb commented 8 months ago

One potential thing @Baltoli and I discussed is this:

Immediately:

Later (optional):

Now we have that K depends on kompile for the KAST side, and on HB -> LLVM on the Kore side, and the only repo that mixes Kast and Kore is the K (and pyk) repo. This also provides us a path towards deleting the Java code from the K repo by factoring out the basic functionalities needed related to parsing and compiler pipeline into it's own separate codebase that is not Kore aware.

Dependency chain looks like:

K (and pyk) <- { kompile , HB <- LLVM }

Baltoli commented 8 months ago

Via @jberthold; there are various tricks one can use to merge Git repositories without losing history - this is important! https://github.com/runtimeverification/hs-backend-booster/issues/532#issuecomment-1972131522