tothtamas28 opened 8 months ago
I disagree pretty strongly with making things a monorepo for several reasons:
Either way, if we do it, we should not invert the pyk <- K
dependency. Pyk is the new frontend for all intents and purposes, and K is moving towards becoming just parser + kompile, which is one tool for managing Kast and pretty formats.
We already have the benefit of a "monorepo" by having a single repository that all updates go through: the pyk repo. We can have better discipline (or even enforcement) so that when we have a change upstream (say to booster), we have branches (referenced in the PR, and tested there in the downstream repo) in all downstream repos that need a change before it gets merged. That doesn't make a monorepo; instead it makes a separate `k-testing` repo that runs a battery of integration tests against all the specified branches that the PR pulls in (which in most cases will be a bunch of `master` branches).
Basically, I have yet to see an argument for monorepo that actually solves the "my change breaks things elsewhere" problem, while not increasing CI burden way too much.
It will increase CI times pretty dramatically. A change to the LLVM backend, which does not touch anything on Haskell, will run all of Haskell CI.
Workflows should be set up so that they are triggered only on changes to certain files: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore
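As a concrete (hypothetical) illustration of such path-based triggering, a per-backend workflow might look like the sketch below; the workflow name, folder layout, and test command are assumptions about a possible monorepo layout, not the actual repo setup:

```yaml
# .github/workflows/haskell-tests.yml (hypothetical monorepo layout)
name: Haskell backend tests
on:
  pull_request:
    paths:
      - 'haskell-backend/**'   # only run when this subproject changes...
      - 'k-frontend/**'        # ...or one of its assumed dependencies
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make -C haskell-backend test   # placeholder test command
```

With filters like this, a PR that only touches the LLVM backend folder would never schedule this workflow at all.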
So since `haskell-backend` is not a dependency of `llvm-backend`, those tests should not run. On the other hand, tests in `booster-backend` should run, since it depends on `llvm-backend`. One advantage of the mono-repo setup is that this testing is automatic.
And if everything is in the same repo, subprojects can start to migrate their tests to the `pyk` test harness. For example, instead of storing `definition.kore` + `request.json` files that are impossible to comprehend and update, the symbolic backends could use the `Kompiler` and `KoreClientTest` classes to make requests to the server.
I still don't see the advantage. If we want to migrate towards a pyk-based testing harness for everything, then we can simply have a separate pyk-based testing repository with integration tests, and have each PR on the separate repos call into that one to get test results back. Then on the pull request (or committed to the repository), you can specify the branches of the different repositories that must be used for your PR into this particular repository.
If we're limiting test runs based on changesets anyway, we are not solving the fundamental problem of unexpected breakages. The issue arises when a change to an upstream repo causes a breakage downstream in a seemingly unrelated component. Unless we run all the tests on every change, we cannot avoid this.
> It's to be expected that massive changes to dependency chains in projects will cause us headaches; I don't think a monorepo will solve that.
Where it helps is that changes become atomic, so for example the following scenario will not happen:

1. PR1 is merged in `llvm-backend`, and propagates to `evm-semantics` through a number of repos, where it breaks the test suite.
2. PR2 is merged in `haskell-backend`, and propagates to `pyk` through a number of repos, where it breaks the test suite.
3. PR3 is made to `llvm-backend` to fix the regression in `evm-semantics`. It is stuck in `pyk` until
4. PR4 is made to `haskell-backend` to fix the regression in `pyk`.

Instead:

1. PR1 hits `evm-semantics` directly, where it breaks the test suite.
2. PR2 breaks the `pyk` test suite. The issue is fixed, then the PR is merged.
3. PR3 reaches `evm-semantics` directly, where it fixes the breaking test.

The time it takes to propagate `haskell-backend -> hs-backend-booster -> k -> pyk` is significant, even if nothing comes up. From detecting the issue in PR1 to propagating PR4 downstream it can take weeks and a lot of coordination between teams. In a monorepo, this coordination is driven by the tests that break, and puts the burden of handling the issue on the person that breaks the test, not the person that tries to integrate the new version two repos down the chain.
> If we want to migrate towards a pyk-based testing harness for everything, then we can simply have a separate pyk-based testing repository with integration tests, and have each PR on the separate repos call into that one to get test results back.
If this exists as an alternative, it should be planned and compared to the monorepo proposal before we make a decision.
> If we're limiting test runs based on changesets anyway, we are not solving the fundamental problem of unexpected breakages.
The triggering changeset should be coarse enough, e.g. we know that `llvm-backend` depends on `k-frontend`, so any change in the `k-frontend` folder should trigger the test.
Obviously, there's a tradeoff between efficiency and thoroughness of testing, but that tradeoff exists regardless of the repo setup. Right now, we just let developers make the call, and it doesn't work well in practice.
The downstreaming time HB -> Kontrol doesn't take weeks anymore in the happy path; it takes days at most, and usually is same-day. In the unhappy path, I don't see it being any better: we still need to sort out dependency issues. The unhappy path is also not solved by a monorepo, because KEVM will not be included there, and neither will Kontrol.
After some thought, I think my primary motivations for wanting something like this are:

- […] the `arenaAlloc` change from last week), but as Everett points out that doesn't help once the change leaves K to hit EVM (say).
- […] `kup`?) and run a set of downstream semantics with it.

I'm all for a dedicated separate repo that is able to run KEVM/Kontrol tests, for example, and KMIR/KWasm, and that repo must be passing some smoke tests for upstream codebases to be able to merge. That would actually solve this problem. But a monorepo won't: we'll still see the same breakage at the KEVM level. Also, from the KEVM perspective, things are atomic, since it only has a dependency on pyk anyway, not directly on any other component.
> I'm all for a dedicated separate repo that is able to run KEVM/Kontrol tests, for example, and KMIR/KWasm, and that repo must be passing some smoke tests for upstream codebases to be able to merge.
Yep, this is the conclusion I think I converge on as well.

We can implement that separate smoke-test repo with the current setup without moving towards a monorepo (and thus having to analyze changesets or increase CI pressure), and later move to a monorepo if we still see an argument for it. They are orthogonal solutions that solve different problems. I'm still not convinced a monorepo really solves any problems, but if the smoke test suite doesn't work, maybe we'll see something else as an argument for a monorepo.

Already, the HB team has been making sure to run KEVM/Kontrol CI on their codebases and post performance numbers to the PR to ensure there isn't a regression. That has dramatically improved the downstreaming story when other breakages aren't involved. So enforcing that and rolling it out to other repos is, I think, the next logical step.
Perhaps let's discuss tomorrow how we might start to build that smoke testing repo from a practical perspective. In an ideal world, there would be a button somewhere on GitHub that - when pressed and given an argument - assembles a K toolchain, runs downstream tests, then provides some sort of easily inspectable certificate in the logs that can be pasted into a PR.
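That "button with an argument" corresponds fairly directly to a manually-triggered GitHub Actions workflow. A sketch, assuming a dedicated smoke-test repo; the input name and the two scripts are hypothetical placeholders, not existing tooling:

```yaml
# Hypothetical workflow in the proposed smoke-test repo.
name: Downstream smoke tests
on:
  workflow_dispatch:
    inputs:
      k-version:
        description: 'K release or branch to assemble the toolchain from'
        required: true
jobs:
  kevm-smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./assemble-toolchain.sh "${{ inputs.k-version }}"   # placeholder script
      - run: ./run-downstream-tests.sh kevm kontrol              # placeholder script
```

The workflow run's log URL would then serve as the "easily inspectable certificate" to paste into a PR.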
I'll reference this issue: https://github.com/runtimeverification/k/issues/2869
I think the PR volume on a monorepo will be insanely high, and not only will increase CI build pressure, but will increase frustration as people's PRs will be delayed from being merged because some other PR gets merged first. Already this can happen in each individual repo. Every time one person's PR is merged, your branch needs to be updated, any CI time already spent on it is wasted, and then you restart CI only to potentially have it happen again. Then everyone is watching their PRs to hit the Update Branch button first, and it repeats.
Having separate codebases gives us reduced CI burden and faster, lower-friction merge times, which is not to be discarded. I'm all for increasing our testing capabilities, but I don't think a monorepo is the solution.
The reality is that 80% (or more) of PRs do not have downstream issues, as evidenced by the weeks of time that went by where downstream merges were going smoothly and quickly. We should not optimize for the less common case (when we have a huge breaking change that has to go through), though we can try to catch it better.
Another option is that when we have a huge breakage going through, we can block merging on other repos in the system, to get that change pushed through in isolation. That could be a pretty straightforward way to stop change pile-ups like we just had.
> I'm all for a dedicated separate repo that is able to run KEVM/Kontrol tests, for example, and KMIR/KWasm, and that repo must be passing some smoke tests for upstream codebases to be able to merge.
K Monorepo vs semantics test repo are not mutually exclusive proposals since they address different system boundaries (between K components vs between K and semantics).
> I think the PR volume on a monorepo will be insanely high, and not only will increase CI build pressure, but will increase frustration as people's PRs will be delayed from being merged because some other PR gets merged first.
That's a good point. I added a bullet point for this in the issue description.
We don't have issues with K-internal updates; they don't end up blocking merges to downstream projects where the updates are needed, and are usually resolved quickly in comparison. The main issue we need to fix, from the user side, is not breaking KEVM/Kontrol with updates to K tools.
For that, I suggest a separate integration testing workflow that allows each repo to specify the versions of downstream software it's expected to work with. That information can also be used to automatically pull in the downstream branches with the needed fixes as the update-deps job gets pushed through, while maintaining lightweight individual CI testing and merging.
We have looked into merge queues, and tried them out. They do not meet our needs, as they do not allow running different batches of tests to get into the merge queue vs. once you're in the merge queue, so they don't actually reduce CI burden. This has been known for a while: https://github.com/orgs/community/discussions/46757?sort=new. Maybe things have changed since we last looked, but AFAICT they don't meet our needs.
Another option is to auto-rollback updates which cause breakages: automatically revert all software to the latest known working version (and automate this workflow) when there is a breaking change, with an automated revert PR to the relevant repos. This at least can be pretty automated, and allows us to optimistically merge quickly but also quickly get back to a working state.
But once an automatic revert is applied, we are once again having to test the change with downstream software before merging again. So having an automated way to test downstream software against the current upstream version is needed anyway.
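The "latest known working version" bookkeeping behind such an auto-rollback can be tiny. A minimal sketch in Python (the version log and statuses below are invented for illustration): record each integration run's outcome, and roll back to the newest passing entry.

```python
# Invented log of integration runs: (K version, status), oldest first.
runs = [
    ("v6.0.1", "pass"),
    ("v6.0.2", "pass"),
    ("v6.0.3", "fail"),  # the breaking update that should trigger a revert
]

def last_known_good(log):
    """Return the newest version whose integration run passed, or None."""
    for version, status in reversed(log):
        if status == "pass":
            return version
    return None
```

An automated revert PR would then target `last_known_good(runs)` (here `v6.0.2`) rather than requiring anyone to bisect by hand.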
> We don't have issues with K-internal updates; they don't end up blocking merges to downstream projects where the updates are needed, and are usually resolved quickly in comparison.
Here's a recent K update from `pyk`: runtimeverification/pyk#913. It took five days to merge. During that time it was not possible to ship any critical fixes to downstream projects. It also bumped the K version 12 times, meaning that all those versions had to be integrated into downstream semantics in one step.
A monorepo without running all of pyk's tests in addition doesn't fix that though; testing downstream will.
I would definitely be in favor of a change where all the dependent repos fed directly into pyk (instead of into the K repo), then we can easily revert just one repo's changes. That would also address that particular issue.
{LLVM, HB, HBBooster, K} -> pyk
Then if an update to one of them breaks pyk, the others still can update. But each repo can still maintain its own merge cadence.
Here is the list of updates from K -> pyk https://github.com/runtimeverification/pyk/pulls?q=is%3Apr+is%3Aclosed+Update+dependency%3A+deps%2Fk_release
Of those listed on the first page, all of them merged without incident except the one linked above. The same is true going to the next page, and back for quite a while.
Fixing this one issue, where we had major changes to upstream projects' dependencies (GHC upgrade on the Haskell backend, glibc upgrade on the LLVM backend), is not what we should prioritize for smoothness (this type of upgrade is not common). We should prioritize the common case being smooth and low CI burden, and that's exactly what the current setup does.
I'm all for adding extra checks to catch these issues earlier, but am very against increasing friction in the common case.
> Here is the list of updates from K -> pyk https://github.com/runtimeverification/pyk/pulls?q=is%3Apr+is%3Aclosed+Update+dependency%3A+deps%2Fk_release
> Of those listed on the first page, all of them merged without incident except the one linked above.
Here's another one from the first page (merged two weeks ago) that didn't: runtimeverification/pyk#860. Took 11 days to merge, bumped 23 versions.
The number of non-problematic PRs is not indicative of the problem, the number of days spent without an incident is. And just by looking at these two PRs, we can say that we've spent half of February without being able to make a stable K release.
We agree that this is an issue, and that the solution is to run tests from dependents (within the K ecosystem) before merging a PR in a dependency. In a monorepo, this is automatic. In the current situation, it is either manual, or requires additional moving parts in our devops setup, which have to be maintained, can be down, etc. Also, in a monorepo, you test the actual release, and not something that approximates it (since by the time you finish testing, some other subproject might make a release).
> I would definitely be in favor of a change where all the dependent repos fed directly into pyk (instead of into the K repo), then we can easily revert just one repo's changes.
We should consider this alternative. But the dependency between other K components will be still there, and reverting the changes, coordinating the versions, etc. can still have a significant overhead.
> I think the PR volume on a monorepo will be insanely high, and not only will increase CI build pressure, but will increase frustration as people's PRs will be delayed from being merged because some other PR gets merged first.
Whatever the setup, ensuring it is not a pain to work with will be absolutely necessary, so we should analyze this. Here are my thoughts.
We can decide not to run all tests on every change in a monorepo. Conversely, we can establish processes in the current setup that require developers to run tests from other repos.
Moreover, we run a lot of tests in the multi-repo setup too. By the time a change from `haskell-backend` is merged in `pyk`, we run tests from `haskell-backend`, `hs-backend-booster`, `k`, and `pyk`. We just run these tests sequentially.
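To make the cost of that sequential running concrete, here is a toy wall-clock comparison; the per-suite durations are invented for illustration, not measured:

```python
# Invented CI durations (minutes) for each repo in the propagation chain.
ci_minutes = {"haskell-backend": 60, "hs-backend-booster": 45, "k": 90, "pyk": 30}

# Multi-repo: each downstream "update dependency" PR waits for the previous
# repo's release, so wall-clock time is the sum of all the suites.
sequential = sum(ci_minutes.values())

# Monorepo with enough parallel runners: all affected suites start from the
# one PR, so wall-clock time is bounded by the slowest suite.
parallel = max(ci_minutes.values())

print(f"sequential: {sequential} min, parallel: {parallel} min")
```

The total compute spent is the same in both cases; what differs is how long a single change takes to be fully validated end to end.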
I'll assume a merge queue for the monorepo (this got discarded quickly, but I do not have a better alternative at this point). There are two kinds of changes.
In the multi-repo setup, you can merge a pull request in the corresponding repo without a problem, as long as the change is up-to-date with `master`. W.r.t. commit frequency, this is the ideal case for the multi-repo setup.
In a monorepo, this corresponds to a change that's confined to a single subproject. Suppose there are no merge conflicts (i.e. the change is up-to-date with its own subproject folder), and branch protection checks are passing. You put the PR in the queue, the changes are merged with the last item of the queue, and tests are run again. There are two cases:
In a multi-repo setup, this requires multiple PRs, each ideally compatible with current versions of its dependents. Even simple changes require careful coordination (like renaming an internal attribute, see e.g. #3988). When looking at the change as a whole, delivering this is not simpler than making a single PR in the monorepo, and letting the merge queue test the change against the (prospective) most recent version.
Ok, if there aren't other voices, let's try it. `automerge` does what I think you're hoping merge-queue will do, which is to reduce CI build pressure to a minimum while running all the needed tests, so we can just keep using that system.
I think we should just run all the tests on every PR, running only a subset of tests will not catch bugs like the one above, where a change in Haskell backend resulted in a test in pyk breaking, and the current setup will catch that.
We will have to have discipline among the team to avoid pressing the "Update Branch" button without a really good reason, to avoid unnecessary build pressure. Build pressure due to testing PRs that aren't ready or aren't close to merge is unnecessary, and we already spend quite a bit of money on compute.
The increased test pressure here will come from a change to pyk codebase also running HB tests, for example. Maybe we can still specify that dependencies in that direction don't cause tests to re-run?
> Ok, if there aren't other voices, let's try it.
Let's wait for others to formulate an opinion on the issue. This is a pervasive change that's hard to back out of and that affects the whole team. There should be a common understanding of the problems related to K releases, how a monorepo addresses them, what the tradeoffs are, and what the alternatives are. Maybe we can also do a small trial in a test repo.
Until then, maybe we can implement one of the more lightweight alternatives, and see how the team feels about them.
> Maybe we can still specify that dependencies in that direction don't cause tests to re-run?
Yes, we should absolutely do that (see the Testing section in the issue description), at least at subproject granularity. But with some additional discipline we can even have smaller units.
We can start by merging the HB and HB Booster repos. And maybe K and pyk. Maybe merging those will prove enough? Merging LLVM and Haskell could also be good.
> We can start by merging the HB and HB Booster repos. And maybe K and pyk. Maybe merging those will prove enough?
I think this will go a long way; as will removing the reverse dependency of LLVM on K via the matching compiler rewrite. Not so sure about merging Haskell + LLVM but I'm open to being convinced.
One potential thing @Baltoli and I discussed is this:

Immediately:

- […] `kore-pattern-matching` and import that into the LLVM backend and frontend.

Later (optional):

- […] `pyk prove` and `pyk run` to be feature-complete with `krun` and `kprove`, so we can remove those functionalities. We've begun this process here: https://github.com/runtimeverification/pyk/pull/916.
- […] the `krun` and `kprove` frontends in the Java codebase.
- […] `kompile`, which implements `kompile` and `kprove --dry-run`, which are the core functionalities of K that pyk uses. Delete this code from the K frontend. At this point, just `kast` would exist in Java in the K repository.

Now we have that K depends on `kompile` for the KAST side, and on HB -> LLVM on the Kore side, and the only repo that mixes Kast and Kore is the K (and pyk) repo. This also provides us a path towards deleting the Java code from the K repo, by factoring out the basic functionalities related to parsing and the compiler pipeline into their own separate codebase that is not Kore-aware.
Dependency chain looks like:
K (and pyk) <- { kompile , HB <- LLVM }
Via @jberthold; there are various tricks one can use to merge Git repositories without losing history - this is important! https://github.com/runtimeverification/hs-backend-booster/issues/532#issuecomment-1972131522
There is an opinion within the K Team that the well-known issues (#3871) regarding the release and integration process of K components would be much more manageable in a monorepo setup where
This issue is for discussing and planning the actual steps we need to take in that direction.
(Sketch follows, to be extended and organized.)
Repo layout
Project dependencies
Integration steps
Integration should happen one repo at a time.
1. `k-frontend`
2. `haskell-backend` into `hs-backend-booster`
3. `hs-backend-booster` into `k`
4. `llvm-backend` into `k`
5. `pyk` into `k`
Testing
When making a change in one subproject, the test suite of all dependent subprojects must be run. This makes releases slower but less likely to cause issues downstream.
Since `pyk` is a dependent of all other subprojects, it can serve as a universal test harness.

Issues
We can use `paths`/`paths-ignore` for this. Also, we should utilize public runners as much as possible.