lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0
2.56k stars 762 forks source link

Use Github Squash and Merge strategy for PRs #24269

Closed jettr closed 2 months ago

jettr commented 2 months ago

Description

Would it be possible use the Squash and Merge strategy for merging PRs into main? If so, then I think we should also want to disable/discourage force-pushing during PR review as well.

This strategy has the benefit of keeping the history on main very linear and easy to follow. Each PRs also become a single commit on main that points to the review history if ever needed later.

While under review, each commit in the PR is a further revision or refinement, typically made in response to reviewer feedback, to the entire PR.

For example when the PR is first uploaded for review, it would typically have just a single commit. If there were no review comments and the commit was good as is, then that single commit for the PR would be rebased onto the main branch as it landed.

If there were review comments, then the author of the PR would address those comments by uploaded subsequent commits to the PR that fix or change various parts of the implementation. The commit description for those commits do not really need to be that descriptive because they will not be part of what lands on the main branch. This allow reviewers to easily see what has changed between when they reviewed the PR originally and the current state of the PR.

After the review is finished, then all of the commits within a PR are squashed into a single commit, and that commit is added to main. The PR still exists on github (forever?), and anyone can go back and see the discussion and revisions on a PR.

If you are familiar with gerrit, then a good mental mapping would be that a github PR maps to a Gerrit Change (or CL), and each commit within a github PR while under review maps to a Gerrit patchset within a Change.

What are peoples thoughts on this review and merge strategy?

a-will commented 2 months ago

I much prefer rebase flows. Squashing makes for more difficult bisection later, as it removes the incremental and atomic work that comes from having individual commits in the first place. In addition, the history gets really ugly and difficult to follow if you throw in multi-branch histories in main that are tracked across merges.

Honestly, I think the closest map to gerrit is actually rebase and fast-forward merge. Each commit is a patch, and GitHub also keeps track of every PR that originated each commit. For gerrit, you still have to force push to clear conflicts too, and I think you shouldn't be adding random fixup commits to the end for gerrit either ...unless it can properly rebase --autosquash on submission. It sure would be nice if GitHub could do that, but it can't yet... https://github.com/orgs/community/discussions/40804

In short, I firmly believe making the PR's history correct (via squashing and rebasing) is the responsibility of the PR creator, and the PR submission should be a set of atomic commits that track specific, incremental changes. GitHub clearly marks changes for each force-push, and if you don't want your PR to include extraneous stuff on a fixup force-push, don't change the base commit when you rebase.

a-will commented 2 months ago

I'll add that in a much bigger project, merge commits start to make sense (there is a place for merge commit flows!). It stops being reasonable for the PR creator to keep up with the ever-changing upstream branch, and the responsibility for performing merges shifts to a dedicated branch or subsystem maintainer, who generally spends a substantial portion of time reviewing and manually fixing conflicts.

I'd argue that the opentitan project is not anywhere close to that threshold, though ;)

rswarbrick commented 2 months ago

Squash&merge would be a bit of a disaster for any large PRs: such a PR might have 10 logically independent changes. Keeping them in separate commits would definitely be better. But I agree with the point that we don't test all the "intermediate states", leading to a potentially flaky history post-merge.

Is there a way to get GitHub to generate merge commits for all PRs with more than one commit? Then we'd have the benefit of "known tested" states without losing the intermediate commits.

The disadvantage would be that we end up with a more complicated commit history (dag, rather than list). Goodness knows whether tools and contributors would be ok with that :-)

a-will commented 2 months ago

Is there a way to get GitHub to generate merge commits for all PRs with more than one commit? Then we'd have the benefit of "known tested" states without losing the intermediate commits.

Yes, that is the remaining pull request merge strategy. The three available are the following:

However, GitHub automatically creating a merge commit sounds disastrous. That's a recipe for accidentally rolling back others' changes when they aren't related to the PR. The risk of problematic merge commits is one of the most significant reasons for requiring a fast-forward merge in the GitHub UI.

Also, an aside: Above, I misinterpreted "squash and merge" and thought that would produce a merge commit, but it's also a fast-forward merge. Nevertheless, I would much prefer not to force PRs to squash any history that isn't explicitly a fixup commit. Whether "squash and merge" should be available in the drop-down menu is an interesting question, but I feel like it's safer to require explicit fixup commits and maybe create something like a GitHub action to autosquash those.

jettr commented 2 months ago

As Will already stated above, the Sqaush and Merge strategy does actually produce a linear git history. I completely agree that a spaghetti git history is absolute terrible and best to be avoided unless really needed (which I agree isn't us).

Keeping them in separate commits would definitely be better. But I agree with the point that we don't test all the "intermediate states", leading to a potentially flaky history post-merge.

The git bisect would actually be more stable with squash and merge as you have pointed out. With the sqaush and merge strategy, the PR history stays around forever and it linked to from the commit that merges on main, so one could dig into that specify PR with all of its review history at that point.

Whether "squash and merge" should be available in the drop-down menu is an interesting question, but I feel like it's safer to require explicit fixup commits and maybe create something like a GitHub action to autosquash those.

Honestly I would rather not give too many options since I think that would cause more confusion. If OpenTitan decides to go with squash and merge, then I think we should really try to make the effort to switch everyone over and not have developers using different merge strategies; I think that would give us less cognitive overhead long term.

I don't know much about Github actions, but the proposal for auto fixup commits seems like a good alternative if we couldn't use squash and merge for some reason. The main advantages I see merge and squash strategy gives us are:

And I feel like the your suggestion of autosquash would basically give us the same benefits (except the last more subjective one) . Do you know how to do or enable the autosquash Github action?

jwnrt commented 2 months ago

I agree with your complaints but would prefer to keep the rebase workflow. OpenTitan very often has PRs which have to change lots of different things at once since hardware, DV, and software are all interconnected. PRs are good for grouping these together, but I still find the separate commits very valuable in the history when I only care about (e.g.) the software part.

I could go back into the GitHub history to find those separate commits, but then I'd also be searching through the pre-squashed fixups which isn't great. It also means the project's history is tied to GitHub, though to some extent that's already the case due to all the valuable information in PRs and reviews.

Aside: my preferred flow would be rebase-then-merge (sometimes called semilinear) where you have a linear history but with merge commits to distinguish the branches / PRs. GitLab and Codeberg/Gitea/Forgejo support this, but alas GitHub does not. As @a-will points out, autosquashes would also mean we could push fixup! commits without having to force-push and not risk them ending up in the final tree, but again GitHub doesn't support that.

GregAC commented 2 months ago

It seems the main reasoning here to move to squash and merge is to improve the reviewing flow?

Ultimately this comes down to the github review flow and as we've been aware of for a long time it's no particularly great.

I think the solution here should instead consider new review tools (such as gerrit). I agree with others that it's useful to have separate commits in a PR, helps break things up for review and helps when you're looking through history trying to understand what led to certain code changes.

The autosquash functionality does sound interesting but I worry it's too brittle, don't word the fixup commits correctly and they'll get merged into master rather than squashed.

If we could introduce something like the autosquash functionality with a mechanism to ensure we don't end up with fixup commits avoiding a squash and getting merged into master that would be worth looking at.

a-will commented 2 months ago

The autosquash functionality does sound interesting but I worry it's too brittle, don't word the fixup commits correctly and they'll get merged into master rather than squashed.

For what it's worth, the wording should be handled automatically with git commit --fixup <commit>

GregAC commented 2 months ago

The autosquash functionality does sound interesting but I worry it's too brittle, don't word the fixup commits correctly and they'll get merged into master rather than squashed.

For what it's worth, the wording should be handled automatically with git commit --fixup <commit>

Yeah I understand that but that relies on people always using it.

Right now because fixup commits aren't a part of the review flow they stand out like a sore thumb and reviewers will point them out so there's little chance they accidentally get merged.

Once you introduce fixup commits as part of the review flow reviews will be used to approve (and committers merging) PRs with fixup commits and they might miss one has been incorrectly constructed.

jettr commented 2 months ago

Is there a way forward to improve the reviewer flow for PRs? I am okay with either using the squash and merge strategy or the auto squash idea from @a-will. Maybe there is another option that would gain even more approval?

a-will commented 2 months ago

Is there a way forward to improve the reviewer flow for PRs? I am okay with either using the squash and merge strategy or the auto squash idea from @a-will. Maybe there is another option that would gain even more approval?

I'm actually not seeing how any of these mechanisms change the reviewer flow--These all merely change what is asked of the contributor at what time, not the reviewer.

If a contributor wants to have their PR appear as a single commit that gets squashed, they can do that locally with git commit --amend. If the contributor has multiple commits but just wants to do fixups that are autosquashed, they can do git rebase --autosquash locally. In both cases, there is a force push, but GitHub tracks the diffs in the PR.

If you prefer to avoid rewriting the branch history until the end, you can wait until approval + passing tests to do the final squash + force-push, and GitHub will show that there was no diff (as long as you don't change the base commit). You could then merge right away, since you knew the tests passed on the same content.

Meanwhile, if fixup commits accidentally get merged occasionally, it's really not the end of the world, is it? No amount of automation is going to prevent all human error, and a not-totally-perfect-looking git history doesn't seem too troublesome to me, at least. We'd know what the intention was.

An "autosquash fixups" PR merge option sounds nice to have, though it seems unlikely to be prioritized. I don't think we should be preventing contributors from making multi-commit PRs and preserving that history on its way to the master branch, however.

jettr commented 2 months ago

I'm actually not seeing how any of these mechanisms change the reviewer flow--These all merely change what is asked of the contributor at what time, not the reviewer.

These new flows affect both the author and the reviewer. For the reviewer, they have a much smoother review process of being able to tell what changed between different review sessions. Github does give you more than it used to by being able to compare forced push branches, but it doesn't take advantage of the way github would normally have comments inline with the old code etc.

If a contributor wants to have their PR appear as a single commit that gets squashed, they can do that locally with git commit --amend. If the contributor has multiple commits but just wants to do fixups that are autosquashed, they can do git rebase --autosquash locally. In both cases, there is a force push, but GitHub tracks the diffs in the PR.

The intent of using the Squash and Merge strategy is to avoid the force push in general (expect for when master changes too much and a force push is required no matter what)

An "autosquash fixups" PR merge option sounds nice to have, though it seems unlikely to be prioritized. I don't think we should be preventing contributors from making multi-commit PRs and preserving that history on its way to the master branch, however.

If preserving multi-commit PRs are what is needed, then Squash and Merge won't give us that

a-will commented 2 months ago

I'm actually not seeing how any of these mechanisms change the reviewer flow--These all merely change what is asked of the contributor at what time, not the reviewer.

These new flows affect both the author and the reviewer. For the reviewer, they have a much smoother review process of being able to tell what changed between different review sessions. Github does give you more than it used to by being able to compare forced push branches, but it doesn't take advantage of the way github would normally have comments inline with the old code etc.

Ah, I see what you're saying--I suppose I've simply never been bothered by it, but I can see why others might be. For what it's worth, similar complaints are lodged here: https://github.com/orgs/community/discussions/3478

As @GregAC said, this is one area where other review platforms do it better.

Meanwhile, as an FYI, it seems there are are some autosquash solutions out there that use merge queues (e.g. Mergify), but I'm not sure if there would be repercussions.