[RFC] Github: using 'Rebase & merge' as a default merge strategy

commodo commented 3 years ago

Hey,

Just raising some thoughts. If there is no agreement here, then also fine. I'll close this in 1-2 weeks if there is no conclusion on this.

So, Github allows 3 merge strategies. Any thoughts on changing default to Rebase & merge? Reason is: git history looks like this [1] Too many merge commits with single commits.

If it is a good idea, then it's in Settings:

Git history of packages [1]

champtar commented 3 years ago

In Gitlab you can rebase, wait for CI again, then merge, which I really prefer. Rebase remove the history so you don't know what was really tested, so I'm a bit against it, and also it removes the information of who took the decision to merge in git. I'm totally against squashing because the submitter should take care of sending proper commit, it's not the job of the guy that merge to cleanup. => ok to have both merge or rebase, I prefer merge by default

commodo commented 3 years ago

Maybe as a quick addition: “Rebase & merge “ is like going back to the old way (or how OpenWrt core does it) where a maintainer applies a patch (which was sent via mailing list)

champtar commented 3 years ago

But the maintainer was doing rebase, test, merge, now we will do test then rebase & merge

aparcar commented 3 years ago

I asked the same question over at LuCI, just for reference.

I'm preferring the rebase strategy and already apply that whenever I merge anything.

GitHub stores both the information of who did it (the committer)...

... and also which PR it was:

I'm in favour to disable anything but rebasing and have "cleaner" commit log.

aparcar commented 3 years ago

@openwrt/package-maintainers ping

hnyman commented 3 years ago

I have lately been using "rebase & merge" in LuCI when there a single commit in the PR, but have used merge for the more complex stuff.

I would prefer to keep both "rebase & merge" and "create merge commit" as options.

Default could be "rebase & merge" if it is possible to select a deafult. Note that Github remembers your personal selection, so it will offer the next time the same style for you in any case.

(And they have now introduced also a third option, "squash and merge" that I dislike.)

rmilecki commented 3 years ago

YES, PLEASE

Rebase remove the history so you don't know what was really tested

It doesn't squash commits, so you still see each of them (assuming there is more than 1 at all). It may result in loosing some context only when dealing with conflicts. That's pretty rare for packages I believe.

aparcar commented 3 years ago

@neheb As you do quite a bi chunk of the merging here, please comment

neheb commented 3 years ago

I'm OK with it. Rebase and merge still keeps context (PR number). I've merged PRs that way before but haven't really thought about git history and whatnot.

aparcar commented 3 years ago

I'd suggest to disable merging too as even a rebase keeps the history. If there are objections I'd only disable squashing.

champtar commented 3 years ago

We had the 3 options enabled, I just disabled squashing for now

neheb commented 3 years ago

My latest merges have all been rebases. That should be fine I guess...

aparcar commented 3 years ago

@thess @feckert please join the fun

champtar commented 3 years ago

Rebase merge, we do have the person in git

commit 2e297d29c026cc2d7a53ded5d30f1a8c5429c9d2
Author:     Phil Eichinger <phil@zankapfel.net>
AuthorDate: Wed Oct 28 15:11:28 2020 +0100
Commit:     Rosen Penev <rosenp@gmail.com>
CommitDate: Wed Oct 28 17:49:48 2020 -0700

    at: bump to version 3.2.1

    Change upstream to official repository at
    https://salsa.debian.org/debian/at

    Signed-off-by: Phil Eichinger <phil@zankapfel.net>

And the PR number only in Github UI

Normal merge

commit 6a7ea2e42a3de584b308cf7cf94a251f12b6a551
Merge: 805e00a78 e8dd5038a
Author:     Ted Hess <thess@kitschensync.net>
AuthorDate: Wed Oct 28 09:42:32 2020 -0400
Commit:     GitHub <noreply@github.com>
CommitDate: Wed Oct 28 09:42:32 2020 -0400

    Merge pull request #13795 from philenotfound/lame_cpe_id

    sound/lame: add PKG_CPE_ID

commit e8dd5038a9226636cab1a0efd3f51c987b04ea64
Author:     Phil Eichinger <phil@zankapfel.net>
AuthorDate: Wed Oct 28 11:56:50 2020 +0100
Commit:     Phil Eichinger <phil@zankapfel.net>
CommitDate: Wed Oct 28 11:56:50 2020 +0100

    sound/lame: add PKG_CPE_ID

    Signed-off-by: Phil Eichinger <phil@zankapfel.net>

BKPepe commented 3 years ago

If we would be using rebase & merge button, then the Verified check (verified GPG signature for commit) will be gone.

aparcar commented 3 years ago

@champtar How about --format=fuller?

user@dawn:~/src/openwrt/packages$ git show 2e297d --format=fuller
commit 2e297d29c026cc2d7a53ded5d30f1a8c5429c9d2
Author:     Phil Eichinger <phil@zankapfel.net>
AuthorDate: Wed Oct 28 15:11:28 2020 +0100
Commit:     Rosen Penev <rosenp@gmail.com>
CommitDate: Wed Oct 28 17:49:48 2020 -0700

    at: bump to version 3.2.1

    Change upstream to official repository at
    https://salsa.debian.org/debian/at

    Signed-off-by: Phil Eichinger <phil@zankapfel.net>

If we would be using rebase & merge button, then the Verified check (verified GPG signature for commit) will be gone.

I don't see anyone in packages.git using this so is that really relevant?

champtar commented 3 years ago

@aparcar I'm using --format=fuller ... Do we really want to forbid merge ?

neheb commented 3 years ago

@BKPepe GPG is a joke. The creator of GPG doesn't even use it. I don't see it as being relevant in git.

edit: I misspoke. PGP.

@champtar @commodo was complaining about the history created by merging, no?

aparcar commented 3 years ago

Do we really want to forbid merge ?

No I want to encourage rebases

@aparcar I'm using --format=fuller ...

Right, sorry I missed that.

And the PR number only in Github UI

Is that really relevant? Maybe it is, I'm just thinking about openwrt.git where that's not stored at all, patches come from GitHub or via mail.

champtar commented 3 years ago

openwrt.git workflow is rebase, hopefully test by the committer, then merge, but yes we are fine without any link to the source of the patch.

No I want to encourage rebases

I think we are all set then ;) https://github.com/openwrt/packages/issues/13654#issuecomment-712821237

BKPepe commented 3 years ago

I don't see anyone in packages.git using this so is that really relevant?

I would like to apologize, but I am not looking at git.openwrt.org, I am talking here about Github for packages repository, which is used for Pull Request. I see that there are several people without commit access and also people with commit access, which they are using GPG or S/MIME for their commits. I don't want to mention anyone, but here you have a few examples:

https://github.com/openwrt/packages/commit/5c53fe2ad3978d6b2b586aed3f5f2b80a9a95f22 https://github.com/openwrt/packages/commit/ba8748f9570cb7ec1c02b85bcf3603ccc7100f62 https://github.com/openwrt/packages/commit/d276c81ea81dac8d954f7606b44bdfb68548355b https://github.com/openwrt/packages/commit/4b0d029bb535229c1d5f71a8d90b8f7994ca2ae0

Also @neheb, me are using that as well.

I would say there's a difference if you see this obrazek or not.

neheb commented 3 years ago

It's interesting.

I have a GPG key uploaded to GitHub. Merges from me show Verified but merge + rebases from me do not.

Anyway, I don't value GPG so it's a moot point.

Cynerd commented 3 years ago

Anyway, I don't value GPG so it's a moot point.

And how do you plan to verify commits? I think that in distribution maintainer should always be reliably verified as it is the last guard before it is pushed to all users. How can you trust security of OpenWrt when source is not verified?

I somewhere even have code to do quite opposite in OpenWrt, to verify feeds by git (that is by pgp). Right now the only insurance we have is https. We trust you but should we also trust that servers hosting repository are impenetrable? Hardly. This is step backward from security, really.

I have a GPG key uploaded to GitHub. Merges from me show Verified but merge + rebases from me do not.

I am sure that you do not have private part on github for sure can't sign your rebase commits. Since content of commit changes in rebase it has to be signed again. Solutions are either merge commits or doing rebase locally (allowing only fast-forward merges on github without rebase allowed). Github allows you to modify branch user submitted for merge so there is no issue in rebasing manually instead of on github automatically. In Turris we have policy that tip has to be always signed by someone from Turris team.

And honestly it doesn't matter if gpg is terrible project, up to my knowledge it is the only way to sign git commits/tags, at least it is the way. So any gpg or pgp critique here is just baseless. Go and improve it or submit replacement to git (it is open-source nonetheless) but right now it is about signing or not signing and I am saying: Are you out of your mind to make not-signing the policy?

neheb commented 3 years ago

Anyway, I don't value GPG so it's a moot point.

And how do you plan to verify commits? I think that in distribution maintainer should always be reliably verified as it is the last guard before it is pushed to all users. How can you trust security of OpenWrt when source is not verified?

AFAIK, nobody can rewrite history here. git --force push will fail.

I somewhere even have code to do quite opposite in OpenWrt, to verify feeds by git (that is by pgp). Right now the only insurance we have is https. We trust you but should we also trust that servers hosting repository are impenetrable? Hardly. This is step backward from security, really.

Maybe makes sense with GitLab. No idea.

I have a GPG key uploaded to GitHub. Merges from me show Verified but merge + rebases from me do not.

I am sure that you do not have private part on github for sure can't sign your rebase commits. Since content of commit changes in rebase it has to be signed again. Solutions are either merge commits or doing rebase locally (allowing only fast-forward merges on github without rebase allowed). Github allows you to modify branch user submitted for merge so there is no issue in rebasing manually instead of on github automatically. In Turris we have policy that tip has to be always signed by someone from Turris team.

And honestly it doesn't matter if gpg is terrible project, up to my knowledge it is the only way to sign git commits/tags, at least it is the way. So any gpg or pgp critique here is just baseless. Go and improve it or submit replacement to git (it is open-source nonetheless) but right now it is about signing or not signing and I am saying: Are you out of your mind to make not-signing the policy?

I don't see any value in signing commits. I only see value if rewriting history is a possibility.

aparcar commented 3 years ago

These Verified badges give a sense of security as it's a GitHub key. It just verifies your merge on the fly, this has nothing to do with uploaded GPG. Uploading any (public) GPG keys just seem to be a perk for the UI?

Currently more than 30 people have commit access on packages.git. If you want more security it should involve everyone. Just having a few people signing a commit every now and then doesn't really help.

Anyone modification of the git history would be detected and reported by multiple maintainers/developers. I don't see that as a big risk, correct me if this assumption is wrong.

It seems more likely that some account with commit access is compromised and pushes malicious code. We could require 2FA for all maintainers.

So, any convenience allowing modification of packages.git via the web interface means potential security issues. If you don't trust GitHub we can move things to git.openwrt.org and follow the same workflow as openwrt.git (which barely does any signing neither).

BKPepe commented 3 years ago

These Verified badges give a sense of security as it's a GitHub key.

What are you saying just applies to merge commit. You need to take a look at the commits, which I posted. For example: obrazek obrazek

Those are not definitely GitHub keys.

BKPepe commented 3 years ago

Uploading any (public) GPG keys just seem to be a perk for the UI?

While using git in CLI, you can see if the commits were signed by GPG or not. By using git show <commit> --show-signature and same applies to git show --show-signature HEAD^ or there is command git verify-commit <commit>

We could require 2FA for all maintainers. Go for it.

We can move things to git.openwrt.org and follow the same workflow as openwrt.git (which barely does any signing neither).

You don't mean this seriously, right? Do you have any statistics for this? I would like to say that many people are using pull requests via GitHub and not many are sending patches via git send-email. Also, people want to have them as merged pull requests instead of showing them as closed.

aparcar commented 3 years ago

Okay I'll leave the decision to the rest of the (maintainer) group. I think it's good to disable squashing of commits in PRs and agree that it's bad style to rebase a signed commit. However as a high number of commits is unsigned, rebases should be preferred for these cases. If there was a signature added, a merge should be done to keep the signature valid.

Additionally, whoever wants to make it as their mission, could encourage people to sign their comments?

dwmw2 commented 3 years ago

FWIW I detest rebasing as it lies about history. If I test my changes and commit them, then that is what should go into the history. If something subsequently breaks, I should be able to bisect from that known working point.

Rebasing breaks that by throwing away the true history which was really the whole point of a version control system. Instead of having that fixed point preserved accurately, we get to take our chances with whether it ever works or not.

And yes, those failure modes aren't ever so common but they do happen. Maybe someone updates a library or function that I'm using in the meantime and my commit no longer works. And by the time we come to work it out, it looks like my commit never worked because we didn't use git in the way that it is actually supposed to be used; we threw away the true history.

Every time this happens to me I am livid with whatever project I'm working on, which uses git incorrectly. Use the system properly; never rebase.

Please, let's not be one of those projects that screws it up and doesn't even understand the basic concept of what version control is actually for.

dwmw2 commented 3 years ago

Reason is: git history looks like this [1] Too many merge commits with single commits.

In case I was too ambiguous about how much I don't agree with this proposal, let me say also that I object to the very premise of this observation. The history that you see there, "messy" as it is, is the truth. It is the true history, and preserving that is what we have a version control system for. If we want to ditch that because the true history is hard to handle, why bother with version control at all? Why not just throw tarballs over the wall?

CodeJuggernaut commented 3 years ago

Rebasing does not lie about the history or cheat the system in any way. It simply stashes your changes, pulls the latest changes from remote, and then applies your stash on top of those, so it appears in a linear fashion. I can't see anything wrong with this and it is definitely the preferred way when pulling a shared branch such as develop into your feature branch. The history is cleaner and therefore much more useful to reason about who did what. The same would be true when merging your feature back into develop.

Merge commits are messy and needlessly complicate the branch history. Please allow using "Rebase and commit" as the default merge strategy. Why not give users the control over their own team's repo?

Cynerd commented 3 years ago

Just to restate reasons why it is not a good idea to allow rebase on the button on Github: https://github.com/openwrt/packages/issues/13654#issuecomment-719433915

@CodeJuggernaut I do not have anything against rebases. It is even our workflow if you look into our projects (such as turris build, it has three development branches and you can clearly see them, they are not drowned in the sea of merge commits). This issue is about enabling rebases on Github on the button and that is not a good idea. The commits should be signed, or at least the tip should be, but Github does not have private keys (and it should not have them) thus it can't sign rebased commits. The solution we use is that we simply do rebase and merge locally and we never ever merge using the button (in our case on Gitlab).

In my eyes there are two options:

Doing rebase locally and allow only fast-forward merges on button
Use merge commits

I prefer the first option. Github, as well as Gitlab, allows maintainers to push to the source branch thus rebase can be done locally by the maintainer and signed by his key. The keys used to sign the repository tip should be some set of maintainers anyway (preferably signed by one top-level OpenWrt trust key).

The second variant has few drawbacks such as that tip is not signed (because the tip is a merge commit created by Github) but that can be overcome by ignoring merge commits in the verification process. The second issue is that commits are not signed by the maintainer's key if they are even signed. It simply opens the whole can of worms but still, it is possibly a viable way.

What is not in any way viable is to say that the security of the source is not important!

Edit: to also address @dwmw2 reasons. They make sense but I think that it is the maintainer's responsibility to check, test and verify before the merge. If there is some non-trivial rebase required it is completely correct that maintainer should ask the original author to do that so the original author should verify the result. I do not see how that is laying about the history. I also have to point out that merge commits on their own do not ensure the functional state after merge as merge commit can be wrongly resolved the same way as the rebase can. The only advantage is that there is still a fully accessible original version in the main repository but I am not sure if that helps with the fix in most cases. I consider the "true history" only what maintainers commit to the main branches. If there is rebase performed during the change integration process I consider that the same way a developer doing "work in progress" commits and then squashing them all to one. It is lost to the history but it is not relevant to the main repository.

jefferyto commented 2 years ago

I sign my commits and would prefer if my signed commits are the ones that are committed to the repo.

I also often open a PR for master and do another PR for the stable branch immediately after, e.g. #17576 and #17577. With rebase and merge, the (cherry picked from commit ...) line in the stable branch commit message references a commit that does not exist in the repo. In the above example, 5f517cc5846e87b9fd6b50c08c02c9ba42180742 says (cherry picked from commit 45208db29bd22238d355f4f8fdcc02fd2045c78a) but 71d0d9a877d27bf367e5e4a202f96094a637f57f was committed to the repo, not 45208db29bd22238d355f4f8fdcc02fd2045c78a.

(Yes, I can open a PR for master, wait for it to be merged, then open a PR for stable, but this seems unnecessarily slow to me.)

(Yes, I can merge my own PRs, but as a general rule here I want someone else to review my work before merging, and if I were to merge my own PRs I would use merge commits anyway.)

dwmw2 commented 2 years ago

Rebasing does not lie about the history or cheat the system in any way. It simply stashes your changes, pulls the latest changes from remote, and then applies your stash on top of those, so it appears in a linear fashion.

You realise those two sentences are mutually exclusive, right?

When we develop new code, all our thinking is done about how it relates to the original code base that we're working on. We look at the subtle details of how APIs work in that version, we do all our hardcore testing (including hacking the code to exercise error paths and other things) based on that version.

When someone changes things in the upstream before we've "finished" and pushed our own code, that's when we need either a merge or a rebase. It is a fact of life that the merge/rebase is done with a lot less care. As long as the patches apply mechanically to the source code, nobody even does any real thinking about it at all. (I know @Cynerd claims that it's the "maintainer's responsibility to check" but isn't really how it has ever worked, and we're now talking about having the tools do it automatically.)

A merge shows the true history. It shows that there were two independent pieces of development from the older code base, which were subsequently merged. That is what happened. A rebase shows a distorted and false version of history, pretending that one of the patches was actually developed on top of the other. And most of the time, that doesn't matter because they're unrelated. But sometimes — and I've seen this happen — you end up with a rebased patch that, according to the git history, never worked. Because something subtle changed in the remote between the actual development, and the point it got rebased to. And because you've abused the version control history to store falsehood in truth, you've thrown away the only point in history where it actually worked, that you can compare and bisect to.

A merge commit is the truth, and in some cases it's a very important part of the truth. We shouldn't throw them away because some people don't find it "pretty enough". We certainly shouldn't lead developers into such bad practices, which in itself is a vicious circle because if they're not exposed to normal truthful DAG then they will continue to find it unfamiliar and ugly.

The unfamiliarity, and thus the perceived ugliness, of a true DAG will pass. The correctness, and the availability of true history, will not.

Cynerd commented 2 years ago

@dwmw2 my point about testing to be maintainer's responsibility is rather that the maintainer should ensure that code compiles and tests passes on top of code quality checks. Based on those criteria he should accept PR or raise requests for change. That is and always was the maintainer's job in any project. Honestly, the code compiling and tests passing are what CI should ensure and the maintainer should only check the result before merging. @dwmw2 I think that talking about true history is not a strong argument considering the common PR review process that requests changes, adds fixups and suggestions and so on. All that has to end with rebase especially if there is some major change in master. All that lies about history but I do not think that we want that in our git history. I think that we really have to look at it from point of what testing and that only commits that are testes should be merged.

I am not against merge commits. We are using fast forward merges in Turris because commonly there is only one maintainer of one project so we can. Doing always fast forward merge on a project where there are multiple maintainers is just a race of testing before merging or merging untested code.

Honestly, neither of the approaches is fully without issues (I am now talking about rebasing with fast forward merge and optional rebase with merge commit). From a testing point of view, the merge commit is a change that is only tested in master and not before (considering standard push and wait for CI approach). Fast forward merge solves that as what lands as a tip in the master is already tested but that is also pretty much impossible in a multi-maintainer environment.

My primary points are:

The Github aided rebase should not be allowed as it removes signatures
The tip of the merged branch has to be always tested before merging (once again I am talking here about CI pretty much)
Merge should be either fast forward or merge commit which contains pretty much no change on its own and branch should be always as close to master as possible

commodo commented 2 years ago

I forgot about this issue. And I said I would close it in 1-2 weeks, but now it's 1+ year.

The idea was to raise some thoughts about this. So, thoughts were raised. But, no conclusions were reached :)

So: votes or conclusions?

dwmw2 commented 2 years ago

From a testing point of view, the merge commit is a change that is only tested in master and not before (considering standard push and wait for CI approach). Fast forward merge solves that as what lands as a tip in the master is already tested but that is also pretty much impossible in a multi-maintainer environment.

That isn't really how the world works. When we're writing new code, we'll literally hack it with if (1 || ...) to exercise different code paths and make sure all the error handling and stuff works right. On a rebase or merge, we just boot it and do whatever simple testing is possible. Almost 100% of the time, that difference is true. And with a rebase workflow, we really do end up with a commit that "never worked" in some (esoteric, rarely tested) cases. When the true history would show a commit that worked, then a merge which broke it.

Cynerd commented 2 years ago

@dwmw2 you are trying to preserve history to see who broke things. I am trying to ensure that master is always as stable as possible. The truth is that testing you are describing is in general just very fragile. When you can't ensure that initial merge to master won't result in your code breaking in some cases, how do you plan to ensure that it won't break in the future? I am not saying that anything can be auto-tested. I am just arguing that if you can't be bothered to test change in the form it is merged to master how bothered are about any issue in the master? It is nice that you have original working code but if there was a big difference between the master and that branch at merge time it won't help you that much. In the end, we are arguing almost the same thing just from a different point of view. I am also for merge commits (or fast forward merge if they are possible). You also want the merged commits to be tested. I am only arguing against never doing rebase. I am saying that rebase is completely valid and even required thing that should be requested by maintainers if the branch to be merged is too far off from the current master. The point is that once the author does rebase he should do its testing again otherwise he submits the invalid code. It is part of the author's job. At the same time, it is the maintainer's job to ensure that code he accepts and merges works (but I do not think that maintainer should test it as hard as the author of the change). I just from point of master do not see a difference between author submitting wrong code and author submitting a good code but breaking it with rebase down the line. I see a difference if the maintainer breaks the functional code because there were merge collisions or just hidden unrelated incompatibilities due to merge branch being far off.

champtar commented 2 years ago

Just a reminder, we are talking about the packages repo, each directories are mostly independent and what's important is the commit of OpenWrt "base" you used and that sadly is not present in the history. I agree merge is the way in general but I don't care for this particular repo.

dwmw2 commented 2 years ago

Except when one person changes a library and another changes a package which uses it?

champtar commented 2 years ago

Except when one person changes a library and another changes a package which uses it?

Yup, thus my use of "mostly" ;)

With Gitlab you can rebase from the UI and wait for CI to run again but sadly GitHub doesn't offer that option.

openwrt / packages

[RFC] Github: using 'Rebase & merge' as a default merge strategy #13654