isaacs / github

Just a place to track issues and feature requests that I have for github
2.21k stars 129 forks source link

Pull request "Squash and merge" button changes author details #1368

Open jmarshall opened 6 years ago

jmarshall commented 6 years ago

samtools/hts-specs#339 was a pull request consisting of a single commit:

commit 28c558e0bc7e112b8b7674baf9da5c5cdbd860f8 (range-desc)
Author:     John Marshall <John.W.Marshall@glasgow.ac.uk>
AuthorDate: Fri Sep 14 22:26:33 2018 +0100
Commit:     John Marshall <John.W.Marshall@glasgow.ac.uk>
CommitDate: Fri Sep 14 22:26:33 2018 +0100

It was merged using the GitHub UI "Squash and merge" button (so that it would be rebased onto the current tip of origin/master) leading to the following commit on the main repository's master:

commit 98de888fb25b856fac74ae27023f1f5300752240 (origin/master, origin/HEAD)
Author:     John Marshall <jmarshall@users.sourceforge.net>
AuthorDate: Wed Sep 19 09:41:21 2018 +0100
Commit:     Andrew Yates <andrewyatz@users.noreply.github.com>
CommitDate: Wed Sep 19 09:41:21 2018 +0100

As expected, the Committer and CommitDate have been altered to reflect the person who pressed the merge button.

The AuthorDate has been altered by GitHub to be the same as CommitDate, but this is incorrect — the commit was still originally authored on Friday 14th.

The AuthorEmail has been altered by GitHub to be the default email address associated with the @jmarshall GitHub account at the time. This is very incorrect, as I do not use this email address when doing work on samtools/hts-specs.

When the "Squash and merge" button is being used to squash several commits from different authors, there is an argument to be had that Author/AuthorDate could be updated in some way (or GitHub could just follow what git does in this case).

When the "Squash and merge" button is being used to squash several commits from the same author (so AuthorName and AuthorEmail are the same, which is trivially true when there is only one commit), it is a bug to change the author's listed AuthorName or AuthorEmail.

jmarshall commented 6 years ago

Reply from support@github.com:

Thanks for writing in about this! I'll pass your feedback along to the team. In the meantime, if you have the person who authored the PR itself squash and merge the PR, we retain more information about the author, so that might be a workaround for you in certain circumstances.

The workaround suggested is unworkable because the PR's author is not the maintainer of the code in question, so does not have the authority to review or merge the PR.

The real workaround is for the maintainer to do the merge using the git command line instead of GitHub's UI, but then we will likely run into #2.

jmarshall commented 6 years ago

samtools/hts-specs#335 is a PR that was merged via the GitHub UI (likely also with the "Squash and merge" button), resulting in the commit samtools/hts-specs@eb9d6127c89b7fe8d08c13c4e877ba3ad33af4fe on master:

commit eb9d6127c89b7fe8d08c13c4e877ba3ad33af4fe
Author:     John Marshall <John.W.Marshall@glasgow.ac.uk>
AuthorDate: Tue Sep 4 11:42:42 2018 +0100
Commit:     Mike Lin <mlin@mlin.net>
CommitDate: Sun Sep 16 14:23:18 2018 -0700

Here the original AuthorDate and AuthorEmail have been retained, as expected. So the arguably incorrect date behaviour and the clearly incorrect email behaviour are a recent change, deployed on GitHub.com somewhere between September 16th and 19th.

aspiers commented 6 years ago

This seems to be a rough duplicate of #1303, although it does helpfully provide a lot more details.

jmarshall commented 6 years ago

@aspiers: #1303 is a complaint (filed in July) that a particular project maintainer didn't appropriately preserve Author information when rebasing the OP's commits, and a request for a GitHub enhancement to preserve the original Authors informations outwith git when squashing commits from several authors in the UI.

This is a bug report about a behaviour change in September. The two pertain to related areas of functionality, but are not duplicates.

jmarshall commented 6 years ago

Upon further testing, I see that AuthorName is also affected.

Whatever you carefully set up your Name and Email address to be, e.g., via bespoke user.name and user.email config entries appropriate for the particular repository, if your PR is merged via the GitHub UI “Squash and merge” button then in the resulting commit the AuthorName and AuthorEmail will be rewritten behind your back to your GitHub account's profile name and primary email address.

Stanzilla commented 5 years ago

Just noticed this as well, see this commit https://github.com/WeakAuras/WeakAuras2/commit/fa75abcc7b618b93d416aaa3f0a6315a2ab486d8.patch that was a Squash & Merge and it used InfusOnWoW as username, while this https://github.com/WeakAuras/WeakAuras2/commit/52de0a0de49afd9f665638227d2a003dd7267750.patch was a Rebase & Merge and it uses Infus as name.

jmarshall commented 5 years ago

Problem still exists: see PR jmarshall/test@02f15df7cb384ff726644fbc33c78f5c69225b4b which became jmarshall/test@2105ce75877dae22284af5e521ed4e55196411c5 on Squash'n'Merging. (Add .patch to those links to see a rawer view of the commits.)

jmarshall commented 5 years ago

Previously tweeted about here and here.

Followup email sent to support@github.com today, attempting to highlight the personal information disclosure angle. You would think the discarding users' carefully-constructed author information angle would be bad enough to gain attention in itself… — Git has one job — to preserve historical information accurately…

Could you please comment on the status of this bug report. It is a serious issue as it leads to misattribution of commits when a person uses their GitHub account for both e.g. work-related and personal activities.

Suppose I have my GitHub account set up with a profile name of "John Marshall" and a primary email address of my work email address, say john_marshall@workemail.example.

Suppose I contribute to some nethack-style game in the evenings, and set up my clone's repository config such that PRs are submitted with commits like

Author: John the Barbarian <john@gameplayers.example>

If the game maintainers merge this PR using the "squash & merge" button, the resulting commit on master will have its author information rewritten by GitHub to

Author: John Marshall <john_marshall@workemail.example>

This is wildly inappropriate and constitutes a doxxing of the user's work persona. (If several email addresses associated with the same GitHub account are on commits in different projects, to be sure it's not too hard to discover the other associated email addresses. However this bug reduces the effort required to zero and writes the work persona into the game repository's permanent commit history.)

Please comment as to whether GitHub accepts the seriousness of this issue, and what steps are being taken to fix it.

jmarshall commented 5 years ago

Reply received from GitHub Support on January 2nd:

Hey John,

My sincerest apologies for the delay in our response!

It looks like our Engineering Team are currently working on investigating the contribution attribution logic for Squash Merges. There isn't any concrete decisions taken place as yet but it does seem to be moving forward.

We appreciate your patience on this one and as soon as we have an update for you we'll be in touch.

Problem still exists unchanged on retesting today; my reply to Support, sent today:

Dear [Support],

Thank you for your reply. Could GitHub please comment on whether it considers this to be a serious and significant issue?

To my mind GitHub has been misidentifying these commits since September, and I would expect Engineering to want to revert September's logic change immediately (so as to stop adding more incorrectly-attributed commits to project histories) while they continue to work on further logic changes.

peterjc commented 5 years ago

To clarify, my down-thumbs reaction 👎 on John's update today was about the lack of any movement from GitHub.

I often do squash-and-merge commits of pull requests, but thus far haven't had anyone complain - people setting their commit authorship differently to their Github profile must be a minority? However, they would be a tech savvy minority ...

jmarshall commented 5 years ago

Reply received from Support today:

The Issue has been marked as Priority 1 and is currently being investigated by our Engineering Team.

Thanks (assuming 1 = high!). Insert here standard concern about “no response” and “bug report disappeared into the void” being indistinguishable…

GitHub has supported attaching multiple email addresses to each GH account for many years. This author rewriting goes directly against this, so once again it seems astonishing that the September behaviour was ever deployed…

negebauer commented 5 years ago

When editing a file from Github's UI you can actually select which email to use

image

But when merging a PR you can't, it should be an option too!

(Kinda related to this issue)

jmarshall commented 5 years ago

@negebauer: This issue is a history corruption bug that was introduced last September. The UI limitation you mention is issue #95 (see https://github.com/isaacs/github/issues/95#issuecomment-398203089 and following).

jmarshall commented 5 years ago

Another three months have passed with this at “Priority 1”, and the problem still exists unchanged on retesting today.

zephraph commented 5 years ago

I was just poking around and stumbled across this issue. At Artsy we're discussing moving to squash and merge by default. I definitely don't want to overwrite my coworker's contributions.

I haven't really noticed this issue in the past so I started testing it out.

TL;DR I believe this issue is solved.

Before I reached that conclusion, I started looking around for ways to fix this if it was an issue. I found the git trailer on-behalf-of on the github blog that looked promising. In their example it gives you the ability to commit on behalf of an organization, but I thought it might work for regular users too. I squash merged from the UI this pr (with the trailer in the commit message). It did say that I merged the branch...

image

When I looked at the commits in master I was greeted with this...

image

So GitHub is recognizing (at least from the UI) that my coworker authored it but I merged it. Great! I checked git to verify...

image

It's attributed to my coworker @mdole which is the desired behavior. Looks good! I wanted to verify that the trailer was what caused it to work. It could be that the issue is just fixed generally...

I found another PR that I could squash. Did it without the trailer this time.

image

Pretty much the same as above. I checked the commit log in master...

image

Looks like it worked! Given that, I think this issue is fixed and the on-behalf-of trailer isn't needed.

The git log of that last commit just to verify.

image

Caveat

There is a slight possibility that this functionality is in beta and not spread to all users. I have no indication to say that's the case, but the possibility exists. I'd love for someone else to try and let me know if it works.

jmarshall commented 5 years ago

@zephraph: In fact, your PRs demonstrate that this issue is still present.

The email address to which GitHub rewrites the Author is associated with the same GItHub user, so this bug does not affect the GitHub user to whom the commits are attributed in the UI. This is not at issue.

Matt Dole <mdole7@gmail.com> is the email address both in the first PR's commits and as it was merged to master, so the first PR does not obviously demonstrate this either way. More subtly, the AuthorDate has been updated in the merged commit (artsy/palette@e8109fd4e37d414f85f1a601337ed78d53b4569d), which does demonstrate this bug report.

Your second PR more clearly demonstrates this bug. The commit on the PR branch is:

commit e434d16671495ab01149230c1576e0edcd4ca0ff (pr2302)
Author:     Renovate Bot <bot@renovateapp.com>
AuthorDate: Wed Apr 17 02:37:43 2019 +0000
Commit:     Renovate Bot <bot@renovateapp.com>
CommitDate: Wed Apr 17 02:37:43 2019 +0000

but as merged to master it becomes

commit d0abee5ae248747f9b02ccebde124188842070a5 (HEAD -> master, origin/master, origin/HEAD)
Author:     renovate[bot] <renovate[bot]@users.noreply.github.com>
AuthorDate: Tue Apr 16 22:49:24 2019 -0400
Commit:     Justin Bennett <zephraph@gmail.com>
CommitDate: Tue Apr 16 22:49:24 2019 -0400

As you see, both the name and the email address of Renovate Bot <bot@renovateapp.com> have been rewritten by GitHub when you pressed the Squash'n'Merge button.

clarkbw commented 5 years ago

We have some updates coming, it might be useful to separate the date issue from the author in the future. This is a good reference for the problems but I’d like to track both separately. More to come soon.

zephraph commented 5 years ago

Ahhh, I see. Thanks for the clarification!

Sent with GitHawk

clarkbw commented 5 years ago

We have some updates coming

To clarify this a bit, we have updates coming related to the authoring part but nothing yet for what is happening with the dates.

zephraph commented 5 years ago

So was the on-behalf-of moderately useful?

Sent with GitHawk

WilliamMartinsson commented 5 years ago

This issue seems to have been resolved. When I was about to merge a PR I got presented with the following dropdown:

squash_and_merge

The git log looks good as well :)

aspiers commented 5 years ago

@jmarshall What's your take on the latest change? Is it fair to say that it resolves some aspects of this issue but not all? For example at a glance it doesn't seem to address date changes, but I haven't tried it myself. As @clarkbw suggested, maybe it makes sense to split them into separate issues.

jmarshall commented 5 years ago

@WilliamMartinsson: Nope. As previously noted, the choose-your-email dropdown added to the PR merge dialog is (the latter part of) #95. This issue is about what happens to Author and AuthorDate in the resulting commit merged to the main branch. If you did it on the command line, they would be unchanged (while Committer and CommitDate would be updated to reflect the person who altered the commits by rebasing) and this is what the GitHub UI appears to have done up until last September (see https://github.com/isaacs/github/issues/1368#issuecomment-427324115). So you would have to show us git log --format=fuller for the PR commit(s) and the merged commit to judge whether the behaviour is as per this issue.

So just now I have tested this once again, this time (for the first time) testing it as if I have composed a pull request containing another GitHub user's work:

PR jmarshall/test#22 contains one commit:

commit e7888595c3173508bc808857ecc0c236eb491952
Author:     On behalf of A. Colleague <colleague@example.org>
AuthorDate: Thu May 9 21:24:21 2019 +0100
Commit:     J. Marshall, esq <jm-nonprimary-email-one@example.org>
CommitDate: Thu May 9 21:24:21 2019 +0100

(The real colleague@example.org email address is the primary email address of another extant GitHub user.) This models a scenario in which I have rebased or otherwise amended my colleague's work and then (myself) created a PR out of the work. I then Squash'n'Merged the PR, selecting a different non-primary email address from the dropdown, resulting in the following commit on master:

commit 237523a06a17b0dac4f2a2a2f9192889e608188e
Author:     John Marshall <jm-nonprimary-email-two@example.org>
AuthorDate: Thu May 9 21:26:22 2019 +0100
Commit:     GitHub <noreply@github.com>
CommitDate: Thu May 9 21:26:22 2019 +0100

As you see, AuthorDate has been updated to be the same as CommitDate, which is different from what the Git command line would do and IMHO wrong.

As you see, the <…email-two@…> email address that I selected from the dropdown has been used and my name has been updated from J. Marshall, esq to my GitHub account's profile name. If I do committer actions from the GitHub UI, I would expect GitHub to use the form of my name that it knows to record the committer identity — so that #95 stuff is working appropriately IMHO.

As you also see, my committer identity has been shoved into Author, and my colleague's authorship has been removed entirely. This is absolutely astonishing and is an even worse behaviour than that in my original report back in September.

Unfortunately I haven't tested this with another GitHub user's email address previously, so I don't know whether this behaviour is new. @zephraph's second PR in https://github.com/isaacs/github/issues/1368#issuecomment-483928615 suggests that this behaviour is indeed new since April 17th, as in that case renovate-bot's authorship was retained in the commit as merged. (But it is hard to be sure, as in that case renovate-bot opened the PR that contained renovate-bot's commit, but in my test just now it was I who opened the PR containing my colleague's commit. It is conceivable that GitHub might be overriding Author with PR-opener rather than PR-merger. Either way it is IMHO the wrong thing to do, of course.)

In summary, there does appear to have been a recent change in behaviour and it appears to have made the authorship aspects of this issue much worse. It would be interesting to hear what GitHub's goals in altering Author at all are.

What I would expect to happen in this case is the same thing as happens with git rebase -i on the command line: Author and AuthorDate are untouched, and Commit/CommitDate reflect the changes to the commit in rebasing and squashing it/them:

commit 191d10423d12d33eadc36189d350183076b12cf1 
Author:     On behalf of A. Colleague <colleague@example.org>
AuthorDate: Thu May 9 21:24:21 2019 +0100
Commit:     John Marshall <jm-nonprimary-email-two@example.org>
CommitDate: Thu May 9 22:02:44 2019 +0100
benbalter commented 5 years ago

@jmarshall I really appreciate the in-depth breakdown here. Thanks for taking the time to share that. I've been researching the history of the behavior internally to try to determine the expected behavior (and why it is that way). I'm still waiting to hear back from a few colleagues, but I did want to share my understanding so far:

WilliamMartinsson commented 5 years ago

@jmarshall: Thank you for your extensive explaining, it was a great read!

I have to admit that I probably was so exited seeing progress on the issue described in #95 (which you kindly pointed out) and naively thought this feature would apply to this issue as well.

ericzolf commented 4 years ago

Any news on this issue? It's quite an issue that PR authors aren't kept in the final merge/squash commit message. As maintainer, and hence reviewer and merger, I do NOT want to get credits for work done by others.

harshitsaamu commented 4 years ago

the same issue is being discussed here #1303

mavenraven commented 4 years ago

Just got bit by this, the behavior was very unexpected!

keldin-coding commented 4 years ago

Indeed! Just did my first maintainer merge and accidentally exposed a user's email who did not want that email in the git log.

tvalentyn commented 3 years ago

When the PR author is not the actor, we want to capture both the author of the PR and the user who merged the pull request, so we set the author as the PR author and the committer as the actor for the merge commit.

For squash-and-merged PRs git log shows: Commit: GitHub <noreply@github.com> even if the PR author is not the actor who initiated the merge. It appears that GitHub stores who has merged the PR only in the GitHub PR metadata, but does not store it within git metadata. @benbalter could you please clarify the reason for this behavior? Thank you.

shehan-jay commented 3 years ago

Is the number of lines added as co-author added here?

Screenshot 2021-06-03 at 17 28 58
byjokese commented 3 years ago

I think gitlab here makes a great example. It squashes all the commits and then makes a merge of that unique commit. Allowing to view who was the contributor of the branch and the author who did the Pull Request. imagen