python / core-workflow

Issue tracker for CPython's workflow
https://mail.python.org/mailman/listinfo/core-workflow
Apache License 2.0
94 stars 59 forks source link

Generate Misc/ACKS from author data #7

Open brettcannon opened 7 years ago

brettcannon commented 7 years ago

As pulled from PEP 512:

Traditionally the Misc/ACKS file [20] has been managed by hand. But thanks to Git supporting an author value as well as a committer value per commit, authorship of a commit can be part of the history of the code itself.

As such, manual management of Misc/ACKS will become optional. A script will be written that will collect all author and committer names and merge them into Misc/ACKS with all of the names listed prior to the move to Git. Running this script will become part of the release process.

The script should also generate a list of all people who contributed since the last execution. This will allow having a list of those who contributed to a specific release so they can be explicitly thanked.

brettcannon commented 7 years ago

One issue with this is that if there are multiple collaborators or a committer edits a PR I believe that wipes out the author attribution in git. That might mean we need to have a script which goes through committed PRs to get the author information instead of using git for the information.

dhellmann commented 7 years ago

Git does separate committer and author. If there are 2 people involved, both ids are preserved. If there are multiple, then the extra names & ids need to be captured by hand as part of the commit. In OpenStack we handle that by including a "Co-Authored-By" field in the commit message.

brettcannon commented 7 years ago

Yep, I'm thinking of purely the authorship information as I'm not aware of any situation where you can have multiple committers for a single commit (we're only doing squash commits so there's no issue with a string of commits coming from a single PR).

dhellmann commented 7 years ago

I assume it would be better to implement this tool in Python, rather than say a shell script, to allow for more portable build tools?

brettcannon commented 7 years ago

Yes, Windows users are people too. ;)

It might actually not be worth querying git's author info if it won't work for multi-committer PRs. Since I don't expect any core dev who commits a patch is going to bother to manually fill in the git details for the author, accurate information will have to stem from either names manually entered into Misc/ACKS or as queried through the GitHub API for merged PRs.

dhellmann commented 7 years ago

They don't have to manually modify the commit metadata.

Say 2 people are working on a patch together. One of them will commit the change and add the other as Co-Authored-By. The first person will then be listed as the committer and author in the git metadata and the second will be in the commit message.

When the patch is merged directly, the metadata won't change. A merge commit may be created with committer id and author id set to the person doing the merge. If no merge commit is needed, then I think the metadata isn't modified at all (I know it's possible for this case to arise but I'm not sure how common it will be in the python-dev workflow). In either case, the metadata accurately reflects the committer and author, and the message includes the co-author.

If the patch is rebased before being merged, then the committer id definitely changes to the person doing the rebase, but the author stays the same in the metadata. So in that case the original author is listed in the metadata, the co-author is in the commit message, and the person doing the work to get the patch to merge is listed as the committer in the metadata.

So the script just needs to scan the history and take the author info, the committer info, and look for co-author info in the commit messages.

dhellmann commented 7 years ago

As far as portability goes, I've had good luck recently using dulwich as a library for reading git history data.

brettcannon commented 7 years ago

Sorry, bad wording on my part. I didn't mean edit metadata post-commit, I meant getting core devs to pass the appropriate flags upon commit to set the author information appropriately. That will be a change for some folks and so I don't know what kind of pushback there will be (historically it's just been a line like "Thanks to Doug Hellmann for the patch" in the commit message). But since I suspect people want credit on their GitHub account for the authorship we will just have to tell core devs they need to make sure to do this (although maybe I'm worrying way too much about this as I assume GitHub's instructions on pulling down a PR's branch also pulls author information and the multi-author case can just edit Misc/ACKS as appropriate).

dhellmann commented 7 years ago

Yeah, IIRC a pull request on github is a special reference within the git repository. So when you pull it down, you have an actual commit already tied into a part of the DAG. Approving the PR through the web UI or API merges it into the branch against which it was submitted. I don't think that usually involves any changes to the commit itself (that's what the merge commit is for).

I take it from your comment that the approval work flow for python-dev is still going to involve some sort of local copy of the patch, though? (Sorry, I did not follow that whole conversation.) If that's the case, it should still be possible to retain the metadata by using "git merge" locally (IIUC, that's more or less what github is doing). Even rebase and cherry-pick operations preserve that info.

The only case where I can see it being a definite issue is with a patch left on the bug tracker, if that patch is not exported from a full git commit (for example, if it's created with the patch command instead).

Looking at PEP 512, it looks like the commands to use to merge a pull request still need to be documented. Is that work going on somewhere else?

brettcannon commented 7 years ago

So it's complicated. 😉 Because we don't have a good Misc/NEWS solution I suspect most PRs will be accepted locally/manually, else Misc/NEWS will just become a merge conflict for everyone. But anything that doesn't need to touch Misc/NEWS can just be merged through the web UI on GitHub. Long-term, though, is that after Misc/NEWS is solved through #6 then more merges can be done through the web UI. After that it's solving the cherrypick issue into older versions of Python (once again, something to do manually). The hope is to make it so that all PRs can be merged through the web UI and have it cherrypicked into all relevant branches.

As for git instructions, https://cpython-devguide.readthedocs.io/en/github/ is the future version of the devguide post-migration and the commands are documented at https://cpython-devguide.readthedocs.io/en/github/committing.html#working-with-git

brettcannon commented 7 years ago

If we want any inspiration for this, there's https://thanks.rust-lang.org/ and http://contributors.rubyonrails.org/ (heck, we might be able to just take their code for equivalent sites if we want and then simply make sure there's a form of export or something so we can update Misc/ACKS for a release).

brettcannon commented 7 years ago

I realized another way to manage this is to detect when a PR comes in that isn't by a core dev (you can check for team membership easily as long as the bot is a member of the organization), find the "Patch by" line in the news entry (this does require #6 and to remove newlines and such to do a proper parsing job for e.g. multiple names), and verify that the name exists in Misc/ACKS. This would also act as a check to make sure the person is appropriately thanked in the news entry on top of making sure they are in the ACKS file.

serhiy-storchaka commented 7 years ago

The names in Misc/ACKS are sorted in roughly lexicographical order taking into account that there are different rules for different languages. For example Ã… is equivalent to A in some languages or follows Z in other languages. It is hard to do automatically.

brettcannon commented 7 years ago

@serhiy-storchaka I wouldn't add names to the file, I just want to verify they are at least listed there (at least in my latest idea).

brettcannon commented 7 years ago

You can get the "first-time contributor" badge info from the GraphQL API: https://github.com/blog/2397-making-it-easier-to-grow-communities-on-github