palantir / policy-bot

A GitHub App that enforces approval policies on pull requests
Apache License 2.0

Allow branch updates/merges from target without invalidation #12

Closed jmcampanini closed 5 years ago

jmcampanini commented 6 years ago

This makes it possible to keep branches up-to-date without (1) requiring reapproval and (2) counting the person who clicked merge as a contributor.

Thoughts on the implementation from the internal issue:

I'm not sure how to detect this... maybe by allowing merge commits that don't themselves add any new changes or files to the branch?

In terms of implementation, there are two parts to consider: (1) how to identify a safe/clean commit and (2) how to ignore those.

The second is easiest: take a look at the InvalidateOnPush handling in IsApproved. You can walk back the lastCommitOrder until the first unsafe commit. You'll also need to adjust the logic for finding contributors to ignore the authors of safe commits; that's here.
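A rough sketch of the "ignore safe commits" half, under some loud assumptions: the `Commit` type, the `Safe` flag, and both helper functions below are hypothetical and are not policy-bot's actual types or API. Commits are assumed to be ordered oldest to newest. The idea is only to show the walk-back and the contributor filtering described above.

```go
package main

import "fmt"

// Commit is a hypothetical, simplified commit record; policy-bot's real
// types differ. Safe holds the result of whatever "safe commit" check is
// eventually chosen.
type Commit struct {
	SHA    string
	Author string
	Safe   bool
}

// lastUnsafeCommit walks the commits from newest to oldest and returns the
// first one that is not safe. ok is false when every commit is safe.
func lastUnsafeCommit(commits []Commit) (c Commit, ok bool) {
	for i := len(commits) - 1; i >= 0; i-- {
		if !commits[i].Safe {
			return commits[i], true
		}
	}
	return Commit{}, false
}

// contributors returns the distinct authors of unsafe commits in
// first-seen order, so that authors of safe commits are not counted.
func contributors(commits []Commit) []string {
	seen := make(map[string]bool)
	var out []string
	for _, c := range commits {
		if c.Safe || seen[c.Author] {
			continue
		}
		seen[c.Author] = true
		out = append(out, c.Author)
	}
	return out
}

func main() {
	commits := []Commit{
		{SHA: "a1", Author: "alice"},
		{SHA: "b2", Author: "bob"},
		{SHA: "c3", Author: "carol", Safe: true}, // e.g. an update merge
	}
	if c, ok := lastUnsafeCommit(commits); ok {
		fmt.Println("approvals must be newer than", c.SHA)
	}
	fmt.Println("contributors:", contributors(commits))
}
```

With this shape, an approval would only be invalidated by pushes up to and including the commit returned by `lastUnsafeCommit`, and trailing safe commits would not change the outcome.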

For identifying commits, a good starting point is finding merge commits. This should be easy as they'll have more than one parent. Unfortunately, merge commits can also modify code, as when conflicts are resolved. I think you could detect a clean merge by looking up the tree associated with the commit and seeing if it is empty, but I'm not quite sure how this is represented in Git.
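On the representation question: a Git commit does not store a diff. It points at a full tree snapshot, so a commit that changes nothing relative to one of its parents has the same tree SHA as that parent rather than an "empty" tree. A minimal sketch of both checks follows; the `Commit` type and function names are hypothetical, and a real implementation would have to fetch this data from the GitHub API. Note that a merge that brings in new commits from the target will usually have a tree that differs from both parents, so tree equality alone is unlikely to be enough.

```go
package main

import "fmt"

// Commit is a hypothetical, simplified commit record holding only the
// fields needed for these checks.
type Commit struct {
	SHA     string
	TreeSHA string
	Parents []*Commit
}

// isMerge reports whether c has more than one parent.
func isMerge(c *Commit) bool {
	return len(c.Parents) > 1
}

// matchesParentTree reports whether c's tree is identical to the tree of
// at least one parent, i.e. c adds nothing relative to that parent.
func matchesParentTree(c *Commit) bool {
	for _, p := range c.Parents {
		if p != nil && p.TreeSHA == c.TreeSHA {
			return true
		}
	}
	return false
}

func main() {
	feature := &Commit{SHA: "f1", TreeSHA: "t-feature"}
	target := &Commit{SHA: "m1", TreeSHA: "t-target"}
	merge := &Commit{
		SHA:     "x1",
		TreeSHA: "t-merged", // combined content: differs from both parents
		Parents: []*Commit{feature, target},
	}
	fmt.Println("merge commit:", isMerge(merge))
	fmt.Println("same tree as a parent:", matchesParentTree(merge))
}
```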

We'll also have to decide if we only want to allow commits created by GitHub (or cleverly faked by users) or if local merges can count as well.

The last part is integrating that into the GitHub implementation of pull.Context, which wraps all the API logic and handles caching for the duration of a request. Try to minimize API calls, but correctness is more important than efficiency.

asanderson15 commented 5 years ago

+1 for having this - would be very valuable for Rubix.

bluekeyes commented 5 years ago

I've started researching how to implement this and I think it will be hard to add a perfectly secure option, i.e. one that only allows merge commits with no user-modified code. This is because merge commits usually have unique trees, and those trees contain new blobs for files that were modified in both parents. I haven't found a way to distinguish automatically resolved files from manually resolved files, and I suspect it isn't possible without requiring additional metadata. It's unclear how GitHub Reviews decide if a merge commit counts as an update, but they probably have access to data not exposed in the API.

There's also a second issue: performing tree comparisons potentially requires a lot of GitHub API calls (up to one for each directory in the repository, for each parent commit). Ideally we could support this without comparing trees.

Proposal

Add an ignore_update_merges option to the options structure for approval rules. If this option is true, policy-bot will ignore, for the purpose of approval, any commit X for which all of the following are true:

  1. X has two parents
  2. The committedViaWeb property of X is true
  3. One parent of X is in the last N commits of the target branch

This should ignore any merges committed by clicking the "Update Branch" button in the UI. It won't be able to tell if the merge resolved a conflict, so it would be possible to add code without approval by resolving a merge conflict in the UI editor.
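The three conditions in the proposal can be sketched as a single predicate. This is an illustration only: the `Commit` type, `isUpdateMerge`, and the `recentTarget` set are hypothetical names, and the set is assumed to have been filled in advance with the SHAs of the last N commits on the target branch.

```go
package main

import "fmt"

// Commit is a hypothetical, simplified view of a pull request commit.
type Commit struct {
	SHA             string
	Parents         []string // parent commit SHAs
	CommittedViaWeb bool
}

// isUpdateMerge reports whether c meets all three proposed conditions:
// exactly two parents, committed via the web, and at least one parent
// among the recent commits of the target branch.
func isUpdateMerge(c Commit, recentTarget map[string]bool) bool {
	if len(c.Parents) != 2 || !c.CommittedViaWeb {
		return false
	}
	return recentTarget[c.Parents[0]] || recentTarget[c.Parents[1]]
}

func main() {
	// Assumed to hold the last N commit SHAs of the target branch.
	recentTarget := map[string]bool{"m9": true, "m8": true}

	update := Commit{SHA: "u1", Parents: []string{"f3", "m9"}, CommittedViaWeb: true}
	local := Commit{SHA: "l1", Parents: []string{"f3", "m9"}, CommittedViaWeb: false}

	fmt.Println("update button merge ignored:", isUpdateMerge(update, recentTarget))
	fmt.Println("local merge ignored:", isUpdateMerge(local, recentTarget))
}
```

As noted above, this predicate cannot see whether a conflict was resolved in the web editor, so a commit that passes it may still carry unreviewed changes.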

It will not allow local merges that update the branch, but this seems acceptable. We may also have to make sure it can ignore merge commits created by Bulldozer using the update feature.

N would be set to something like 100 (the largest allowed page size), which means commits that merge in old versions of the target will not be ignored. I think this is fine, since I believe people mostly want this for PRs that have been approved but need one more update before they can merge due to the required up-to-date check in GitHub.

bluekeyes commented 5 years ago

Looks like Bulldozer merges count as "web" commits (I guess because it uses the API), so we should be set there:

{
  "commit": {
    "author": {
      "name": "bulldozer[bot]",
      "email": "bulldozer[bot]@users.noreply.github.domain",
      "user": {
        "login": "bulldozer[bot]"
      }
    },
    "committer": {
      "name": "GitHub Enterprise",
      "email": "noreply@github.domain",
      "user": null
    },
    "committedViaWeb": true
  }
}
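For reference, a small sketch of reading that payload in Go with the standard `encoding/json` package. The struct names and `parseCommit` are hypothetical and mirror only the fields shown above; this is not how policy-bot itself queries the API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// user, actor and payload are hypothetical types that mirror only the
// fields present in the sample response above.
type user struct {
	Login string `json:"login"`
}

type actor struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	User  *user  `json:"user"` // nil when the JSON value is null
}

type payload struct {
	Commit struct {
		Author          actor `json:"author"`
		Committer       actor `json:"committer"`
		CommittedViaWeb bool  `json:"committedViaWeb"`
	} `json:"commit"`
}

const sample = `{
  "commit": {
    "author": {
      "name": "bulldozer[bot]",
      "email": "bulldozer[bot]@users.noreply.github.domain",
      "user": {"login": "bulldozer[bot]"}
    },
    "committer": {
      "name": "GitHub Enterprise",
      "email": "noreply@github.domain",
      "user": null
    },
    "committedViaWeb": true
  }
}`

// parseCommit decodes a response shaped like the sample above.
func parseCommit(data []byte) (payload, error) {
	var p payload
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	p, err := parseCommit([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println("author login:", p.Commit.Author.User.Login)
	fmt.Println("committed via web:", p.Commit.CommittedViaWeb)
	fmt.Println("committer has user:", p.Commit.Committer.User != nil)
}
```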