open-infrastructure-labs / ops-issues

0 stars 0 forks source link

Change merge policy #16

Open durandom opened 3 years ago

durandom commented 3 years ago

I propose to implement a review and merge process, similar to what we have in https://github.com/operate-first/ and discussed in https://github.com/operate-first/apps/issues/81

This would encompass:

I'm not sure though if 2 approvals and bot merge can work - that could be circumvented by granting merge permissions to the @open-infrastructure-labs/operations team

This would be applicable to

anishasthana commented 3 years ago

That's how we currently do things in a number of other repos -- https://github.com/AICoE/idh-manifests/ as an example. 2 approvals and bot merge can work together without issue.

larsks commented 3 years ago

Does @sesheta require "Write" or "Maintain" access to the repository?

goern commented 3 years ago

I question the 'two approvals' policy. I could imagine two roles: a reviewer (responsible for looking thru the PR making sure its good) and an582047410458 approver (basically depending on the reviewer and taking the accountability to merge the code)

If you envision a larger group of reviewers taking the majority of work, the approver is the one coordinating feature release, as it might require cross repo dependencies, or even constraints that are not documented as PR...

durandom commented 3 years ago

I'm also good with one approval and multiple reviewers. Do the bots already support this workflow? Otherwise we could start with 2 approvals and change later. One goal is to speed up the merge process

tumido commented 3 years ago

@goern How does the bots treat a GitHub review though?

For the context, I like to review via the GitHub reviews, because then my review can be re-requested etc/cleared whenever needed and I prefer it over the /approve and /lgtm commands since I want to see who approved what and how many people actually saw it - like our ADRs.

If I "approve" something via a Github review and I'm listed as both Prow "approver" and Prow "reviewer" in that repo, does it mean that I've approved the PR and it will be auto-merged since I'm an approver? Therfore it would act and merge it even though only a single person saw it?

At some (auto-deployed) repos, I want at least 2 people to see a PR. The GitHub approval setting to 2 approves physically blocks people to merge something with a single review. I like that. I rather ping people to get a second pair of eyes on a PR than allow this to be bypassed. I like this to be a hard requirement, so we really have people engaged in the review process, so more than 1 or 2 people really knows what actually is in that particular repo (bus factor > 2).

anishasthana commented 3 years ago

++ @tumido

goern commented 3 years ago

for https://prow.operate-first.cloud/ a GitHub approving review is equivalent to a /approve comment, see https://github.com/thoth-station/thoth-application/blob/master/prow/overlays/cnv-prod/plugins.yaml#L80-L87 and https://godoc.org/k8s.io/test-infra/prow/plugins#Approve So the difference of a Prow approver and a GitHub reviewer is the power distribution: OR vs AND

Prow (more specifically to LGTM and approve plugins) try to distinguish by role: if reviewers are happy, the approver should be so confident, that he can simply click the button. so the real hard review work will finish with an /lgtm Depending solely on Prow to do all the review/approve/merge work would enable us to have all the srcops configuration in prow's yaml files...

goern commented 3 years ago

.. had a short call with @tumido here is our proposal and goal. We would like to experiment with the proposal on the /support repo tomorrow...

Goal

help the majority fo the team to do good code reviews and take responsibility for high-quality code. GitHub UI and Prow /-commands should enable the same workflows for the same kind of members.

Proposal

  1. create a Devs team on GitHub that is able to do 'triage' on support/, see https://docs.github.com/en/github/setting-up-and-managing-organizations-and-teams/repository-permission-levels-for-an-organization
  2. sort the OWNERS file, so that @tumido and @HumairAK are the approvers, the reviewers are all the members of the Devs team
  3. BTW, approvers are responsible to help reviewers doing a good job, see Goal ;)
goern commented 3 years ago

BTW, https://github.com/operate-first/apps/pull/152 and https://github.com/AICoE/idh-manifests/pull/77 and good examples of non-obvious configurations of GitHub and Prow that look fancy...

tumido commented 3 years ago

Just to add a bit more meat to it. The proposal should allow us:

This should be rewritten as an ADR once we agree on the direction.

HumairAK commented 3 years ago

This definitely seems like the most appropriate way of doing things. I guess my biggest concern is PRs getting bottlenecked by the specific set of approvers (we're still in relatively early stages where we're trying to move a bit rapidly). Separating out approvers by location, should help with this I'd imagine. But let's try it out, and see how it goes.

HumairAK commented 3 years ago

@tumido With regards to:

The OWNERS file is context aware, so we can have a different set of approvers for different paths/folders within repository. Therefore, for example in the operate-first/apps repository, each application can have a different set of approvers.

Say if I have this scenario:

If a pr changes odh/monitoring and odh/kafka then do all x,y,a, and b become approvers ?

tumido commented 3 years ago

No idea and a good question! :smile: @goern do you have any experience on how that would work?

anishasthana commented 3 years ago

What actually prompted this conversation? Our workflow seems to have been working fairly fine so far :slightly_smiling_face: .

Personally I'm not a fan of the owners + reviewers approach for operations-focused repositories (I think it makes perfect sense for development-focused projects like Kubernetes or CRI-o or w/e). I feel like this would result in knowledge being concentrated on specific folks(in this case @tumido and @HumairAK ), no? Allowing any of the "core" team members to approve PRs would result in knowledge being spread out better. Having specific operate-first members be responsible for approving specific components will result in knowledge silos. Now, specialization isn't a bad thing, but I figured with ops teams we'd want to spread the knowledge to the extent that we aren't reliant on any one person for a given component. With the internal data hub team, we did run into this problem where we had one or two overarching approvers who understood the whole picture, but most team members only focused on their own bits. This meant that PTO and the like are very disruptive.

I might not be understanding things correctly either, but this feels like it also might be over-engineering for a problem we haven't run into yet.

tumido commented 3 years ago

@anishasthana I don't think this solution/issue implies a knowledge sharing limitations/issues. I think that we need to change our view on the reviewer and approver roles.

The way I see it is that since "everybody" would be able to review, they have to know that the content actually is and does. Or at least this will allow them to obtain such knowledge (good position for newcomers also). The knowledge is being shared and silos are prevented. Approvers can act only upon somebody else's review. Therefore there must be somebody else who's willing to sign that this code is OK.

The approver role in this case to me is more about the responsibility of merging a PR. The approver should be the same person who would be asked to fix things when the merge (and a consecutive auto deploy) would break things. When a PR is merged the person who merged it, should go and really see, in real deployment, if it really did the change the PR was advertising to do.

Currently I don't see each and every person who merges something do that. Heck I don't do it myself. I merge a operate-first website PRs and I don't really monitor if the build succeeded or not and if the change was propagated. You see.. I know how that repo works, how it's set up and many other things about it. But I'm not the one who actually takes care of it.

I don't see this to be an issue about concentrating knowledge but rather concentrating responsibility, which we kinda need...

HumairAK commented 3 years ago

@tumido I think that's a great point. Especially now that we have automated the actual merges, and deployments, there must be clear indication to whom the onus falls upon to verify the changes are working as intended. We do not currently have this, and I think we've fallen into a bad habit of "Okay I've approved this, sesheta will merge it, let me go do other things now".

We must make clear that if you are approving, you own the responsibility to verifying these changes. So that each PR is always confirmed working in the live system.

anishasthana commented 3 years ago

Ah that makes a lot of sense, I didn't think of it from a responsibility standpoint at all.

4n4nd commented 3 years ago

The approver should be the same person who would be asked to fix things when the merge (and a consecutive auto deploy) would break things.

I think people who approve PRs rn do know that this is their responsibility to fix things in case the PR breaks something. It just isn't written somewhere. So in a way we have "PR owners" instead of "app owners" we just need to explicitly write that down somewhere.

@tumido as we discussed, we have a small team spread over multiple time zones and sometimes it takes PRs longer to be reviewed because of that. Now when something breaks in our prod environment and we need to get a patch merged in immediately we would still need to wait for the app owner to approve the PR with this new policy and the app owner might be asleep in a different time zone.

anishasthana commented 3 years ago

Okay so from offline discussion a lot of the confusion has been cleared up. We all seemed to agree on the following:

OWNERS: Anand, Humair, Tom Reviewers: OWNERS + Anish, Hema, Harshad, Martin

PRs need 2 Reviews followed by 1 approval

@tumido @HumairAK @4n4nd please let me know if I'm misrepresenting anything :-)

HumairAK commented 3 years ago

To add, we will keep reviewers selection based on round robin algorithm (this is by default afaik), to encourage others to review PRs and keep SILOs from forming.

PRs need 2 Reviews followed by 1 approval

The approver can also be one of the 2 reviewers in this case (i.e. approver does /lgtm and /approve)

larsks commented 3 years ago

Reviewers: OWNERS + Anish, Hema, Harshad, Martin

Sorry, I may not be following this discussion completely. Shouldn't the open-infrastructure-labs/operations team all be reviewers (rather than individuals)?

HumairAK commented 3 years ago

I think the conversation started expanding beyond open-infra and in general terms. AFAIK you can't add teams in an OWNERS file. Seems like you can have OWNERS_ALIAS.

But regardless, I think for all repos (whether here, or in operate-first) that contain cluster resources we should be including infra/ops folks.