open-telemetry / community

OpenTelemetry community content
https://opentelemetry.io
Apache License 2.0

[Proposal] Use an infrastructure-as-code solution to manage the open-telemetry github #1596

Open jaronoff97 opened 1 year ago

jaronoff97 commented 1 year ago

Hello all, after some of the past week's challenges around repository maintenance, some operator contributors had the idea that we could use GitHub's official Terraform provider to provision and manage the various SIG GitHub groups, repositories, branch protection rules, etc. that are currently managed entirely by hand through the community repo. The TC could then simply approve and merge PRs that change the governance of the open-telemetry GitHub organization, rather than making a slew of manual changes.

Benefits

Risks

Design

Please let me know if there are any steps missing from this list.

Alternatives Considered

Overall, I think using Terraform to manage the OpenTelemetry GitHub state would allow for much faster and more reliable management for the TC, and I'm excited to hear the rest of the community's thoughts.

yurishkuro commented 1 year ago

Big +1 for going to org-as-code, but I would like more concrete details in this proposal to understand the effort required to achieve that. Can you outline the solution in more detail? "Using terraform" doesn't tell me much.

jaronoff97 commented 1 year ago

Sure! I was holding off on describing how this would all work in case there was strong opposition. I can write up some details soon and update my issue with them.

trask commented 1 year ago

👍👍

can you look at / summarize other implementation options besides https://registry.terraform.io/providers/integrations/github/latest/docs, e.g. https://github.com/apps/settings, or if there are others worth considering?

jaronoff97 commented 1 year ago

@trask @yurishkuro I updated the description with your asks. Please let me know if you have other solutions in mind, or if there are other designs that may be more effective.

svrnm commented 1 year ago

I have some experience with the settings app that @trask suggested. It does a good job overall and it's very convenient for maintainers, as they can update repo settings via a pull request (and if you combine it with CODEOWNERS you can require TC approval, etc.).

Another project that might be interesting to look into is Peribolos:

Peribolos allows the org settings, teams and memberships to be declared in a yaml file. GitHub is then updated to match the declared configuration.

There is also Peribolos as a Service:

If you ever wanted to manage your GitHub organization as code where everybody can simply open a PR and ask to create a team or make a repository, wait no more!

Credit for pointing me towards peribolos & peribolos as a service goes to my amazing colleague @lelia :-)

Edit: via https://docs.prow.k8s.io/docs/components/cli-tools/peribolos/:

Peribolos allows the org settings, teams and memberships to be declared in a yaml file. GitHub is then updated to match the declared configuration.

See the kubernetes/org repo, in particular the merge and update.sh parts of that repo for this tool in action.

Peribolos was the subject of a KubeCon talk: How Kubernetes Uses GitOps to Manage GitHub Communities at Scale
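
To make the "declared configuration" idea concrete, here is a minimal sketch of the reconciliation step such a tool performs: compare the declared team membership with what GitHub currently reports and list the changes needed. It is not Peribolos's actual implementation, and the names `Change` and `plan_changes`, the team names, and the logins are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    action: str  # "add" or "remove"
    team: str
    member: str


def plan_changes(declared, actual):
    """Return the membership changes needed to make `actual` match `declared`.

    Both arguments map a team name to an iterable of member logins.
    """
    changes = []
    for team in sorted(set(declared) | set(actual)):
        want = set(declared.get(team, ()))
        have = set(actual.get(team, ()))
        changes += [Change("add", team, m) for m in sorted(want - have)]
        changes += [Change("remove", team, m) for m in sorted(have - want)]
    return changes


# Hypothetical data: what the checked-in file declares vs. what GitHub reports.
declared = {"example-approvers": ["alice", "bob"], "example-maintainers": ["carol"]}
actual = {"example-approvers": ["alice", "dave"]}

for change in plan_changes(declared, actual):
    print(f"{change.action:6} {change.member} -> {change.team}")
```

A real tool would then apply each change through the GitHub API. Printing the plan first is what makes the change reviewable in a PR.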

justaugustus commented 1 year ago

I've written some ideas up in the past on org management with tools like peribolos: https://github.com/todogroup/governance/issues/106#issuecomment-1000962064

lmilbaum commented 1 year ago

Terraform works with a backend to store its state files. You might want to consider how to set it up such that it is accessible by whoever needs to work with the Terraform plan.

jaronoff97 commented 1 year ago

@trask what are the next steps to get started on this?

trask commented 1 year ago

since github administration is owned by the @open-telemetry/technical-committee, we'll need their guidance on how they would like to move forward with this

Aneurysm9 commented 1 year ago

Do we need to reconsider this approach in light of https://github.com/cncf/foundation/issues/617?

yurishkuro commented 1 year ago

@Aneurysm9 I don't think so, we can still use Hashicorp tools internally if they are not part of the artifacts we release.

jmacd commented 1 year ago

@jaronoff97 This topic came up (again) in today's technical committee meeting. We want to enable progress and unblock this effort, so that we can begin treating GitHub access permissions as code inside the organization.

EjiroLaurelD commented 1 year ago

Hello, my name is Laurel, an Outreachy applicant. I went through the comments on this issue and found the proposal very intriguing. I have experience building with Terraform and would love to contribute to this project in any way I can. What are the next steps for org-as-code, and how can I be a part of it? Thank you for your time.

austinlparker commented 12 months ago

We should look at OpenTofu (https://opentofu.org/) for this in lieu of terraform. I think it's a good idea though, and the plan seems pretty straightforward.

austinlparker commented 12 months ago

In terms of CI/deployment runs, Spacelift offers a free plan that would probably work...

svrnm commented 12 months ago

I talked about this issue with @jaronoff97 a while ago, because I was looking into different alternatives to TF + github provider, i.e. there are

Compared to the Terraform Provider GitHub they all provide less functionality, but have some individual advantages, e.g. CLOWarden is cncf-owned (but still experimental) and Settings GitHub App "just" works by enabling it on a repository.

I wanted to call out those alternatives for completeness, but if the Terraform Provider for GitHub satisfies our needs, there is no strong objection from my side.

jaronoff97 commented 12 months ago

I'm happy with any of the above solutions, @svrnm should we attend the next TC meeting and walk through the options?

svrnm commented 12 months ago

> I'm happy with any of the above solutions, @svrnm should we attend the next TC meeting and walk through the options?

I shared those alternatives to have them captured, but to me it looks like there is broad support for going with the TF + GH provider solution as you have outlined it initially. Based on @jmacd's comment ( https://github.com/open-telemetry/community/issues/1596#issuecomment-1757988125 ) I think everyone is happy if we proceed with what you proposed initially.

austinlparker commented 12 months ago

I've taken the liberty of putting together a spike on this so we can see what it'd look like.

svrnm commented 12 months ago

> I've taken the liberty of putting together a spike on this so we can see what it'd look like.

Nice, will take a look

> In terms of CI/deployment runs, Spacelift offers a free plan that would probably work...

There is also https://www.cncf.io/project-tools/. The cloud credits in particular might be helpful here: "That’s why CNCF has created the Cloud Credits program, focussed on the mutual success of projects and participating companies. To date, supporters like Google, AWS, Equinix, and GitHub have donated cloud credits".

bogdandrutu commented 12 months ago

Are we sure we want to use terraform? Are we ok with the new license?

alolita commented 12 months ago

OpenTofu not TF. +100 for IaC for OTEL GH management.

alolita commented 12 months ago

I support this initiative. Thanks for raising this @jaronoff97

austinlparker commented 12 months ago

The GC took a vote on this proposal and is unanimously in favor of continuing work on it. Let's keep working on the PR!

trask commented 6 months ago

just documenting another option for completeness: https://github.com/github/safe-settings

austinlparker commented 4 months ago

I wanted to flag https://github.com/cncf/clowarden (this has been mentioned elsewhere, but probably good to keep it in this issue) as an alternative we should strongly consider, especially since we now have cloud credits for running our own infrastructure.

austinlparker commented 4 months ago

Honestly, the only thing CLOWarden doesn't do out of the box is handle 1Password vaults (but there's no reason we couldn't add that, and I'm not really sure how easy it'd be to handle it thru OpenTofu anyway since we don't have an SSO provider; we'll need to do manual reconciliations, but I think that'd be straightforward enough to do thru a CLOWarden feature? It's not blocking for now either way.)

austinlparker commented 3 months ago

Wanted to summarize a discussion from the maintainers' call on 7/15.

It was decided that SIGs should use this issue to discuss the scope of IaC management.

My position regarding centralizing repo/user/team membership in CLOWarden -

While I respect that it is, potentially, less easy to make a one-line PR to community than it is to click a button in the GitHub UI, I tend to believe the tradeoffs are worth it.

jaronoff97 commented 3 months ago

IMO I would really appreciate IaC management. It would really help us understand our current approvers/maintainers, their permissions, and their repos. Furthermore, it would make changes much more self-serve and auditable, avoiding the need to bug GC/TC members. I would also eventually appreciate the ability to provision and own different pieces of infrastructure.

> While I respect that it is, potentially, less easy to make a one-line PR to community than it is to click a button in the GitHub UI, I tend to believe the tradeoffs are worth it.

I think the barrier being a PR isn't the end of the world given that they will have needed to make a PR prior to being a member or changing roles. We could also write some makefile automation to fill out issues for new users as well (automatically pulls the PRs they've made against otel repos).
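
As a rough sketch of what that automation could look like (in Python, which a make target could invoke), the snippet below builds a GitHub search API URL for a user's PRs across the org and formats a draft issue body from the results. The function names and the issue layout are hypothetical, and fetching the URL and reading the `items` list from the response is left out.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/issues"


def pr_search_url(login, org="open-telemetry"):
    """Build a GitHub search API URL for PRs authored by `login` in `org`."""
    query = f"type:pr author:{login} org:{org}"
    return f"{SEARCH_URL}?{urlencode({'q': query, 'per_page': 20})}"


def membership_issue_body(login, prs):
    """Format a draft issue body (hypothetical layout) from search result items.

    Each item is a dict with at least 'title' and 'html_url', as returned
    in the 'items' list of the search response.
    """
    lines = [f"Membership request for @{login}", "", "Pull requests:"]
    lines += [f"- {pr['title']} ({pr['html_url']})" for pr in prs]
    return "\n".join(lines)


# Example with a made-up login and a made-up result item.
print(pr_search_url("octocat"))
print(membership_issue_body(
    "octocat",
    [{"title": "Fix typo", "html_url": "https://example.com/pr/1"}],
))
```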

svrnm commented 3 months ago

> There were some strong opinions against team/repo management taking place through IaC

What kind of strong opinions against it? Can they be shared here so that we can address them? +1 to what you both (@austinlparker + @jaronoff97) said: there are many good reasons for doing it through IaC, which is an improvement not only from an audit and security perspective but also from a community perspective (people can see and recognize more easily who is filling which role, etc.). Indeed, it should take more than "click a button", especially for maintainers. Since we already have a voting process that requires a PR, all we would do is move that PR somewhere else.

> Maintainers did not feel adequately consulted/aware of this issue

Here too I would like to understand how we could have made people better aware. This is maybe more a topic for SIG Contributor Experience and might require its own issue. To be honest, I am not 100% sure what the right way is to "consult or make maintainers aware": is it a community issue, the OTel Slack, the SIG maintainers meeting? It is maybe something we also need to be more conscious about now that our community has reached the size it has today.

austinlparker commented 3 months ago

@svrnm I tried to capture those concerns in the issue above; fundamentally, some individuals raised the issue that their existing workflow for team management worked for them and didn't like this change, nor were they consulted on the proposal.

svrnm commented 3 months ago

> @svrnm I tried to capture those concerns in the issue above; fundamentally, some individuals raised the issue that their existing workflow for team management worked for them and didn't like this change, nor were they consulted on the proposal.

Thanks for the clarification.

mtwo commented 3 months ago

Summarizing the discussion from the maintainers' call today: the JS SIG wants this; Go also wants it, but would prefer not to be among the first wave of implementors.