kubernetes / k8s.io

Code and configuration to manage Kubernetes project infrastructure, including various *.k8s.io sites
https://git.k8s.io/community/sig-k8s-infra
Apache License 2.0

[Umbrella Issue] Build GKE Cluster for running bots, utilities #152

Closed dims closed 4 years ago

dims commented 5 years ago

[1] https://cloud.google.com/iam/docs/understanding-roles#kubernetes_engine_roles

dims commented 5 years ago

cc @sttts @justinsb @thockin

justinsb commented 5 years ago

I think this is a great list, balancing starting on the journey with where we want to go.

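If I could express an end-state philosophy, I think it is this:

  • Some group of people have to seed the system, and act as backstop in case of problems, but those credentials/permissions should not be used day-to-day.
  • We want all provisioning of the cluster(s) to be automated and driven by a public git repo.
  • We want all the jobs to be provisioned automatically, driven by a git repo.
  • We want job developers to be effective (e.g. logs, see pods and events?), but we otherwise want least privilege.
  • Ideally we would want to run both high-trust (sign artifacts) and low-trust (run CI) jobs on the same cluster. This might not be possible / practical / a good idea, but we would prefer not to have one-cluster-per-job.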

I think we should proceed with the 30 day cluster and iterate towards this goal. My only concern is that it's a little tricky if I'm in all three roles (using myself as example) of cluster admin, project admin, developer: I want to verify that the experience as a developer is reasonable, but that's difficult if I happen to have cluster-admin rights. I really like your suggestion of using roles though - I think we could give the core group IAM admin rights, and then they could add and remove their own privileges as needed. (So we would "sudo" by granting ourselves GKE Admin, and then remove that role when we're done) Unless anyone knows a better way?
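The "sudo" idea above can be sketched as a small model of IAM policy bindings. This is only an illustration under stated assumptions: the policy is a plain dict shaped like an IAM policy (`bindings` with `role` and `members`), nothing here calls a Google Cloud API, and the member address is hypothetical. The one real identifier is `roles/container.admin`, the predefined "Kubernetes Engine Admin" role. In practice the grant and revoke would presumably be done with `gcloud projects add-iam-policy-binding` and `gcloud projects remove-iam-policy-binding`.

```python
from contextlib import contextmanager

# Predefined "Kubernetes Engine Admin" role.
GKE_ADMIN = "roles/container.admin"


def has_role(policy, role, member):
    """Return True if member is bound to role in the policy dict."""
    return any(
        b["role"] == role and member in b["members"]
        for b in policy.get("bindings", [])
    )


def add_binding(policy, role, member):
    """Bind member to role, reusing an existing binding for that role."""
    for binding in policy.setdefault("bindings", []):
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return
    policy["bindings"].append({"role": role, "members": [member]})


def remove_binding(policy, role, member):
    """Unbind member from role and drop bindings left with no members."""
    for binding in policy.get("bindings", []):
        if binding["role"] == role and member in binding["members"]:
            binding["members"].remove(member)
    policy["bindings"] = [b for b in policy.get("bindings", []) if b["members"]]


@contextmanager
def elevated(policy, member, role=GKE_ADMIN):
    """Grant role for the duration of the block, then revoke it.

    If the member already held the role, it is left in place afterwards.
    """
    already_held = has_role(policy, role, member)
    add_binding(policy, role, member)
    try:
        yield policy
    finally:
        if not already_held:
            remove_binding(policy, role, member)


# Hypothetical member; day-to-day it only holds a viewer role.
policy = {"bindings": [{"role": "roles/viewer", "members": ["user:dev@example.com"]}]}
with elevated(policy, "user:dev@example.com"):
    pass  # admin-only work would happen here
```

Revoking in a `finally` clause means the elevated role is removed even if the admin work fails partway, which is the point of treating the grant like "sudo" rather than a standing permission.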

thockin commented 5 years ago

The way I tested was by having a distinct gmail account that I granted and removed privs from.

On Thu, Nov 8, 2018 at 7:34 AM Justin Santa Barbara <notifications@github.com> wrote:

I think this is a great list, balancing starting on the journey with where we want to go.

If I could express an end-state philosophy, I think it is this:

  • Some group of people have to seed the system, and act as backstop in case of problems, but those credentials/permissions should not be used day-to-day.
  • We want all provisioning of the cluster(s) to be automated and driven by a public git repo.
  • We want all the jobs to be provisioned automatically, driven by a git repo.
  • We want job developers to be effective (e.g. logs, see pods and events?), but we otherwise want least privilege.
  • Ideally we would want to run both high-trust (sign artifacts) and low-trust (run CI) jobs on the same cluster. This might not be possible / practical / a good idea, but we would prefer not to have one-cluster-per-job.

I think we should proceed with the 30 day cluster and iterate towards this goal. My only concern is that it's a little tricky if I'm in all three roles (using myself as example) of cluster admin, project admin, developer: I want to verify that the experience as a developer is reasonable, but that's difficult if I happen to have cluster-admin rights. I really like your suggestion of using roles though - I think we could give the core group IAM admin rights, and then they could add and remove their own privileges as needed. (So we would "sudo" by granting ourselves GKE Admin, and then remove that role when we're done) Unless anyone knows a better way?


dims commented 5 years ago

@thockin i can turn what we have here into a PR (README in cluster/ directory in this repo).

Is this enough for us to get started with a 30 day cluster?

thockin commented 5 years ago

Yes. The AI (action item) today was to create the ggroup, which we called out in this issue already. Does that group exist?

We should endeavor to do better than owner/viewer split - a ggroup per app or something, plus overall admins group.
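One way to picture "a ggroup per app plus an overall admins group" is as generated Kubernetes RBAC manifests. The sketch below is an assumption-laden illustration, not the configuration used for this cluster: the group addresses, the `example.com` domain, and the one-namespace-per-app layout are hypothetical, and it assumes the cluster is set up so that Google Groups appear as RBAC `Group` subjects. The RBAC object shapes and the built-in `edit` and `cluster-admin` ClusterRoles are standard Kubernetes.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"
RBAC_API_VERSION = RBAC_API_GROUP + "/v1"


def group_subject(group):
    """RBAC subject entry for a group name."""
    return {"kind": "Group", "name": group, "apiGroup": RBAC_API_GROUP}


def role_binding(namespace, group, cluster_role):
    """Namespaced binding: group gets cluster_role only inside namespace."""
    return {
        "apiVersion": RBAC_API_VERSION,
        "kind": "RoleBinding",
        "metadata": {"name": f"{namespace}-{cluster_role}", "namespace": namespace},
        "subjects": [group_subject(group)],
        "roleRef": {"kind": "ClusterRole", "name": cluster_role, "apiGroup": RBAC_API_GROUP},
    }


def cluster_role_binding(name, group, cluster_role):
    """Cluster-wide binding, intended for the overall admins group."""
    return {
        "apiVersion": RBAC_API_VERSION,
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "subjects": [group_subject(group)],
        "roleRef": {"kind": "ClusterRole", "name": cluster_role, "apiGroup": RBAC_API_GROUP},
    }


def build_bindings(app_groups, admin_group):
    """One 'edit' RoleBinding per app namespace, plus one admins binding."""
    manifests = [
        role_binding(app, group, "edit") for app, group in sorted(app_groups.items())
    ]
    manifests.append(cluster_role_binding("infra-cluster-admins", admin_group, "cluster-admin"))
    return manifests


# Hypothetical groups; the apps are ones mentioned in this thread.
APP_GROUPS = {
    "publishing-bot": "k8s-infra-publishing-bot@example.com",
    "gcsweb": "k8s-infra-gcsweb@example.com",
}
ADMIN_GROUP = "k8s-infra-cluster-admins@example.com"

if __name__ == "__main__":
    print(json.dumps(build_bindings(APP_GROUPS, ADMIN_GROUP), indent=2))
```

Binding each app group to `edit` in its own namespace keeps app owners away from other apps' resources, while only the admins group holds `cluster-admin`.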

dims commented 5 years ago

Tim, k8s-infra-cluster-admins is the name of the google group. @justinsb has created that group and sent invites.

/assign @thockin

thockin commented 5 years ago

The group has cluster admin access

dims commented 5 years ago

i have been able to access the cluster and run the publishing-bot

dims commented 5 years ago

/assign @thockin
/unassign

spiffxp commented 5 years ago

Iterating here at the moment (TL;DR: moving from bash to Terraform): https://github.com/kubernetes/k8s.io/issues/243#issuecomment-510118579

spiffxp commented 5 years ago

@thockin would like to take one last look at terraform and then we move forward

spiffxp commented 5 years ago

burn down cluster, rebuild using terraform https://github.com/kubernetes/k8s.io/issues/243

enumerate followup tasks like monitoring here

thockin commented 5 years ago

Update: we have a cluster, "aaa", that is built from Terraform. We have not moved everything over yet, so I'll propose that the goalpost for this bug be:

1) EOL the "development2" cluster (publisher bot)
2) Move all of the things in the google "utilicluster" into "aaa"

For this context, "move" means "with minimal but non-zero monitoring and docs".

At that point, this mission is done and new missions can be formulated.

thockin commented 4 years ago

I have opened issues for each specific item. "aaa" is live and gcsweb has moved.