@ameukam: The label(s) sig/infra, priority/long-term cannot be applied, because the repository doesn't have them.
/priority important-longterm
We currently use GKE for the build clusters, and the existing tooling is heavily built toward GCP. I would like to investigate the possibility of using EKS clusters as build clusters. This should help balance the CI infrastructure between different providers.
This is awesome! Can't wait for it to come true!
- how we provide kubeconfigs to prow control plane
They are mounted into the Prow control plane pods, and the mount path in the container is appended to the KUBECONFIG env var in the deployment. For k8s prow, the kubeconfig secrets were generated by gencred.
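For illustration, here is a minimal sketch of that wiring; the secret names, mount paths, and image tag are hypothetical, not the actual prow.k8s.io deployment:

```yaml
# Sketch: build-cluster kubeconfigs stored in secrets, mounted into a prow
# component, with KUBECONFIG pointing at the mounted files.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prow-controller-manager
spec:
  selector:
    matchLabels:
      app: prow-controller-manager
  template:
    metadata:
      labels:
        app: prow-controller-manager
    spec:
      containers:
      - name: prow-controller-manager
        image: gcr.io/k8s-prow/prow-controller-manager:latest  # tag is illustrative
        env:
        - name: KUBECONFIG
          # Colon-separated list; appending another entry is how an extra
          # build cluster (e.g. an EKS one) would be added.
          value: /etc/kubeconfig/config:/etc/kubeconfig-eks/config
        volumeMounts:
        - name: kubeconfig
          mountPath: /etc/kubeconfig
          readOnly: true
        - name: kubeconfig-eks
          mountPath: /etc/kubeconfig-eks
          readOnly: true
      volumes:
      - name: kubeconfig
        secret:
          secretName: kubeconfig      # hypothetical secret produced by gencred
      - name: kubeconfig-eks
        secret:
          secretName: kubeconfig-eks  # hypothetical secret for the EKS cluster
```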
- how do we handle the logs.
Currently there are two phases of log handling: streaming and storage.
The streaming part should be fine on EKS, as it uses the same kube API calls. The storage part can be done via either GCS or S3; I think both will work.
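If S3 is chosen, prow's pod utilities can upload there directly. A hedged sketch of what the decoration config might look like, assuming prow's S3 support (s3_credentials_secret); the bucket and secret names are made up and the image tags are illustrative:

```yaml
# Sketch: plank decoration config writing job output to S3 instead of GCS.
plank:
  default_decoration_configs:
    "*":
      gcs_configuration:
        bucket: s3://prow-build-logs   # an S3 bucket instead of gs://...
        path_strategy: explicit
      # Kubernetes secret holding credentials in the format the pod
      # utilities expect for S3.
      s3_credentials_secret: s3-credentials
      utility_images:
        clonerefs: gcr.io/k8s-prow/clonerefs:latest
        initupload: gcr.io/k8s-prow/initupload:latest
        entrypoint: gcr.io/k8s-prow/entrypoint:latest
        sidecar: gcr.io/k8s-prow/sidecar:latest
```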
- how we handle authentication and authorisation for the service account.
For a prowjob pod, the only required permission is storage admin. There are two ways this is done via GCP: mounting a GCP service account key into the pod as a secret, or using GCP workload identity (described below).
> For k8s prow the kubeconfig secrets were generated by gencred.

gencred only supports GCP Secret Manager at the moment; we should think about extending support to the AWS Secrets Manager API.
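For comparison, the equivalent store/fetch against AWS Secrets Manager is a couple of CLI calls; the secret name below is hypothetical:

```sh
# Hypothetical: keep a build-cluster kubeconfig in AWS Secrets Manager.
aws secretsmanager create-secret \
  --name prow/kubeconfig-eks \
  --secret-string file://kubeconfig-eks.yaml

# Fetch it back, e.g. for rotation tooling.
aws secretsmanager get-secret-value \
  --secret-id prow/kubeconfig-eks \
  --query SecretString --output text > kubeconfig-eks.yaml
```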
GCP workload identity binds a GCP service account to a Kubernetes service account, so a prowjob pod using that Kubernetes service account has the same permissions as the bound GCP service account.
There is no notion of a GCP-style service account on AWS, but we can still make a Kubernetes service account assume an AWS IAM role, similar to how Workload Identity works. If we use an S3 bucket to store the logs, we should be able to keep authorization on AWS simple.
> gencred only supports GCP Secret Manager at the moment; we should think about extending support to the AWS Secrets Manager API.

gencred also supports dumping the kubeconfig file locally. However, I do agree that it's generally a good idea to back up these kubeconfigs in a vault.
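Returning to the workload-identity point above: on EKS this is IAM Roles for Service Accounts (IRSA). A minimal sketch, with a placeholder account ID, role, and names:

```yaml
# Sketch: a service account prowjob pods can use to assume an IAM role with
# S3 access to the logs bucket. The role ARN and names are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prowjob-default-sa
  namespace: test-pods
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/prow-logs-writer
```

The referenced IAM role would carry an S3 policy scoped to the logs bucket, playing the part that the storage admin binding plays on GCP.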
awesome!
Hi @chaodaiG @ameukam, for the Prow test infrastructure setup on AWS EKS, regarding your questions:
(1) A kubeconfig can be obtained with the aws eks update-kubeconfig cmd; details can be found in https://docs.aws.amazon.com/cli/latest/reference/eks/update-kubeconfig.html
(2) IAM user & role access to the cluster is supported; you can add more roles/users to access the cluster. Details can be found in https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
(3) Authentication from an OpenID Connect identity provider is supported; details can be found in https://docs.aws.amazon.com/eks/latest/userguide/authenticate-oidc-identity-provider.html
(4) There's also ingress support for an OIDC IdP; the annotation is alb.ingress.kubernetes.io/auth-idp-oidc, details can be found in https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/annotations/#auth-idp-oidc
(5) Log access to S3 is supported by alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=my-access-log-bucket,access_logs.s3.prefix=my-app, details can be found in https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/annotations/
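Putting (4) and (5) together, a hedged Ingress sketch using those annotations; the issuer endpoints, secret name, and backend service are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: deck
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/auth-type: oidc
    # JSON blob per the aws-load-balancer-controller docs; the endpoints and
    # secretName below are placeholders.
    alb.ingress.kubernetes.io/auth-idp-oidc: >-
      {"issuer":"https://idp.example.com",
      "authorizationEndpoint":"https://idp.example.com/authorize",
      "tokenEndpoint":"https://idp.example.com/token",
      "userInfoEndpoint":"https://idp.example.com/userinfo",
      "secretName":"oidc-client-secret"}
    alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=my-access-log-bucket,access_logs.s3.prefix=my-app
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: deck
            port:
              number: 80
```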
Using EKS as a build cluster shouldn't itself be a big issue. It should be fairly straightforward to use it for things like build and unit test, given sufficient resources. Multiple prow instances are already running on EKS.
Using EKS to take over many of the jobs may be a bigger issue, because they need to interact with boskos + GCP projects, absent a maintained OSS solution for running e2e clusters @ HEAD on another platform, and it would also mean translating the equivalent job config.
This should be straightforward.
All we need to do is:
@BobyMCbobs can do it. An example: https://github.com/cncf-infra/aws-infra/pull/15. I'll submit a PR that amends our images today.
This is a typical EKS cluster context. Unlike the GKE kubeconfig, there is nothing sensitive in there.
```yaml
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeU1URXpNREV6TkRVMU9Wb1hEVE15TVRFeU56RXpORFUxT1Zvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBSzJUCnZGNkI1NEkyS3R3QlRjY3AxQkZBeHkzdDRJZzY5aUFUdCtQaEtxSFEzMWpYZHFQUldGT1FjWWVHY2lCRnE3Z04KeFNkdjZDZE5md1hXUTBCc3FzNzZKUEsza1U5RkI4WjcybHZMdUlhYThlN3M3clFqS1ZZNzFRU2cxMURISnduMApnaTNrd05KV2I1SGhoRmRoZlovYmZlUlI2aS84elB1UGFUd25BS1FpWE02Vm9LRzZ5N1lSUGZOc1g3TjlQeUYyCllpMDRvMDh5QUkyZkpCcFREK0VPem5UaWErdSsvdk5KdUdFQzEzUE5LbDlia1hWT2J4UXlpVlpkd0RnNkU3czQKT0hRMHBwNkRKZW9neXhUeHYwUGI4Yy9wNmdIVGEwWnU1SENFdWpSYjRLbm9Jb2NpUXYvdXRKR3JxekhIUHQvdwpzOUpJbEQzNDNONEM5TUpqaytjQ0F3RUFBYU5aTUZjd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZHRDhuMThLL0h4MFFYajVScktDb0l5dkZJVVVNQlVHQTFVZEVRUU8KTUF5Q0NtdDFZbVZ5Ym1WMFpYTXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBQmthTnh2T2RsVEtTVk5lYTVGdwp0VjNYdjJnbGN5bUZCRTVybnBkVzJHam13a09TaThHVWV2SHEwam5XNzJoTktiUVYzYURoOUJSY3R6NFFOQ1E2CkNyME1ub2tRazRPZmVJZ1FVdVhwUXlGbTJzRmVxSUhhWnpveGViQ0JlNUlXN0gzZ3piMnJLbGc4ZjhDTGFVY3oKMW01NTJZL1V0SmlKTHBNeUR0NUYwNG9DY0hQdDZsL1JQK2lKYVg2SGRTMGdBR25UWlZmQitsNG5CaGNSMkYzUwpDVkNCSmxTVEEyVlo4Y0tQSzZ1eEs5YUpXM3FnMGxlS0hUdXJNSUxyQVIwVG4wVWt4OElrOEl1ajA1d2s2cTZNCkhaN2NFTnNaNlBJYXBRVHlaaW1Ob3NLSEJJelpONEcxUmI1cjZnVHpzcjhwbXRjZlZodGxzOEJTQ01razZkWWgKdjlrPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    server: https://23158172FE52707B6EBB0D95C89842DD.gr7.eu-west-2.eks.amazonaws.com
  name: dev.eu-west-2.eksctl.io
contexts:
- context:
    cluster: dev.eu-west-2.eksctl.io
    user: dev.eu-west-2.eksctl.io
  name: dev.eu-west-2.eksctl.io
current-context: dev.eu-west-2.eksctl.io
kind: Config
preferences: {}
users:
- name: dev.eu-west-2.eksctl.io
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - token
      - -i
      - dev
      command: aws-iam-authenticator
      env:
      - name: AWS_STS_REGIONAL_ENDPOINTS
        value: regional
      - name: AWS_DEFAULT_REGION
        value: eu-west-2
      provideClusterInfo: false
```
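A similar context can be generated with the aws eks update-kubeconfig command mentioned earlier (the example above was produced by eksctl, hence the aws-iam-authenticator exec plugin; update-kubeconfig emits an aws eks get-token exec instead). A sketch, reusing the cluster name and region from the example:

```sh
# Generate/refresh a kubeconfig entry for cluster "dev" in eu-west-2,
# matching the aws-iam-authenticator args above. Output path is illustrative.
aws eks update-kubeconfig \
  --name dev \
  --region eu-west-2 \
  --alias dev.eu-west-2.eksctl.io \
  --kubeconfig ./kubeconfig-eks.yaml
```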
> how we provide kubeconfigs to prow control plane

A couple of tips:
Where will we put the code to define what runs on top of EKS (eg, a cluster autoscaler)?
Persistent clusters and other basic infra configs are usually in github.com/kubernetes/k8s.io
Successful integration between an EKS cluster and prow.k8s.io
- Canary job: https://prow.k8s.io/?job=pull-release-test-canary&cluster=eks-prow-build-cluster
- Config of the build cluster: https://github.com/kubernetes/k8s.io/tree/main/infra/aws/terraform/prow-build-cluster
- History of the rollout: https://kubernetes.slack.com/archives/C7J9RP96G/p1678308055091519
/close
@ameukam: Closing this issue.
/meow
@dims:
/woof
/meow
@qct:
> We currently use GKE for the build clusters, and the existing tooling is heavily built toward GCP. I would like to investigate the possibility of using EKS clusters as build clusters. This should help balance the CI infrastructure between different providers.
> Some questions come to mind:
> Update (01/20/2023):
> Part of:
> /sig infra testing
> /area prow
> /priority long-term
> /assign @chaodaiG
> cc @cjwagner @spiffxp