kubernetes / test-infra

Test infrastructure for the Kubernetes project.
Apache License 2.0

Investigate usage of EKS for test jobs #27896

Closed ameukam closed 1 year ago

ameukam commented 1 year ago

We currently use GKE for the build clusters, and existing tooling is heavily built toward GCP. I would like to investigate the possibility of using EKS clusters as build clusters. This should help balance the CI infrastructure between different providers.

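Some questions come to mind:

  • how do we provide kubeconfigs to the prow control plane
  • how do we handle the logs
  • how do we handle authentication and authorisation for the service account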

Update(01/20/2023):

Part of:

/sig infra testing
/area prow
/priority long-term

/assign @chaodaiG
cc @cjwagner @spiffxp

k8s-ci-robot commented 1 year ago

@ameukam: The label(s) sig/infra, priority/long-term cannot be applied, because the repository doesn't have them.

ameukam commented 1 year ago

/priority important-longterm

chaodaiG commented 1 year ago

> We currently use GKE for the build clusters, and existing tooling is heavily built toward GCP. I would like to investigate the possibility of using EKS clusters as build clusters. This should help balance the CI infrastructure between different providers.

This is awesome! Can't wait for it to come true!

  • how we provide kubeconfigs to prow control plane

They are mounted into the Prow control plane pods, and the mounted path in the container is appended to the KUBECONFIG env var in the deployment. For k8s prow, the kubeconfig secrets were generated by gencred.
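As a rough illustration of the mounting described above, here is a minimal sketch of a Deployment fragment. The secret names (`kubeconfig-gke`, `kubeconfig-eks`), the mount paths, and the placeholder image are assumptions for illustration, not the actual prow.k8s.io manifests. On Linux, KUBECONFIG takes a colon-separated list of files, so each additional build cluster would add one more mounted path.

```yaml
# Hypothetical fragment of a Prow control plane Deployment (all names are illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prow-controller-manager
spec:
  selector:
    matchLabels:
      app: prow-controller-manager
  template:
    metadata:
      labels:
        app: prow-controller-manager
    spec:
      containers:
      - name: prow-controller-manager
        image: example.invalid/prow-controller-manager:latest  # placeholder image
        env:
        # One entry per mounted kubeconfig; the client merges all listed files.
        - name: KUBECONFIG
          value: /etc/kubeconfig-gke/config:/etc/kubeconfig-eks/config
        volumeMounts:
        - name: kubeconfig-gke
          mountPath: /etc/kubeconfig-gke
          readOnly: true
        - name: kubeconfig-eks
          mountPath: /etc/kubeconfig-eks
          readOnly: true
      volumes:
      - name: kubeconfig-gke
        secret:
          secretName: kubeconfig-gke  # assumed secret with a key named "config"
      - name: kubeconfig-eks
        secret:
          secretName: kubeconfig-eks  # assumed secret with a key named "config"
```

Keeping one secret per cluster means an EKS kubeconfig can be added or rotated without touching the existing GKE one.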

  • how do we handle the logs.

Currently there are two phases of logs handling:

The streaming part should be fine on EKS as it would be the same kube API calls. The storage part can be done either via GCS or S3, I think both will work.
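For the storage part, a minimal sketch of what pointing job decoration at S3 might look like. It assumes Prow's decoration config accepts an `s3://` bucket prefix under `gcs_configuration` together with an `s3_credentials_secret`; the bucket and secret names are placeholders, and the exact keys should be checked against the Prow version in use.

```yaml
# Hypothetical Prow config fragment; bucket and secret names are placeholders.
plank:
  default_decoration_config_entries:
  - config:
      gcs_configuration:
        # The key is named for GCS, but the bucket value carries the storage scheme.
        bucket: s3://example-prow-logs
        path_strategy: explicit
      # Assumed name of a secret holding S3 credentials for the upload containers.
      s3_credentials_secret: s3-credentials
```

Switching back to GCS would, under the same assumption, only change the bucket value to a `gs://` URL and drop the S3 secret.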

  • how we handle authentication and authorisation for the service account.

For a prowjob pod, the only required permission is storage admin permission. There are 2 ways this is done via GCP:

ameukam commented 1 year ago

> For k8s prow the kubeconfig secrets were generated by gencred.

gencred only supports GCP Secret Manager at the moment. We should think about extending support to the AWS Secrets Manager API.

> GCP workload identity binds a GCP service account to a k8s service account, and a prowjob pod using the k8s service account has the same permissions as the bound GCP service account.

There is no notion of a service account on AWS, but we can still make a k8s service account assume an AWS IAM role, similar to how Workload Identity works. So if we use an S3 bucket to store the logs, we should be able to simplify authorization on AWS.
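A minimal sketch of that binding on EKS, using IAM Roles for Service Accounts. The `eks.amazonaws.com/role-arn` annotation is the EKS mechanism for this; the service account name, namespace, account ID, and role name below are all hypothetical. It also assumes the cluster's OIDC provider is registered in IAM and that the role's trust policy allows this service account to assume it.

```yaml
# Hypothetical ServiceAccount for prowjob pods; every name here is a placeholder.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prowjob-default-sa
  namespace: test-pods  # assumed namespace where prowjob pods run
  annotations:
    # Pods using this service account receive credentials for this IAM role.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/example-prow-logs-writer
```

The IAM role would then carry only the S3 permissions needed to write to the logs bucket, which mirrors the storage-only permission a prowjob pod needs on GCP.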

chaodaiG commented 1 year ago

> > For k8s prow the kubeconfig secrets were generated by gencred.

> gencred only supports GCP Secret Manager at the moment. We should think about extending support to the AWS Secrets Manager API.

gencred also supports dumping the kubeconfig file locally. However, I do agree that it's generally a good idea to back up these kubeconfigs in a vault.

> > GCP workload identity binds a GCP service account to a k8s service account, and a prowjob pod using the k8s service account has the same permissions as the bound GCP service account.

> There is no notion of a service account on AWS, but we can still make a k8s service account assume an AWS IAM role, similar to how Workload Identity works. So if we use an S3 bucket to store the logs, we should be able to simplify authorization on AWS.

awesome!

xdu31 commented 1 year ago

Hi @chaodaiG @ameukam

For the Prow test infrastructure setup on AWS EKS, regarding your questions:

(1) The kubeconfig can be obtained with the aws eks update-kubeconfig command. Details can be found in https://docs.aws.amazon.com/cli/latest/reference/eks/update-kubeconfig.html

(2) IAM user & role access to the cluster is supported; you can add more roles/users to access the cluster. Details can be found in https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html

(3) Authentication from an OpenID Connect identity provider is supported. Details can be found in https://docs.aws.amazon.com/eks/latest/userguide/authenticate-oidc-identity-provider.html

(4) There's also ingress support for an OIDC IdP; the annotation is alb.ingress.kubernetes.io/auth-idp-oidc. Details can be found in https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/annotations/#auth-idp-oidc

(5) Log access to S3 is supported by alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=my-access-log-bucket,access_logs.s3.prefix=my-app. Details can be found in https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/annotations/

BenTheElder commented 1 year ago

Using EKS as a build cluster shouldn't itself be a big issue. It should be fairly straightforward to use this for things like build and unit test, given sufficient resources. EKS is already running multiple prow instances.

Using EKS to take over many of the jobs may be an issue, because they need to interact with boskos + GCP projects. That would require a maintained OSS solution for running e2e clusters @ HEAD on another platform, and also translating the equivalent job config.

upodroid commented 1 year ago

This should be straightforward.

All we need to do is:

I'll submit a PR that amends our images today.

This is a typical EKS cluster context. Unlike the GKE kubeconfig, there is nothing sensitive in there.

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeU1URXpNREV6TkRVMU9Wb1hEVE15TVRFeU56RXpORFUxT1Zvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBSzJUCnZGNkI1NEkyS3R3QlRjY3AxQkZBeHkzdDRJZzY5aUFUdCtQaEtxSFEzMWpYZHFQUldGT1FjWWVHY2lCRnE3Z04KeFNkdjZDZE5md1hXUTBCc3FzNzZKUEsza1U5RkI4WjcybHZMdUlhYThlN3M3clFqS1ZZNzFRU2cxMURISnduMApnaTNrd05KV2I1SGhoRmRoZlovYmZlUlI2aS84elB1UGFUd25BS1FpWE02Vm9LRzZ5N1lSUGZOc1g3TjlQeUYyCllpMDRvMDh5QUkyZkpCcFREK0VPem5UaWErdSsvdk5KdUdFQzEzUE5LbDlia1hWT2J4UXlpVlpkd0RnNkU3czQKT0hRMHBwNkRKZW9neXhUeHYwUGI4Yy9wNmdIVGEwWnU1SENFdWpSYjRLbm9Jb2NpUXYvdXRKR3JxekhIUHQvdwpzOUpJbEQzNDNONEM5TUpqaytjQ0F3RUFBYU5aTUZjd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZHRDhuMThLL0h4MFFYajVScktDb0l5dkZJVVVNQlVHQTFVZEVRUU8KTUF5Q0NtdDFZbVZ5Ym1WMFpYTXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBQmthTnh2T2RsVEtTVk5lYTVGdwp0VjNYdjJnbGN5bUZCRTVybnBkVzJHam13a09TaThHVWV2SHEwam5XNzJoTktiUVYzYURoOUJSY3R6NFFOQ1E2CkNyME1ub2tRazRPZmVJZ1FVdVhwUXlGbTJzRmVxSUhhWnpveGViQ0JlNUlXN0gzZ3piMnJLbGc4ZjhDTGFVY3oKMW01NTJZL1V0SmlKTHBNeUR0NUYwNG9DY0hQdDZsL1JQK2lKYVg2SGRTMGdBR25UWlZmQitsNG5CaGNSMkYzUwpDVkNCSmxTVEEyVlo4Y0tQSzZ1eEs5YUpXM3FnMGxlS0hUdXJNSUxyQVIwVG4wVWt4OElrOEl1ajA1d2s2cTZNCkhaN2NFTnNaNlBJYXBRVHlaaW1Ob3NLSEJJelpONEcxUmI1cjZnVHpzcjhwbXRjZlZodGxzOEJTQ01razZkWWgKdjlrPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    server: https://23158172FE52707B6EBB0D95C89842DD.gr7.eu-west-2.eks.amazonaws.com
  name: dev.eu-west-2.eksctl.io
contexts:
- context:
    cluster: dev.eu-west-2.eksctl.io
    user: dev.eu-west-2.eksctl.io
  name: dev.eu-west-2.eksctl.io
current-context: dev.eu-west-2.eksctl.io
kind: Config
preferences: {}
users:
- name: dev.eu-west-2.eksctl.io
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - token
      - -i
      - dev
      command: aws-iam-authenticator
      env:
      - name: AWS_STS_REGIONAL_ENDPOINTS
        value: regional
      - name: AWS_DEFAULT_REGION
        value: eu-west-2
      provideClusterInfo: false
sftim commented 1 year ago

> how we provide kubeconfigs to prow control plane

A couple of tips

sftim commented 1 year ago

Where will we put the code to define what runs on top of EKS (eg, a cluster autoscaler)?

BenTheElder commented 1 year ago

> Where will we put the code to define what runs on top of EKS (eg, a cluster autoscaler)?

Persistent clusters and other basic infra configs are usually in github.com/kubernetes/k8s.io

ameukam commented 1 year ago

Successful integration between an EKS cluster and prow.k8s.io

Canary job: https://prow.k8s.io/?job=pull-release-test-canary&cluster=eks-prow-build-cluster

Config of the build cluster: https://github.com/kubernetes/k8s.io/tree/main/infra/aws/terraform/prow-build-cluster

History of the rollout: https://kubernetes.slack.com/archives/C7J9RP96G/p1678308055091519
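For readers wondering how a job ends up on that cluster, here is a minimal sketch of a presubmit routed via the `cluster` field. The cluster alias `eks-prow-build-cluster` is taken from the canary job URL above; the org/repo, job name, image, and command are hypothetical and are not the actual canary job definition.

```yaml
# Hypothetical presubmit; only the cluster alias comes from this thread.
presubmits:
  example-org/example-repo:
  - name: pull-example-eks-canary
    # Must match a cluster name that the Prow control plane has a kubeconfig context for.
    cluster: eks-prow-build-cluster
    decorate: true
    always_run: false
    spec:
      containers:
      - image: example.invalid/test-image:latest  # placeholder image
        command:
        - make
        args:
        - test
```

Jobs that omit `cluster` keep running on the default build cluster, so work can be moved over one job at a time.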

/close

k8s-ci-robot commented 1 year ago

@ameukam: Closing this issue.

dims commented 1 year ago

/meow

k8s-ci-robot commented 1 year ago

@dims: cat image

BenTheElder commented 1 year ago

/woof

k8s-ci-robot commented 1 year ago

@BenTheElder: dog image

qct commented 1 year ago

/meow

k8s-ci-robot commented 1 year ago

@qct: cat image
