kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Enabling IRSA on self hosted k8s cluster with kops runs into a problem with cilium pods #14201

Closed milan-stikic-cif closed 1 year ago

milan-stikic-cif commented 2 years ago

/kind bug

1. What kops version are you running? Version 1.23.2.

2. What Kubernetes version are you running? Version 1.21.5.

3. What cloud provider are you using? Self-hosted k8s cluster on AWS.

4. What commands did you run? What is the simplest way to reproduce this issue? Enabling serviceAccountIssuerDiscovery with enableAWSOIDCProvider: true and setting a bucket for the JWKS documents, then running kops rolling-update.

5. What happened after the commands executed? kops decided that all 3 master nodes (3 in total) needed to be updated. Once it started updating the first master node, it got stuck because the cilium pod running on that master is getting the following errors:

level=info msg="Initializing daemon" subsys=daemon
level=info msg="Establishing connection to apiserver" host="https://internal.api.address:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://internal.api.address:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://internal.api.address:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://internal.api.address:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://internal.api.address:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://internal.api.address:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://internal.api.address:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://internal.api.address:443" subsys=k8s
level=error msg="Unable to contact k8s api-server" error=Unauthorized ipAddr="https://internal.api.address:443" subsys=k8s
level=fatal msg="Unable to initialize Kubernetes subsystem" error="unable to create k8s client: unable to create k8s client: Unauthorized" subsys=daemon

This eventually makes the other cilium pods go into CrashLoopBackOff with the following message:

level=fatal msg="Unable to initialize Kubernetes subsystem" error="the server has asked for the client to provide credentials" subsys=daemon

In the end the whole cluster becomes unusable as all pods stop working at some point.

6. What did you expect to happen? That the cluster would start normally after kops rolling-update and begin using the new serviceAccountIssuer via OIDC, enabling IAM Roles for Service Accounts.

7. Please provide your cluster manifest (kops get --name my.example.com -o yaml). Cluster name and other sensitive information have been redacted.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2021-10-11T16:04:09Z"
  generation: 17
  name: REDACTED
spec:
  api:
    loadBalancer:
      class: Network
      type: Internal
  authorization:
    rbac: {}
  certManager:
    enabled: true
    managed: false
  channel: stable
  cloudProvider: aws
  configBase: REDACTED
  containerRuntime: containerd
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-aws-region-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-aws-region-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-aws-region-1c
      name: c
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-aws-region-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-aws-region-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-aws-region-1c
      name: c
    memoryRequest: 100Mi
    name: events
  externalPolicies:
    master:
    - REDACTED
    - REDACTED
    node:
    - REDACTED
    - REDACTED
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.21.5
  masterInternalName: REDACTED
  masterPublicName: REDACTED
  metricsServer:
    enabled: true
    insecure: false
  networkCIDR: 172.20.0.0/16
  networking:
    cilium:
      enableNodePort: true
  nonMasqueradeCIDR: 100.64.0.0/10
  podIdentityWebhook:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: REDACTED
    enableAWSOIDCProvider: true
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: aws-region-1a
    type: Private
    zone: aws-region-1a
  - cidr: 172.20.64.0/19
    name: aws-region-1b
    type: Private
    zone: aws-region-1b
  - cidr: 172.20.96.0/19
    name: aws-region-1c
    type: Private
    zone: aws-region-1c
  - cidr: 172.20.0.0/22
    name: utility-aws-region-1a
    type: Utility
    zone: aws-region-1a
  - cidr: 172.20.4.0/22
    name: utility-aws-region-1b
    type: Utility
    zone: aws-region-1b
  - cidr: 172.20.8.0/22
    name: utility-aws-region-1c
    type: Utility
    zone: aws-region-1c
  topology:
    bastion:
      bastionPublicName: REDACTED
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-10-11T16:04:10Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: bastions
spec:
  associatePublicIp: false
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211001
  machineType: t3.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:
  - aws-region-1a
  - aws-region-1b
  - aws-region-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-10-11T16:04:09Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: master-aws-region-1a
spec:
  associatePublicIp: false
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211001
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-aws-region-1a
  role: Master
  subnets:
  - aws-region-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-10-11T16:04:09Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: master-aws-region-1b
spec:
  associatePublicIp: false
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211001
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-aws-region-1b
  role: Master
  subnets:
  - aws-region-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-10-11T16:04:10Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: master-aws-region-1c
spec:
  associatePublicIp: false
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211001
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-aws-region-1c
  role: Master
  subnets:
  - aws-region-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-10-11T16:04:10Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: nodes-aws-region-1a
spec:
  associatePublicIp: false
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211001
  machineType: t3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-aws-region-1a
  role: Node
  subnets:
  - aws-region-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-10-11T16:04:10Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: nodes-aws-region-1b
spec:
  associatePublicIp: false
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211001
  machineType: t3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-aws-region-1b
  role: Node
  subnets:
  - aws-region-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-10-11T16:04:10Z"
  labels:
    kops.k8s.io/cluster: REDACTED
  name: nodes-aws-region-1c
spec:
  associatePublicIp: false
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211001
  machineType: t3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-aws-region-1c
  role: Node
  subnets:
  - aws-region-1c

8. Please run the commands with the most verbose logging by adding the -v 10 flag. Paste the logs into this report, or into a gist and provide the gist link here.

The kubelet logs are the following:

E0828 20:35:00.314637    6170 server.go:273] "Unable to authenticate the request due to an error" err="invalid bearer token"
E0828 20:35:01.617126    6170 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cilium-agent\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cilium-agent pod=cilium-v8h2f_kube-system(bb885613-e4df-4bcb-9d83-7c22cb91d446)\"" pod="kube-system/cilium-v8h2f" podUID=bb885613-e4df-4bcb-9d83-7c22cb91d446

and also:

I0828 21:01:39.695536    6170 prober.go:116] "Probe failed" probeType="Startup" pod="kube-system/cilium-xl2nv" podUID=5b8ad1c3-a715-4670-82e0-550585372083 containerName="cilium-agent" probeResult=failure output="Get \"http://127.0.0.1:9876/healthz\": dial tcp 127.0.0.1:9876: connect: connection refused"

Other cilium pods (the ones that were not on the updated master node) eventually get the following error messages:

level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=warning msg="Network status error received, restarting client connections" error=Unauthorized subsys=k8s
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/synced/crd.go:131: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized" subsys=k8s
level=info msg="Exiting due to signal" signal=terminated subsys=daemon
level=fatal msg="Error while creating daemon" error="context canceled" subsys=daemon
level=info msg="Waiting for all endpoints' go routines to be stopped." subsys=daemon
level=info msg="All endpoints' goroutines stopped." subsys=daemon

9. Anything else we need to know? I was following https://dev.to/olemarkus/irsa-support-for-kops-1doe and https://dev.to/olemarkus/zero-configuration-irsa-on-kops-1po1 for enabling IRSA on self-hosted k8s clusters. One difference is that we had cert-manager installed prior to trying to enable this; that is why spec.certManager has the managed: false setting. Also, kube-apiserver on the updated master node never gets deployed in the meantime.

olemarkus commented 2 years ago

Unfortunately enabling IRSA on a live cluster is disruptive. I will make sure to mention this in our docs.

You need to delete any SA tokens and restart your Pods.
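
For a cluster like the one above (k8s 1.21, where token Secrets are still auto-generated), a minimal sketch of that remediation might look like this; the namespace and workload names are taken from this thread and may differ in other clusters:

# Delete the auto-generated service account token Secrets so the controller
# manager re-issues them under the new issuer (scope to affected namespaces):
kubectl -n kube-system get secrets \
  --field-selector type=kubernetes.io/service-account-token -o name \
  | xargs -r kubectl -n kube-system delete

# Restart the workloads that were failing with Unauthorized errors:
kubectl -n kube-system rollout restart daemonset/cilium deployment/cilium-operator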

milan-stikic-cif commented 2 years ago

Hey @olemarkus, thanks for the reply! We even tried deleting SA tokens, but only those of the cilium pods. Should we try recreating all SAs and Pods after enabling IRSA?

olemarkus commented 2 years ago

Doing that only where you are getting authorization errors should be sufficient.

milan-stikic-cif commented 2 years ago

Hi @olemarkus, unfortunately restarting pods and deleting SA tokens didn't solve our problem. The cilium and cilium-operator pods are deployed as a DaemonSet and a Deployment respectively. I tried deleting their service accounts along with the SA tokens and then redeploying the DaemonSet and Deployment, but pods that end up on the updated master instance still cannot authenticate to the API server (making them go into CrashLoopBackOff with all the same errors I posted above). As our cluster grows, the need for IRSA grows with it, so I am guessing we should recreate the whole cluster with kOps with OIDC service account issuer discovery enabled from the start?

olemarkus commented 2 years ago

Did you rotate the entire control plane first? If not, the old KCM may be handing out tokens with the old issuer, or there may be API servers that do not trust the new one.
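
For reference, a control-plane-only rotation might look like the sketch below; the exact role name accepted by --instance-group-roles can vary between kops releases, so verify with kops rolling-update cluster --help:

# Force-replace only the control-plane instance groups so every kube-apiserver
# and kube-controller-manager comes up with the new issuer configuration:
kops rolling-update cluster --instance-group-roles master --force --yes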

milan-stikic-cif commented 2 years ago

Hey, we ended up creating an entirely new cluster and we don't see this problem anymore. pod-identity-webhook works as intended and service accounts are getting the right AWS permissions. To answer your question: while trying to enable IRSA on the running cluster, not even the re-creation of the complete control plane helped us. We tried that by terminating the master instances.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

mozz-lx commented 1 year ago

I see similar behavior in my clusters, but it is happening with flannel. Unfortunately, creating a new k8s cluster is not an option for us.

olemarkus commented 1 year ago

As mentioned above, make sure you rotate the entire control plane and then make sure you restart all your pods (if you are running a recent k8s version).
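
A rough sketch of the "restart all your pods" step, assuming every pod is owned by a Deployment, DaemonSet, or StatefulSet (standalone pods would need to be recreated by hand):

# Trigger a rolling restart across every namespace so pods remount
# tokens minted by the new issuer:
for ns in $(kubectl get namespaces -o name | cut -d/ -f2); do
  kubectl -n "$ns" rollout restart deployment,daemonset,statefulset
done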

mozz-lx commented 1 year ago

Hi @olemarkus,

Is it possible to add a feature to include multiple --service-account-issuer flags as an option for kops? That would be helpful and reduce the disruption during the adoption of IRSA.

> --service-account-issuer defines the identifier of the service account token issuer. You can specify the --service-account-issuer argument multiple times; this can be useful to enable a non-disruptive change of the issuer. When this flag is specified multiple times, the first is used to generate tokens and all are used to determine which issuers are accepted. You must be running Kubernetes v1.22 or later to be able to specify --service-account-issuer multiple times.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#serviceaccount-token-volume-projection
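
Concretely, the migration pattern described above looks roughly like this on the kube-apiserver command line (issuer URLs are placeholders):

# The FIRST --service-account-issuer is used to sign new tokens; tokens from
# ALL listed issuers remain accepted while old tokens age out (other flags omitted):
kube-apiserver \
  --service-account-issuer=https://new-oidc-issuer.example.com \
  --service-account-issuer=https://api.internal.cluster.example.com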

olemarkus commented 1 year ago

Yeah. It wouldn't be too hard to implement, I think. Would be happy to review a PR.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/kops/issues/14201#issuecomment-1490090760):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

p3rshin commented 1 year ago

The problem we faced was that, after deploying the pod-identity-webhook at the same time as the OIDC changes, cilium pods won't start because the pod-identity-webhook is not yet available while the masters have not been rolled. You can exclude "cilium" in the mutating webhook related to the pod-identity-webhook, but you still need to roll the masters.
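
One way to sketch such an exclusion is an objectSelector patch on the webhook configuration; the object name pod-identity-webhook, the webhook index, and cilium's k8s-app label are assumptions to check against your cluster:

# Skip mutation for pods labeled k8s-app=cilium (hypothetical object name/index):
kubectl patch mutatingwebhookconfiguration pod-identity-webhook --type=json \
  -p='[{"op": "add", "path": "/webhooks/0/objectSelector", "value": {"matchExpressions": [{"key": "k8s-app", "operator": "NotIn", "values": ["cilium"]}]}}]'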

> Unfortunately enabling IRSA on a live cluster is disruptive.

We have found a non-disruptive two-step process below:

1. Enable OIDC but don't use it yet (this change requires rolling the masters):

   serviceAccountIssuerDiscovery:
     discoveryStore: s3://publicly-readable-store
     enableAWSOIDCProvider: false

2. When the above is completed, enable OIDC (this change won't require rolling the cluster; after running kops update cluster --yes you will shortly see the pod-identity-webhook pods starting up, but it takes ~10 minutes for them to become fully operational, after which IRSA should be working):

   serviceAccountIssuerDiscovery:
     enableAWSOIDCProvider: true
   iam:
     useServiceAccountExternalPermissions: true
   podIdentityWebhook:
     enabled: true

Hope it helps someone who stumbles across the same issue.
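
As a sketch, driving those two steps from the CLI could look like the following; the kops edit steps stand in for whatever mechanism manages your cluster spec:

# Step 1: publish the discovery documents without using the new issuer yet;
# this is the step that requires rolling the control plane.
kops edit cluster     # add serviceAccountIssuerDiscovery with enableAWSOIDCProvider: false
kops update cluster --yes
kops rolling-update cluster --yes

# Step 2: enable the OIDC provider and the webhook; no roll required.
kops edit cluster     # set enableAWSOIDCProvider: true and podIdentityWebhook.enabled: true
kops update cluster --yes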

sergey-korenets-fivestars commented 1 year ago

@p3rshin, hey Alex, thanks for sharing! What version of kops do you use? I'm getting an error:

# Found fields that are not recognized
# +     enableAWSOIDCProvider: false

Ludek2 commented 1 year ago

I managed to make the change non-disruptive by injecting the secondary --service-account-issuer into the kube-apiserver manifest on the master nodes. This uses the kube-apiserver feature that allows for a non-disruptive service-account-issuer change. Docs here.

The solution uses spec.hooks in the control-plane InstanceGroups:

kind: InstanceGroup
spec:
  hooks:
    - before:
        - kubelet.service
      manifest: |
        User=root
        Type=oneshot
        ExecStart=/bin/bash -c "until [ -f /etc/kubernetes/manifests/kube-apiserver.manifest ];do sleep 5;done;sed -i '/- --service-account-issuer=https:\/\/.*.amazonaws.com/a\ \ \ \ - --service-account-issuer=https:\/\/api.internal.[cluster-name].[domain]' /etc/kubernetes/manifests/kube-apiserver.manifest"
      name: modify-kube-api-manifest

I understand that this solution is not great, but it helped us to move forward without modifying the kops code or doing manual interventions.
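
To confirm the hook did its job, one could check the rendered manifest on each control-plane node (path as used in the hook above):

# Expect two --service-account-issuer lines after the hook has run:
grep -- '--service-account-issuer' /etc/kubernetes/manifests/kube-apiserver.manifest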