kubernetes-sigs / cluster-api-provider-aws

Kubernetes Cluster API Provider AWS provides consistent deployment and day 2 operations of "self-managed" and EKS Kubernetes clusters on AWS.
http://cluster-api-aws.sigs.k8s.io/
Apache License 2.0

Controller manager seems to be not working properly due to caching errors #5103

Open changhyuni opened 3 weeks ago

changhyuni commented 3 weeks ago

/kind bug

What steps did you take and what happened:

I0823 03:05:37.226125       1 request.go:1212] Response Body: {"kind":"APIResourceList","apiVersion":"v1","groupVersion":"controlplane.cluster.x-k8s.io/v1beta2","resources":[{"name":"awsmanagedcontrolplanes","singularName":"awsmanagedcontrolplane","namespaced":true,"kind":"AWSManagedControlPlane","verbs":["delete","deletecollection","get","list","patch","create","update","watch"],"shortNames":["awsmcp"],"categories":["cluster-api"],"storageVersionHash":"WnEFh7oqH48="},{"name":"awsmanagedcontrolplanes/status","singularName":"","namespaced":true,"kind":"AWSManagedControlPlane","verbs":["get","patch","update"]},{"name":"rosacontrolplanes","singularName":"rosacontrolplane","namespaced":true,"kind":"ROSAControlPlane","verbs":["delete","deletecollection","get","list","patch","create","update","watch"],"shortNames":["rosacp"],"categories":["cluster-api"],"storageVersionHash":"qdhYg8dFBqo="},{"name":"rosacontrolplanes/status","singularName":"","namespaced":true,"kind":"ROSAControlPlane","verbs":["get","patch","update"]}]}
I0823 03:05:37.226379       1 shared_informer.go:337] stop requested
E0823 03:05:37.226393       1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedControlPlane Informer to sync"
I0823 03:05:37.226424       1 reflector.go:289] Starting reflector *v1beta2.AWSManagedControlPlane (9m13.30993253s) from pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229
I0823 03:05:37.226445       1 shared_informer.go:337] stop requested
I0823 03:05:37.226459       1 shared_informer.go:337] stop requested
E0823 03:05:37.226464       1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedControlPlane Informer to sync"
I0823 03:05:37.226448       1 reflector.go:289] Starting reflector *v1beta2.AWSManagedCluster (10m42.211609855s) from pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229
E0823 03:05:37.226477       1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSCluster Informer to sync"
I0823 03:05:37.226481       1 reflector.go:295] Stopping reflector *v1beta2.AWSManagedCluster (10m42.211609855s) from pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229
I0823 03:05:37.226486       1 shared_informer.go:337] stop requested
I0823 03:05:37.226433       1 shared_informer.go:337] stop requested
E0823 03:05:37.226497       1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSCluster Informer to sync"
I0823 03:05:37.226437       1 shared_informer.go:337] stop requested
E0823 03:05:37.226511       1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta1.Machine Informer to sync"
I0823 03:05:37.226449       1 reflector.go:295] Stopping reflector *v1beta2.AWSManagedControlPlane (9m13.30993253s) from pkg/mod/k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229
I0823 03:05:37.226538       1 internal.go:530] "Stopping and waiting for webhooks"
E0823 03:05:37.226499       1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedCluster Informer to sync"
I0823 03:05:37.226429       1 shared_informer.go:337] stop requested
E0823 03:05:37.226562       1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedCluster Informer to sync"
I0823 03:05:37.226441       1 shared_informer.go:337] stop requested
E0823 03:05:37.226576       1 kind.go:68] "controller-runtime/source/EventHandler: failed to get informer from cache" err="Timeout: failed waiting for *v1beta2.AWSManagedControlPlane Informer to sync"
I0823 03:05:37.226596       1 server.go:249] "controller-runtime/webhook: Shutting down webhook server with timeout of 1 minute"
I0823 03:05:37.226660       1 internal.go:533] "Stopping and waiting for HTTP servers"
I0823 03:05:37.226688       1 server.go:43] "shutting down server" kind="health probe" addr="[::]:9440"
I0823 03:05:37.226717       1 internal.go:537] "Wait completed, proceeding to shutdown the manager"
E0823 03:05:37.226747       1 logger.go:99] "setup: problem running manager" err="failed to start metrics server: failed to create listener: listen tcp: address 8443: missing port in address"
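
Note on the last log line: the "missing port in address" text looks like the error Go's standard `net` package returns when a listen address is not in `host:port` form. If that reading is right, the manager shut down because the metrics listener could not be created, and the informer sync timeouts above would be a consequence of that shutdown rather than the root cause. The sketch below only checks how `net.SplitHostPort` treats a bare port against `:8443`. The helper name `validateListenAddress` is hypothetical and is not part of CAPA or controller-runtime.

```go
package main

import (
	"fmt"
	"net"
)

// validateListenAddress is a hypothetical helper (not a CAPA or
// controller-runtime function). It reports whether addr has the
// host:port shape that Go's net package expects for a TCP listener.
func validateListenAddress(addr string) error {
	_, _, err := net.SplitHostPort(addr)
	return err
}

func main() {
	// "8443" is the value passed in the manifest; the other two are
	// assumed alternatives in host:port form.
	for _, addr := range []string{"8443", ":8443", "0.0.0.0:8443"} {
		if err := validateListenAddress(addr); err != nil {
			fmt.Printf("%q rejected: %v\n", addr, err)
			continue
		}
		fmt.Printf("%q accepted\n", addr)
	}
}
```

If this applies here, changing the argument to `--diagnostics-address=:8443` may be worth trying. That is an assumption based on the error text and has not been verified against v2.4.1.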

Environment: Here is my manifest (controller manager)

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    argocd.argoproj.io/instance: cluster-api
    cluster.x-k8s.io/provider: infrastructure-aws
    control-plane: capa-controller-manager
  name: capa-controller-manager
  namespace: capa-system
spec:
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/provider: infrastructure-aws
      control-plane: capa-controller-manager
  template:
    metadata:
      labels:
        cluster.x-k8s.io/provider: infrastructure-aws
        control-plane: capa-controller-manager
    spec:
      containers:
        - args:
            - '--leader-elect'
            - '--feature-gates=EKS=true'
            - '--v=10'
            - '--diagnostics-address=8443'
            - '--insecure-diagnostics=false'
          env:
            - name: AWS_SHARED_CREDENTIALS_FILE
              value: /home/.aws/credentials
          image: >-
            kcr.dev.kabang.cloud/container-registry/external/cluster-api/cluster-api-aws-controller:v2.4.1
          imagePullPolicy: IfNotPresent
          imagePullSecrets: kcr-token
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: healthz
            periodSeconds: 10
          name: manager
          ports:
            - containerPort: 9443
              name: webhook-server
              protocol: TCP
            - containerPort: 9440
              name: healthz
              protocol: TCP
            - containerPort: 8443
              name: metrics
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /readyz
              port: healthz
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            runAsGroup: 65532
            runAsUser: 65532
          volumeMounts:
            - mountPath: /tmp/k8s-webhook-server/serving-certs
              name: cert
              readOnly: true
      securityContext:
        fsGroup: 1000
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: capa-controller-manager
      terminationGracePeriodSeconds: 10
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
      volumes:
        - name: cert
          secret:
            defaultMode: 420
            secretName: capa-webhook-service-cert

k8s-ci-robot commented 3 weeks ago

This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.