kubernetes / cloud-provider-gcp

cloud-provider-gcp contains several projects used to run Kubernetes in Google Cloud
Apache License 2.0

CAPG: Upstream CCM manifest doesn't work #666

Open jayesh-srivastava opened 4 months ago

jayesh-srivastava commented 4 months ago

Tried deploying the CCM in a CAPG cluster using the provided CCM manifest (https://github.com/kubernetes/cloud-provider-gcp/blob/master/deploy/packages/default/manifest.yaml). The CCM pod is stuck in CrashLoopBackOff with this error:

unable to load configmap based request-header-client-ca-file: Get "https://127.0.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 127.0.0.1:443: connect: connection refused
k8s-ci-robot commented 4 months ago

This issue is currently awaiting triage.

If the repository maintainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
mcbenjemaa commented 4 months ago

Please use:

  command: ['/usr/local/bin/cloud-controller-manager']
  args:
  - --cloud-provider=gce
  - --leader-elect=true
  - --use-service-account-credentials

and remove the env.
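
In context, the relevant part of the Deployment's container spec would then look roughly like this (a sketch; the image name is the one from this thread, and the connection between the removed `env:` block and the 127.0.0.1 connection-refused error above is an assumption based on the error message):

```yaml
      containers:
        - name: cloud-controller-manager
          # :latest tracks CI builds; pin a release tag if one is available
          image: k8scloudprovidergcp/cloud-controller-manager:latest
          command: ['/usr/local/bin/cloud-controller-manager']
          args:
            - --cloud-provider=gce
            - --leader-elect=true
            - --use-service-account-credentials
          # note: no env: block here -- the upstream manifest's env entries
          # appear to point the API client at 127.0.0.1, which causes the
          # "connection refused" error when the apiserver isn't local
```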

mcbenjemaa commented 4 months ago

/kind support

jayesh-srivastava commented 4 months ago

Hi @mcbenjemaa, thanks for the help. The CCM pod is up now with these args:

  - args:
    - --cloud-provider=gce
    - --leader-elect=true
    - --use-service-account-credentials
    - --allocate-node-cidrs=true
    - --cluster-cidr=192.168.0.0/16
    - --configure-cloud-routes=false

One more question: I see the cloud-controller-manager image being used is k8scloudprovidergcp/cloud-controller-manager:latest. How can I use Kubernetes-version-specific images for the CCM?

BenTheElder commented 3 months ago

You may have to build the image yourself while the release process is being revamped; there are instructions in the README.

The :latest tag is aimed at CI / testing of the project itself, I think.

/retitle CAPG: Upstream CCM manifest doesn't work

I don't think the manifest is necessarily meant to work with CAPG, I would expect CAPG to handle deploying everything?

Otherwise this may be in scope for #686

mcbenjemaa commented 3 months ago

With a self-deployed CCM, I got this error:

message="Error syncing load balancer: failed to ensure load balancer: instance not found"
k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

esierra-stratio commented 1 week ago

Something similar here: I'm trying to deploy the Cloud Controller Manager (CCM) and I'm encountering the following error:

I0823 08:10:42.838284       1 node_controller.go:391] Initializing node minplus0-md-2-vbvmr-856l7 with cloud provider
I0823 08:10:42.920926       1 gen.go:15649] GCEInstances.Get(context.Background.WithDeadline(2024-08-23 09:10:42.83965981 +0000 UTC m=+3629.567729051 [59m59.918720336s]), Key{"minplus0-md-2-vbvmr-856l7", zone: "europe-west4-b"}) = <nil>, googleapi: Error 404: The resource 'projects/clusterapi-369611/zones/europe-west4-b/instances/minplus0-md-2-vbvmr-856l7' was not found, notFound
E0823 08:10:42.921062       1 node_controller.go:213] error syncing 'minplus0-md-2-vbvmr-856l7': failed to get instance metadata for node minplus0-md-2-vbvmr-856l7: failed to get instance ID from cloud provider: instance not found, requeuing

I don't understand why the CCM is adding the zone label as:

I0823 08:10:41.974944       1 node_controller.go:493] Adding node label from cloud provider: beta.kubernetes.io/instance-type=n2-standard-2
I0823 08:10:41.974950       1 node_controller.go:494] Adding node label from cloud provider: node.kubernetes.io/instance-type=n2-standard-2
I0823 08:10:41.974954       1 node_controller.go:505] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=europe-west4-b
I0823 08:10:41.974958       1 node_controller.go:506] Adding node label from cloud provider: topology.kubernetes.io/zone=europe-west4-b
I0823 08:10:41.974963       1 node_controller.go:516] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=europe-west4
I0823 08:10:41.974968       1 node_controller.go:517] Adding node label from cloud provider: topology.kubernetes.io/region=europe-west4

The correct zone should be gce://clusterapi-369611/europe-west4-c/minplus0-md-2-vbvmr-856l7. This is how I'm deploying CCM:

        - name: cloud-controller-manager
          image: k8scloudprovidergcp/cloud-controller-manager:latest
          imagePullPolicy: IfNotPresent
          # ko puts it somewhere else... command: ['/usr/local/bin/cloud-controller-manager']
          command: ['/usr/local/bin/cloud-controller-manager']
          args:
            - --cloud-provider=gce  # Add your own cloud provider here!
            - --leader-elect=true
            - --use-service-account-credentials
            # these flags will vary for every cloud provider
            - --allocate-node-cidrs=true
            - --configure-cloud-routes=true
            - --cluster-cidr=192.168.0.0/16
            - --v=4
          livenessProbe:
            failureThreshold: 3
            httpGet:
              host: 127.0.0.1
              path: /healthz
              port: 10258
              scheme: HTTPS
            initialDelaySeconds: 15
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 15
          resources:
            requests:
              cpu: "200m"
          volumeMounts:
            - mountPath: /etc/kubernetes/cloud.config
              name: cloudconfig
              readOnly: true
      hostNetwork: true
      priorityClassName: system-cluster-critical
      volumes:
        - hostPath:
            path: /etc/kubernetes/cloud.config
            type: ""
          name: cloudconfig
aojea commented 1 week ago

The correct zone should be gce://clusterapi-369611/europe-west4-c/minplus0-md-2-vbvmr-856l7.

what do you mean by correct zone there?

the instance url is https://www.googleapis.com/compute/v1/projects/{PROJECT}/zones/{ZONE}/instances/{VM_INSTANCE}

that is the providerId, isn't it?
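
For reference, a GCE providerID has the form gce://PROJECT/ZONE/INSTANCE. A quick shell sketch splitting one apart (the example value is taken from the logs above, with the zone the commenter expected):

```shell
#!/bin/sh
# On a live cluster the value comes from:
#   kubectl get node <name> -o jsonpath='{.spec.providerID}'
provider_id="gce://clusterapi-369611/europe-west4-c/minplus0-md-2-vbvmr-856l7"

rest="${provider_id#gce://}"   # drop the gce:// scheme
project="${rest%%/*}"          # first path segment
rest="${rest#*/}"
zone="${rest%%/*}"             # second segment
instance="${rest#*/}"          # remainder

echo "project=$project zone=$zone instance=$instance"
# -> project=clusterapi-369611 zone=europe-west4-c instance=minplus0-md-2-vbvmr-856l7
```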

esierra-stratio commented 1 week ago

The issue is that the GCEInstances.Get function constructs the provider ID with the wrong zone. It assumes the zone must match the one where the CCM itself is running (in this case, europe-west4-b) instead of the correct one, europe-west4-c. That's why the CCM couldn't find the instance.

Is there any way to make the CCM check every single zone? Maybe a multizone option or something similar?

esierra-stratio commented 1 week ago

Solved!

          args:
            - --cloud-provider=gce  # Add your own cloud provider here!
            - --leader-elect=true
            - --use-service-account-credentials
            # these flags will vary for every cloud provider
            - --allocate-node-cidrs=true
            - --cluster-cidr=192.168.0.0/16
            - --v=4
            - --cloud-config=/etc/kubernetes/gce.conf
          livenessProbe:
            failureThreshold: 3
            httpGet:
              host: 127.0.0.1
              path: /healthz
              port: 10258
              scheme: HTTPS
            initialDelaySeconds: 15
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 15
          resources:
            requests:
              cpu: "200m"
          volumeMounts:
            - mountPath: /etc/kubernetes/gce.conf
              name: cloudconfig
              readOnly: true
      hostNetwork: true
      priorityClassName: system-cluster-critical
      volumes:
        - hostPath:
            path: /etc/kubernetes/gce.conf
            type: FileOrCreate
          name: cloudconfig

where gce.conf contains:

[Global]
multizone=true
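
With multizone=true, the provider looks the instance up across all zones in the region instead of only the zone where the CCM happens to be running, which resolves the "instance not found" errors above. A slightly fuller gce.conf might look like this (field names other than multizone are assumptions based on the gce cloud-config format; adjust to your project):

```
[Global]
# GCP project that owns the instances (assumption: your project ID)
project-id = clusterapi-369611
# search every zone in the region for instances, not just the local one
multizone = true
```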