sergelogvinov / proxmox-cloud-controller-manager

Kubernetes cloud controller manager for Proxmox
Apache License 2.0

unable to load configmap based request-header-client-ca-file: Unauthorized #153

Closed pontal4 closed 3 weeks ago

pontal4 commented 1 month ago

Bug Report

Description

Hello! I hit this issue using the CCM with Talos on a Proxmox cluster. I have 3 control-plane nodes and 5 workers.

The Proxmox cluster has two nodes:

pve01 -> cp-01, cp-03, workers-...
pve02 -> cp-02, workers-...

It is working perfectly on cp-01, but not on cp-02 and cp-03.

Logs

Generated self-signed cert in-memory
Generated self-signed cert in-memory
Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
curl -v -XGET  -H "User-Agent: talos-cloud-controller-manager/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" -H "Accept: application/json, */*" 'https://10.0.1.102:6443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication'
HTTP Trace: Dial to tcp:10.0.1.102:6443 succeed
GET https://10.0.1.102:6443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication 401 Unauthorized in 2 milliseconds
...
Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
unable to load configmap based request-header-client-ca-file: Unauthorized
"command failed" err="unable to load configmap based request-header-client-ca-file: Unauthorized"

Environment

Deployment (using Ansible):

- name: Deploy CCM
  kubernetes.core.helm:
    name: talos-cloud-controller-manager
    namespace: kube-system
    chart_ref: oci://ghcr.io/siderolabs/charts/talos-cloud-controller-manager
    values: "{{ lookup('ansible.builtin.template', 'values.yaml.j2') | from_yaml }}"

Values:

useDaemonSet: true

image:
  repository: ghcr.io/sergelogvinov/talos-cloud-controller-manager
  tag: nodeipam

service:
  containerPort: 50258
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/scheme: "https"
    prometheus.io/port: "50258"

existingConfigSecret: proxmox-cloud-controller-manager

logVerbosityLevel: 10

enabledControllers:
  - cloud-node
  - node-ipam-controller

extraArgs:
  - --allocate-node-cidrs
  - --cidr-allocator-type=CloudAllocator
  - --node-cidr-mask-size-ipv4=24

The secret is created through the control-plane inline manifests:

cluster:
  allowSchedulingOnControlPlanes: true
  inlineManifests:
    - name: proxmox-cloud-controller-manager
      contents: |-
        apiVersion: v1
        kind: Secret
        type: Opaque
        metadata:
          name: proxmox-cloud-controller-manager
          namespace: kube-system
        data:
          config.yaml: ${base64encode(clusters)}
machine:
  features:
    kubernetesTalosAPIAccess:
      enabled: true
      allowedRoles:
        - os:reader
      allowedKubernetesNamespaces:
        - kube-system

sergelogvinov commented 1 month ago

Hi, you have mixed configs for different CCMs.

Start with the Talos CCM first, since it is responsible for the initialization process.

Talos CCM with node-ipam-controller requires useDaemonSet. CCM IPAM makes sense only in IPv6 or dual-stack environments: https://github.com/siderolabs/talos-cloud-controller-manager/blob/main/docs/controllers.md#node-ipam

# Helm values
useDaemonSet: true

logVerbosityLevel: 5

enabledControllers:
  - cloud-node
  - node-ipam-controller

extraArgs:
  - --allocate-node-cidrs
  - --cidr-allocator-type=CloudAllocator
  - --node-cidr-mask-size-ipv4=24
  - --node-cidr-mask-size-ipv6=80

Talos CCM without Node IPAM

# Helm values
useDaemonSet: true

logVerbosityLevel: 5

enabledControllers:
  - cloud-node

# Talos machine config
cluster:
  allowSchedulingOnControlPlanes: true
  controllerManager:
    extraArgs:
      # Disable node IPAM controller, if you use node-ipam-controller on CCM side
      controllers: "*,tokencleaner,-node-ipam-controller"
machine:
  kubelet:
    extraArgs:
      cloud-provider: external
  features:
    kubernetesTalosAPIAccess:
      enabled: true
      allowedRoles:
        - os:reader
      allowedKubernetesNamespaces:
        - kube-system

Proxmox CCM deployment for Talos: https://github.com/sergelogvinov/proxmox-cloud-controller-manager/blob/main/docs/deploy/cloud-controller-manager-talos.yml
Values: https://github.com/sergelogvinov/proxmox-cloud-controller-manager/blob/main/charts/proxmox-cloud-controller-manager/values.talos.yaml

Yeah, the example here is not very clear: https://github.com/siderolabs/talos-cloud-controller-manager/blob/main/docs/install.md

pontal4 commented 1 month ago

Hello, thanks for your reply and your project. For context, I mixed a lot of things from a lot of documentation; these tools are hard to use. I would like to use the Talos CCM for (I don't know), and the Proxmox CCM to tag my nodes correctly and link my PVCs correctly (I'm using proxmox-csi-driver). I checked one of your repos, and you do something like this: https://github.com/sergelogvinov/terraform-talos/blob/main/proxmox/deployments/talos-ccm.yaml

I made the changes you suggested, but the issue is the same. Do I have to enable rotation of server certificates?

sergelogvinov commented 1 month ago

Yeah, if you use Kubernetes metrics, it is better to enable certificate rotation.

If you do not use Talos CCM features, you can use only the Proxmox CCM with
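
For reference, a minimal sketch of what enabling server certificate rotation could look like. It assumes the kubelet flag is passed through the Talos machine config and that the Talos CCM node-csr-approval controller approves the resulting serving-certificate requests; adjust it to your own setup.

```yaml
# Talos machine config patch (sketch, applied to every node)
machine:
  kubelet:
    extraArgs:
      # The kubelet requests its serving certificate through a CSR
      # instead of using a self-signed one.
      rotate-server-certificates: "true"
---
# Talos CCM Helm values (sketch)
enabledControllers:
  - cloud-node
  # Approves the kubelet serving CSRs created by the patch above.
  - node-csr-approval
```

Without something approving those CSRs they stay pending, so the two parts are meant to be used together.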

enabledControllers:
  - cloud-node
  - cloud-node-lifecycle

and add only this patch to the Talos machine config:

machine:
  kubelet:
    extraArgs:
      cloud-provider: external

Unfortunately, https://github.com/sergelogvinov/terraform-talos sometimes has bugs, since it is my dev/research environment.

pontal4 commented 1 month ago

If I don't use the Talos CCM, my nodes do not get their labels correctly. Do you have a sample setup that works with the CCM on a clustered Proxmox with multiple nodes? I ran some tests, and sometimes I get the error message on only one control plane, while the other two work correctly. This feature seems hard to implement, but it's really necessary for a cluster.

sergelogvinov commented 1 month ago

Can you check your setup against the troubleshooting guide? https://github.com/sergelogvinov/proxmox-cloud-controller-manager/blob/main/docs/install.md#troubleshooting

pontal4 commented 1 month ago

I just found an "alternative": I set the Talos CCM replica count to one:

replicaCount: 1

enabledControllers:
  - cloud-node
  - node-csr-approval

The pod had to restart multiple times before reaching "Running", but I guess it will work. Thank you!

I also have some issues with the proxmox-csi plugin: "GRPC error: failed to get node worker-01: Unauthorized". So my Talos configuration is probably not right.

sergelogvinov commented 1 month ago

It’s hard to tell from the log line 'GRPC error: failed to get node worker-01: Unauthorized' alone, but it looks like the Kubernetes service account might not have the right permissions to make the API call. Please check your deployment settings, as there might be a mistake in the configuration.
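
One way to test the permissions hypothesis is a SubjectAccessReview, which asks the API server whether a given identity may perform a given call. This is only a sketch: the service account name and namespace below are hypothetical placeholders, so substitute the ones from your proxmox-csi deployment.

```yaml
# SubjectAccessReview (sketch): may this service account "get" node worker-01?
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # Hypothetical identity: replace namespace and name with your own.
  user: system:serviceaccount:csi-proxmox:proxmox-csi-plugin-node
  resourceAttributes:
    verb: get
    resource: nodes
    name: worker-01
```

Submitting it with `kubectl create -f <file> -o yaml` returns the object with `status.allowed` filled in. This covers authorization (RBAC) only; a 401 can also mean that the token itself was rejected.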

Please try recreating the cluster.

pontal4 commented 3 weeks ago

Hello, after migrating to Talos 1.8.1, everything seems to work perfectly. Some pods need to be restarted 2 or 3 times, but that's not really a big issue now. Thanks for your time!