kubermatic / operating-system-manager

Operating System Manager is responsible for creating and managing the configurations needed to configure worker nodes

kubectl logs and exec fails for worker nodes #416

Closed · toschneck closed this issue 1 month ago

toschneck commented 1 month ago

What happened?

After setting up a KubeOne cluster on AWS, any kubectl logs or kubectl exec command fails if the target pod runs on a worker machine created by the machine-controller.

kubectl logs -n test nginx-7854ff8877-tsk8n 
error: You must be logged in to the server (the server has asked for the client to provide credentials ( pods/log nginx-7854ff8877-tsk8n))
kubectl exec -n test nginx-7854ff8877-tsk8n -- sh
error: unable to upgrade connection: Unauthorized

The cluster's workload itself does not seem to be affected, but administration is severely restricted, so this needs to be fixed soon.

Expected behavior

kubectl logs and kubectl exec work against pods on all nodes.

How to reproduce the issue?

Create a fresh kubeone 1.8.2 cluster

What KubeOne version are you using?

```console
{
  "kubeone": {
    "major": "1",
    "minor": "8",
    "gitVersion": "v1.8.2",
    "gitCommit": "667a5da2bc1eab61b3183cdfcfc192deef49f985",
    "gitTreeState": "",
    "buildDate": "2024-08-13T21:16:28+02:00",
    "goVersion": "go1.22.6",
    "compiler": "gc",
    "platform": "darwin/arm64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "59",
    "gitVersion": "v1.59.3",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
```

Provide your KubeOneCluster manifest here (if applicable)

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
versions:
  kubernetes: '1.29.8'
cloudProvider:
  aws: {}
  external: true
clusterNetwork:
  cni:
    cilium:
      kubeProxyReplacement: "strict"
      enableHubble: true
  kubeProxy:
    skipInstallation: true
addons:
  enable: true
  addons:
    - name: cluster-autoscaler
    - name: default-storage-class
caBundle: |-
  ## CUSTOM CA BUNDLE
```

MachineDeployment:

```yaml
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineDeployment
metadata:
  annotations:
    cluster.k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
    cluster.k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    k8c.io/operating-system-profile: osp-ubuntu
    machinedeployment.clusters.k8s.io/revision: "1"
  name: seed-aws-eu-central-1a
  namespace: kube-system
spec:
  minReadySeconds: 0
  progressDeadlineSeconds: 600
  # replicas: 1
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      workerset: seed-aws-eu-central-1a
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        isSpotInstance: "false"
        workerset: seed-aws-eu-central-1a
      namespace: kube-system
    spec:
      metadata:
        creationTimestamp: null
        labels:
          isSpotInstance: "false"
          workerset: seed-aws-eu-central-1a
      providerSpec:
        value:
          caPublicKey: ""
          cloudProvider: aws
          cloudProviderSpec:
            ami: ami-0a43b9fc420cabb27
            assignPublicIP: false
            availabilityZone: eu-central-1a
            diskIops: 3000
            diskSize: 50
            diskType: gp3
            ebsVolumeEncrypted: false
            instanceProfile: example-int-debug-seed-host
            instanceType: t3.xlarge
            isSpotInstance: false
            region: eu-central-1
            securityGroupIDs:
              - sg-02fce40ae7fe3dcf0
            spotInstanceConfig:
              maxPrice: "0.100000"
            subnetId: subnet-02392d82b371af831
            tags:
              example-int-debug-seed-workers: ""
              kubernetes.io/cluster/example-int-debug-seed: shared
            vpcId: vpc-075d4fd1298b16f95
          operatingSystem: ubuntu
          operatingSystemSpec:
            distUpgradeOnBoot: false
            provisioningUtility: ""
          sshPublicKeys:
            - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC4kRcfjM/3s7rZlKMpvs89UfD745S+LEU7JbQnd4qW9Vqb8k312w8uVyqakC0dYUx/PLO6qRrqvTfPdlc4cri7tEAobWZl3kACBkUQfI0toFEqKotFPuS0RW/ZiG8PKbiYEKamXhYgKR9uRCxGz6EFvOrrfOXf7vsA8lKXf7N0KUcaPPKqlgrxQG2raeMLF0IuIeuF0tT6j6uy2hzXnnYadLyqCRkRQ38Wk8oHNlX4M5fUWXeq7UF8uwueJUsKjiG3T6xsD985A8gD+yz/g36y1aQQEnyGPmTjP3uHcnys+ixCQumQi4MLpkHuKLF7BBwMJCVvGH0dPq81v0XALsTFB94Qllw0MGpHAwd77f7w+zb/AF0l2tpXY41ZWNuQrzPXerRhUEwxRiC5XpAbvLXrWxotFU/bApukmXUvIJ6Dwv6BgDzJc1REsLJ8Xtxu5oaL0bF/Ma2FYAX6pYWCG8Zt0hoKkcB29BWijRzAtmR0/Sa9ps+slLzKlBtkB7WlgjE= kubermatic@docker-desktop
      versions:
        kubelet: 1.29.6
```

What cloud provider are you running on?

AWS

What operating system are you running in your cluster?

Ubuntu 22.04

Additional information

It seems that the kubelet configuration of the nodes is not correct, or the communication from the kubelet to the control plane is broken. Relevant Stack Overflow answer: https://stackoverflow.com/a/50020792
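One way to narrow this down is to compare, on an affected worker node, the CA file the kubelet trusts with the CA embedded in the admin kubeconfig on a control plane node (a sketch using the default kubeadm/OSM paths; adjust paths to your setup):

```bash
# On a worker node: the kubelet authenticates API server requests (logs/exec)
# against this CA file.
openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -subject -issuer -dates

# On a control plane node: the actual cluster CA, extracted from the admin kubeconfig.
grep certificate-authority-data /etc/kubernetes/admin.conf \
  | awk '{print $2}' | base64 -d \
  | openssl x509 -noout -subject -issuer -dates

# Both outputs should show the same certificate.
```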

kubectl with the admin/super-admin kubeconfig also does not work from the control plane machines:

root@ip-172-32-162-21:~# kubectl --kubeconfig /etc/kubernetes/super-admin.conf -n test logs nginx-7854ff8877-l2jtz
error: You must be logged in to the server (the server has asked for the client to provide credentials ( pods/log nginx-7854ff8877-l2jtz))
root@ip-172-32-162-21:~# kubectl --kubeconfig /etc/kubernetes/admin.conf -n test logs nginx-7854ff8877-l2jtz
error: You must be logged in to the server (the server has asked for the client to provide credentials ( pods/log nginx-7854ff8877-l2jtz))

toschneck commented 1 month ago

/label customer-request

kron4eg commented 1 month ago

This seems to be an issue with certificates of kubelet.

xmudrii commented 1 month ago

To add to what @kron4eg said, please check whether CSRs are getting properly approved, e.g. run kubectl get csr and approve all pending CSRs, then try again after a few minutes. If it still does not work after approving all CSRs, try restarting the kubelet.
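A minimal sketch of that check, run with a working admin kubeconfig (the bulk-approval pipeline below is an assumption; review the CSR list before approving):

```bash
# List all certificate signing requests and their condition.
kubectl get csr
# Approve every CSR that is still pending (filtering on the CONDITION column).
kubectl get csr | grep Pending | awk '{print $1}' | xargs kubectl certificate approve
# If pods on a node still cannot be reached, restart the kubelet on that node.
sudo systemctl restart kubelet
```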

toschneck commented 1 month ago

After some debugging, the problem seems to be the custom CA bundle. The caBundle: option seems to override or mis-render the Kubernetes CA.

Workaround:

1) Extract certificate-authority-data: from the cluster's kubeconfig
2) Decode it via base64 -d (a sketch of both steps follows)
3) Add the Kubernetes cluster CA to the caBundle field (see the resulting manifest below)
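A minimal sketch of steps 1 and 2, assuming kubectl currently points at the affected cluster (the .clusters[0] index is an assumption for a single-cluster kubeconfig):

```bash
# Extract the base64-encoded cluster CA from the kubeconfig and decode it;
# the decoded PEM block is what gets appended to caBundle in step 3.
kubectl config view --raw \
  -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' \
  | base64 -d > cluster-ca.crt
cat cluster-ca.crt
```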

apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
versions:
  kubernetes: '1.29.8'
cloudProvider:
  aws: {}
  external: true
clusterNetwork:
  cni:
    cilium:
      kubeProxyReplacement: "strict"
      enableHubble: true
  kubeProxy:
    skipInstallation: true
addons:
  enable: true
  addons:
    - name: cluster-autoscaler
    - name: default-storage-class
caBundle: |-
  ## CUSTOM CA BUNDLE 
  ## FIXME: Temporary Workaround adding CA of Kubernetes Cluster itself -> extracted from kubeconfig

  # CA Kubeconfig of cluster
  -----BEGIN CERTIFICATE-----
  xxxx
  -----END CERTIFICATE-----

4) Delete the MachineDeployment, so that the OSC gets recreated:

kubectl delete md -n kube-system --all

5) Re-apply the MachineDeployment:

kubectl apply -f machinedeployment.yaml

toschneck commented 1 month ago

The problem seems to be that the OSC does not get the Kubernetes CA at this point, but the caBundle contents instead:

    - content:
        inline:
          data: "## CUSTOM CA BUNDLE \n## FIXME: Temporary Workaround adding CA of
            Kubernetes Cluster itself -> extracted from kubeconfig\n\n# CA Kubeconfig
            of AWS Seed cluster\n-----BEGIN CERTIFICATE-----xxxxxxxxxxxxxx\n-----END
            CERTIFICATE-----\n"
          encoding: b64
      path: /etc/kubernetes/pki/ca.crt
      permissions: 644

It seems the OSP variable is not rendered correctly:

      - path: /etc/kubernetes/pki/ca.crt
        content:
          inline:
            encoding: b64
            data: |
              {{ .KubernetesCACert }}

https://github.com/kubermatic/operating-system-manager/blob/bc2dc58ac76271e98eef1b7905106ac02086b601/deploy/osps/default/osp-ubuntu.yaml#L715
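
A hedged sketch for checking what OSM actually rendered into the generated OperatingSystemConfig (resource namespace and names may differ per setup):

```bash
# List the OperatingSystemConfig objects generated by OSM.
kubectl get operatingsystemconfigs --all-namespaces
# Dump them and inspect the file entry rendered for the kubelet client CA.
kubectl get operatingsystemconfigs -n kube-system -o yaml \
  | grep -B10 -A2 'path: /etc/kubernetes/pki/ca.crt'
```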

embik commented 1 month ago

I think this is related to #399 (that change being a fix for it), maybe?

kron4eg commented 1 month ago

Indeed, and the fix has not been released (and included in K1) yet.

kron4eg commented 1 month ago

So we just need a new release!

toschneck commented 1 month ago

Awesome, so it's already cherry-picked for the next KubeOne 1.8 release?

kron4eg commented 1 month ago

@toschneck and KKP too of course.

kron4eg commented 1 month ago

OK, we have a new v1.5.3 release.