kubermatic / kubeone


kubeone apply never succeeds with dynamicWorkers and custom helm releases #3252

Closed: P4sca1 closed this issue 1 month ago

P4sca1 commented 2 months ago

What happened?

The kubeone deployment gets stuck waiting for jobs to finish. The jobs can't finish because no worker nodes exist; I am using dynamicWorkers only.

time="17:32:13 CEST" level=debug msg="hubble-generate-certs: Jobs active: 0, jobs failed: 0, jobs succeeded: 0"

Expected behavior

I would expect kubeone to create the MachineDeployment right after the machine-controller is deployed, so that worker nodes start to spawn and the Helm releases can finish successfully.

How to reproduce the issue?

The issue occurs during the initial deployment and can be worked around by manually creating the MachineDeployment, as sketched below.
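
For illustration, a minimal sketch of such a MachineDeployment, assuming the machine-controller cluster.k8s.io/v1alpha1 API; the values are abridged from the dynamicWorkers spec in the manifest below, and the workerset label key is a placeholder, not necessarily what `kubeone apply` generates:

```yaml
# Sketch only: roughly what kubeone apply would generate from the
# dynamicWorkers entry; label key and abridged fields are illustrative.
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineDeployment
metadata:
  name: nbg1-cx22
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      workerset: nbg1-cx22   # illustrative label key
  template:
    metadata:
      labels:
        workerset: nbg1-cx22
    spec:
      providerSpec:
        value:
          cloudProvider: hetzner
          cloudProviderSpec:
            serverType: cx22
            image: ubuntu-22.04
            location: nbg1
            networks:
              - kubernetes
          operatingSystem: ubuntu
          operatingSystemSpec:
            distUpgradeOnBoot: true
      versions:
        kubelet: "1.29.6"
```

Applying something like this with `kubectl apply -f` into the half-initialized cluster unblocks worker provisioning, after which the Helm releases can finish.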

What KubeOne version are you using?

```console
$ kubeone version
{
  "kubeone": {
    "major": "1",
    "minor": "8",
    "gitVersion": "1.8.0",
    "gitCommit": "c280d14d95ac92a27576851cc058fc84562fcc55",
    "gitTreeState": "",
    "buildDate": "2024-05-14T15:41:44Z",
    "goVersion": "go1.22.3",
    "compiler": "gc",
    "platform": "darwin/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "59",
    "gitVersion": "v1.59.1",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
```

Provide your KubeOneCluster manifest here (if applicable)

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: "ips-k8s"
versions:
  kubernetes: "1.29.6"
cloudProvider:
  hetzner:
    networkID: "kubernetes"
  external: true
  # Install Hetzner CSI manually for more control
  disableBundledCSIDrivers: true
controlPlane:
  hosts:
    # REDACTED (3 nodes)
clusterNetwork:
  podSubnet: "10.0.64.0/18"
  nodeCIDRMaskSizeIPv4: 24
  serviceSubnet: "10.0.128.0/18"
  ipFamily: IPv4
  kubeProxy:
    skipInstallation: true
  cni:
    # Install cilium using helm chart manually for more control
    external: {}
apiEndpoint:
  host: "REDACTED"
  port: 6443
machineController:
  deploy: true
dynamicWorkers:
  - name: "nbg1-cx22"
    replicas: 3
    providerSpec:
      cloudProviderSpec:
        serverType: "cx22"
        image: ubuntu-22.04
        location: "nbg1"
        placementGroupPrefix: kube-node
        networks:
          - "kubernetes"
        labels:
          type: node
        assignPublicIPv4: true
        assignPublicIPv6: true
        sshPublicKeys:
          - ssh-ed25519 REDACTED
      operatingSystem: ubuntu
      operatingSystemSpec:
        distUpgradeOnBoot: true
helmReleases:
  - chart: cilium
    repoURL: https://helm.cilium.io/
    version: 1.15.6
    namespace: kube-system
    releaseName: cilium
    values:
      - valuesFile: ./helm-values/cilium.yaml
      - inline:
          k8sServiceHost: "REDACTED"
          k8sServicePort: 6443
```

What cloud provider are you running on?

Hetzner Cloud

What operating system are you running in your cluster?

Flatcar Linux Beta for the control plane, Ubuntu 22 for the worker nodes.

Additional information

Cilium helm values:

```yaml
cni:
  exclusive: true

hubble:
  enabled: true
  relay:
    enabled: true
  tls:
    auto:
      enabled: true
      method: cronJob
  ui:
    enabled: true

ipam:
  mode: kubernetes

kubeProxyReplacement: true
# Defined inline
# k8sServiceHost: 
# k8sServicePort:

l2announcements:
  enabled: false
externalIPs:
  enabled: false

ingressController:
  enabled: true
  loadbalancerMode: shared
  service:
    type: LoadBalancer
    annotations:
      load-balancer.hetzner.cloud/location: nbg1
      load-balancer.hetzner.cloud/use-private-ip: "true"
      load-balancer.hetzner.cloud/name: cilium-ingress
      load-balancer.hetzner.cloud/type: lb11
      load-balancer.hetzner.cloud/protocol: tcp
```

kron4eg commented 1 month ago

Well. This is not possible.

We run Helm operations even before deploying the machine-controller, to allow deploying a CNI (like you do) so that there are some Ready nodes in the first place.

In your case, I recommend setting tolerations (see https://github.com/cilium/cilium/blob/v1.15.6/install/kubernetes/cilium/values.yaml#L994) to open the control-plane nodes up for the certgen Job.
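
For example, a sketch of that change for helm-values/cilium.yaml, assuming the linked value is certgen.tolerations and the control plane carries the default kubeadm node-role.kubernetes.io/control-plane:NoSchedule taint:

```yaml
# Let the hubble-generate-certs Job tolerate the control-plane taint,
# so it can run before any worker nodes exist.
certgen:
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
```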