kubermatic / kubeone

Kubermatic KubeOne automates cluster operations across all your cloud, on-prem, edge, and IoT environments.
https://kubeone.io
Apache License 2.0

In-Tree CCM does not work on GCP #2116

Closed · stroebitzer closed this issue 2 years ago

stroebitzer commented 2 years ago

What happened?

When setting up a KubeOne cluster with the manifest below, no PVs get created. The PVCs stay in the Pending state forever.

Expected behavior

The PVs get created

How to reproduce the issue?

kubeone.yaml

apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
versions:
  kubernetes: 1.23.7
cloudProvider:
  gce: {}
  cloudConfig: |
    [global]
    regional = true

storageclass.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kubermatic-fast
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  type: pd-ssd

pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test
spec:
  storageClassName: kubermatic-fast
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
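
Applying the StorageClass and PVC above to a cluster provisioned from kubeone.yaml reproduces the problem. A minimal sequence, assuming kubectl access to the cluster (the kubeone invocation also needs the usual Terraform state):

kubeone apply -m kubeone.yaml       # provision the cluster
kubectl apply -f storageclass.yaml
kubectl apply -f pvc.yaml
kubectl get pvc test                # stays Pending indefinitely
kubectl describe pvc test           # shows the ExternalProvisioning event below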

This happens with Kubernetes version 1.23.7.

It does not happen with Kubernetes versions 1.22.2 and 1.22.11.

The kube-controller-manager pod logs the following event:

I0623 14:22:43.599862       1 event.go:294] "Event occurred" object="default/test" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="ExternalProvisioning" message="waiting for a volume to be created, either by external provisioner \"pd.csi.storage.gke.io\" or manually created by system administrator"
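
The event means the claim is waiting for the external provisioner pd.csi.storage.gke.io, which (per the discussion below) KubeOne does not currently deploy for GCE. One way to confirm the driver is absent (illustrative commands, assuming kubectl access):

kubectl get csidriver                        # pd.csi.storage.gke.io is not listed
kubectl -n kube-system get pods | grep csi   # no GCE PD CSI controller or node pods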

The kube-controller-manager pod looks like this:

Name:                 kube-controller-manager-master-control-plane-3
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 master-control-plane-3/10.240.0.3
Start Time:           Thu, 23 Jun 2022 11:10:28 +0000
Labels:               component=kube-controller-manager
                      tier=control-plane
Annotations:          kubernetes.io/config.hash: d01804437394584abe1113c105788a97
                      kubernetes.io/config.mirror: d01804437394584abe1113c105788a97
                      kubernetes.io/config.seen: 2022-06-23T11:10:59.236464719Z
                      kubernetes.io/config.source: file
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running
IP:                   10.240.0.3
IPs:
  IP:           10.240.0.3
Controlled By:  Node/master-control-plane-3
Containers:
  kube-controller-manager:
    Container ID:  containerd://aad13c8277bfdca9c9eb3c6c1a5d5a264c5bfe0eaa63fe4cce960acf94a6451f
    Image:         k8s.gcr.io/kube-controller-manager:v1.23.7
    Image ID:      k8s.gcr.io/kube-controller-manager@sha256:db4970df9c7a657d31299a6bc96b86d9d36c1d91e914b3428a43392fa4299068
    Port:          <none>
    Host Port:     <none>
    Command:
      kube-controller-manager
      --allocate-node-cidrs=true
      --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
      --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
      --bind-address=127.0.0.1
      --client-ca-file=/etc/kubernetes/pki/ca.crt
      --cloud-config=/etc/kubernetes/cloud-config
      --cloud-provider=gce
      --cluster-cidr=10.244.0.0/16
      --cluster-name=master
      --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
      --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
      --controllers=*,bootstrapsigner,tokencleaner
      --flex-volume-plugin-dir=/var/lib/kubelet/volumeplugins
      --kubeconfig=/etc/kubernetes/controller-manager.conf
      --leader-elect=true
      --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
      --root-ca-file=/etc/kubernetes/pki/ca.crt
      --service-account-private-key-file=/etc/kubernetes/pki/sa.key
      --service-cluster-ip-range=10.96.0.0/12
      --use-service-account-credentials=true
    State:          Running
      Started:      Thu, 23 Jun 2022 11:11:12 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     200m
    Liveness:  http-get https://127.0.0.1:10257/healthz delay=10s timeout=15s period=10s #success=1 #failure=8
    Startup:   http-get https://127.0.0.1:10257/healthz delay=10s timeout=15s period=10s #success=1 #failure=24
    Environment:
      SSL_CERT_FILE:  /etc/ssl/certs/ca-certificates.crt
    Mounts:
      /etc/ca-certificates from etc-ca-certificates (ro)
      /etc/kubernetes/cloud-config from cloud-config (ro)
      /etc/kubernetes/controller-manager.conf from kubeconfig (ro)
      /etc/kubernetes/pki from k8s-certs (ro)
      /etc/ssl/certs from ca-certs (ro)
      /usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro)
      /usr/share/ca-certificates from usr-share-ca-certificates (ro)
      /var/lib/kubelet/volumeplugins from flexvolume-dir (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  ca-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  DirectoryOrCreate
  cloud-config:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/cloud-config
    HostPathType:  File
  etc-ca-certificates:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ca-certificates
    HostPathType:  DirectoryOrCreate
  flexvolume-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/volumeplugins
    HostPathType:  DirectoryOrCreate
  k8s-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki
    HostPathType:  DirectoryOrCreate
  kubeconfig:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/controller-manager.conf
    HostPathType:  FileOrCreate
  usr-local-share-ca-certificates:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/local/share/ca-certificates
    HostPathType:  DirectoryOrCreate
  usr-share-ca-certificates:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/share/ca-certificates
    HostPathType:  DirectoryOrCreate
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       :NoExecute op=Exists
Events:            <none>

The Kubernetes Changelog says the following:

If CSI Migration is working properly, Kubernetes end users shouldn’t notice a difference. After migration, Kubernetes users may continue to rely on all the functionality of in-tree storage plugins using the existing interface.

What KubeOne version are you using?

{
  "kubeone": {
    "major": "1",
    "minor": "4",
    "gitVersion": "1.4.4",
    "gitCommit": "3d62a6ff07d0f3eacf9c9900acf8ccb71333466f",
    "gitTreeState": "",
    "buildDate": "2022-06-02T13:24:15Z",
    "goVersion": "go1.18.1",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "43",
    "gitVersion": "v1.43.3",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

Provide your KubeOneCluster manifest here (if applicable)

Please see above.

What cloud provider are you running on?

GCP

What operating system are you running in your cluster?

Ubuntu

Additional information

embik commented 2 years ago

Looking at the event that was emitted, I suspect this is happening because Kubernetes 1.23 sets the CSIMigrationGCE feature gate to true by default (see https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-alpha-or-beta-features). That wasn't the case for 1.21 and 1.22, so internally Kubernetes is probably trying to do the CSI migration.
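
If that is the cause, one untested stop-gap until KubeOne ships the driver would be to turn the gate back off. CSIMigrationGCE is beta in 1.23, so it should still be mutable; roughly, on each control-plane node (and correspondingly in the kubelet configuration):

# /etc/kubernetes/manifests/kube-controller-manager.yaml
    - --feature-gates=CSIMigrationGCE=false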

kron4eg commented 2 years ago

We'd probably need to start shipping the CSI driver (in the absence of a CCM) for GCE.

kron4eg commented 2 years ago

Looks like they do have a stable release: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver
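
Once that driver is deployed, PVCs that reference the in-tree kubernetes.io/gce-pd provisioner should be translated to it automatically; a StorageClass can also target the CSI provisioner directly. A CSI-native equivalent of the class from the report (the name kubermatic-fast-csi is illustrative):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kubermatic-fast-csi
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # commonly recommended for zonal PD topology
parameters:
  type: pd-ssd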

xmudrii commented 2 years ago

Yes, this is correct -- we need to start deploying the CSI driver for GCE. We already do that for all other providers, but GCE was forgotten for some reason: https://github.com/kubermatic/kubeone/blob/f7070ab8ecd7b77b286c8b34d25b364a8d3804c3/pkg/addons/ensure.go#L320-L399

Originally, it was planned that the CSIMigration* feature gates would support falling back to the in-tree provider, but the majority of providers didn't implement the fallback properly. AFAIK, the fallback to the in-tree provider only works for OpenStack and maybe vSphere.

We also have #1710 to track this.