remche / terraform-openstack-rke2

Deploy Kubernetes on OpenStack with RKE2
Mozilla Public License 2.0

os_cinder.yaml.tpl doesn't mount correctly the cloud.conf in /etc/kubernetes in CSI container #71

Open radumalica opened 3 months ago

radumalica commented 3 months ago

I deployed RKE2 using the CCM example, adding my variables as stated in the docs (SSH key, floating IPs and so on).

I deployed with 1 master and 3 workers. Everything runs properly except the pods for the CSI plugin, which fail with the following error:

2024-08-12T17:51:48.889561653Z stdout F I0812 17:51:48.889331      12 openstack.go:159] InitOpenStackProvider configFiles: [/etc/kubernetes/cloud.conf]
2024-08-12T17:51:48.889599937Z stdout F E0812 17:51:48.889464      12 openstack.go:112] Failed to open OpenStack configuration file: open /etc/kubernetes/cloud.conf: no such file or directory
2024-08-12T17:51:48.889614023Z stdout F E0812 17:51:48.889535      12 openstack.go:167] GetConfigFromFiles [/etc/kubernetes/cloud.conf] failed with error: open /etc/kubernetes/cloud.conf: no such file or directory
2024-08-12T17:51:48.889919157Z stdout F W0812 17:51:48.889596      12 main.go:87] Failed to GetOpenStackProvider: open /etc/kubernetes/cloud.conf: no such file or directory

The mount exists on the container:

Mounts:

/etc/kubernetes from cloud-config (ro)

Result is:

openstack-cinder-csi-controllerplugin-747d4d7dcf-mkqj5   0/6     CrashLoopBackOff   79 (2m29s ago)   66m
openstack-cinder-csi-nodeplugin-bjfnn                    0/3     CrashLoopBackOff   43 (2m42s ago)   66m
openstack-cinder-csi-nodeplugin-jhmsw                    0/3     Completed          53 (3m53s ago)   66m
openstack-cinder-csi-nodeplugin-jj5gp                    0/3     Completed          44 (3m45s ago)   66m
openstack-cinder-csi-nodeplugin-wfknz                    0/3     CrashLoopBackOff   45 (60s ago)     66m
openstack-cinder-csi-nodeplugin-ws4ll                    0/3     CrashLoopBackOff   44 (114s ago)    66m

remche commented 3 months ago

@radumalica thanks for the report!

I won't be able to test in the next few days, but I bet some changes in the CCM Helm chart broke this example, which is pretty old...

radumalica commented 3 months ago

It seems that the new chart added new settings to values.yaml:

secret:
  enabled: false
  hostMount: true
  create: false
  filename: cloud.conf
#  name: cinder-csi-cloud-config
#  data:
#    cloud.conf: |-
#      [Global]
#      auth-url=http://openstack-control-plane
#      user-id=user-id
#      password=password
#      trust-id=trust-id
#      region=RegionOne
#      ca-file=/etc/cacert/ca-bundle.crt

In particular, hostMount: true means that the cloud-config volume is mounted from the host where the pod gets deployed, which is highly undesirable.

The logic for the pod in the chart is as follows:

        {{- if .Values.secret.enabled }}
        - name: cloud-config
          secret:
            secretName: {{ .Values.secret.name }}
        {{- else if .Values.secret.hostMount }}
        - name: cloud-config
          hostPath:
            path: /etc/kubernetes
        {{- end }}

So if secret.enabled is true (which it is) and create is also enabled, the volume comes from the secret named in secret.name, which is correctly deployed as a Kubernetes secret once the cluster is ready. Otherwise, if hostMount is true (the default in the Helm chart), the host's /etc/kubernetes is mounted as the cloud-config volume in the pod.

I am going to try deploying again with hostMount: false and see what I get.

BTW, the latest chart that gets installed is 2.30 and the app version is 1.30:
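
As a minimal sketch of values that would select the first branch of that template: the keys are the ones from the chart's values.yaml shown above, while the secret name is an assumption and must match a secret that already exists in the release namespace. This has not been checked against every chart version.

```yaml
# Sketch of a values override for the openstack-cinder-csi chart.
secret:
  enabled: true       # selects the `secret` branch of the template snippet
  hostMount: false    # not consulted by that snippet once enabled is true
  create: false       # assumes the secret is created outside this chart
  name: cloud-config  # hypothetical name; must match an existing secret
```

With these values, the cloud-config volume in the snippet resolves to a secret volume rather than a hostPath.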

root@maas-region:~/kubernetes-openstack/terraform-openstack-rke2/examples/cloud-controller-manager# helm search repo
NAME                                                    CHART VERSION   APP VERSION     DESCRIPTION
cloud-provider-openstack/openstack-cinder-csi           2.30.0          v1.30.0         Cinder CSI Chart for OpenStack
cloud-provider-openstack/openstack-cloud-contro...      2.30.2          v1.30.0         Openstack Cloud Controller Manager Helm Chart
cloud-provider-openstack/openstack-manila-csi           2.30.0          v1.30.0         Manila CSI Chart for OpenStack

powellchristoph commented 3 months ago

Sorry about that. There was a naming issue between two of the OpenStack operators that I fixed a while ago, and it looks like I didn't update it here.

radumalica commented 3 months ago

I deployed again without hostMount: true and it doesn't work. I had to manually create /etc/kubernetes on the master and all workers and create the cloud.conf file there. After that the cinder-csi pods came up and stayed running. This is because the Helm template passes --cloud-config=$(CLOUD_CONFIG) as an argument to the CSI plugin, and CLOUD_CONFIG is defined as an environment variable with the path /etc/kubernetes/cloud.conf. So even if you ask the Helm template to use the stored Kubernetes secret, the arguments that start the plugin still include a fixed path.

There is one more thing missing, which I think needs another issue. For example, in my OpenStack deployment with Juju and Vault, all endpoints, including Keystone auth, use a self-signed certificate provided by Vault.

In this case, another parameter needs to be passed to both the CSI plugin and OCCM: ca-file=/path/to/cr. I tried to modify the cloud-init template to write the files directly during the VM's initial configuration, but somehow that didn't work.
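
For illustration, here is a sketch of how ca-file could be carried in the same cloud.conf when that file is stored as a Kubernetes secret. The secret name, the namespace and every value are placeholders modeled on the commented sample in the chart's values.yaml, not settings taken from this repo. The CA bundle must also be readable at that path inside the container (the pod description below shows /etc/cacert mounted from a cacert volume).

```yaml
# Hypothetical Secret holding cloud.conf; all names and values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: cloud-config       # assumption: must match secret.name in the chart values
  namespace: kube-system   # assumption: namespace where the charts are installed
type: Opaque
stringData:
  # Each key becomes a file in the directory where the secret is mounted,
  # so this key would appear as /etc/kubernetes/cloud.conf in the CSI pod.
  cloud.conf: |
    [Global]
    auth-url=http://openstack-control-plane
    user-id=user-id
    password=password
    region=RegionOne
    ca-file=/etc/cacert/ca-bundle.crt
```

Because a secret volume exposes each key as a file, naming the key cloud.conf keeps the path expected by CLOUD_CONFIG unchanged.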

cinder-csi-plugin from helm template:

  cinder-csi-plugin:
    Container ID:  containerd://ab38de7a07dba2e1caabd1f687ec7ef59fbe7a7299079e7aea887e3504aa091f
    Image:         registry.k8s.io/provider-os/cinder-csi-plugin:v1.30.0
    Image ID:      registry.k8s.io/provider-os/cinder-csi-plugin@sha256:5a993797393619cadbee2a320d7f647ff20887f506bef4c849b0d90e7de7c160
    Port:          9808/TCP
    Host Port:     9808/TCP
    Args:
      /bin/cinder-csi-plugin
      -v=2
      --endpoint=$(CSI_ENDPOINT)
      --cloud-config=$(CLOUD_CONFIG)
    State:          Running
      Started:      Tue, 13 Aug 2024 13:06:51 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 13 Aug 2024 13:03:10 +0000
      Finished:     Tue, 13 Aug 2024 13:06:49 +0000
    Ready:          True
    Restart Count:  10
    Liveness:       http-get http://:healthz/healthz delay=10s timeout=10s period=60s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:  unix://csi/csi.sock
      CLOUD_CONFIG:  /etc/kubernetes/cloud.conf
    Mounts:
      /csi from socket-dir (rw)
      /dev from pods-probe-dir (rw)
      /etc/cacert from cacert (ro)
      /etc/kubernetes from cloud-config (ro)
      /var/lib/kubelet from kubelet-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zdntg (ro)

The same goes for OCCM:

    Args:
      /bin/openstack-cloud-controller-manager
      --v=2
      --cloud-config=$(CLOUD_CONFIG)
      --cluster-name=$(CLUSTER_NAME)
      --cloud-provider=openstack
      --use-service-account-credentials=false
      --controllers=cloud-node,cloud-node-lifecycle,route,service
      --bind-address=127.0.0.1
    State:          Running
      Started:      Tue, 13 Aug 2024 14:37:28 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 13 Aug 2024 14:34:32 +0000
      Finished:     Tue, 13 Aug 2024 14:34:38 +0000
    Ready:          True
    Restart Count:  6
    Environment:
      CLOUD_CONFIG:  /etc/config/cloud.conf
      CLUSTER_NAME:  kubernetes
    Mounts:
      /etc/config from cloud-config-volume (ro)
      /etc/kubernetes/pki from k8s-certs (ro)
      /usr/libexec/kubernetes/kubelet-plugins/volume/exec from flexvolume-dir (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w9cw6 (ro)

As you can also see here, the OCCM comes by default with cloud provider openstack, which is the in-tree setup for OCCM. It should be configured with cloud-provider=external.

There is a migration document on how to do that for kubeadm-deployed setups, but that process is not applicable to an RKE2 setup.

The paths above, mounted read-only at /etc/kubernetes for the CSI plugin and at /etc/config for OCCM, should come from the cloud-config secret. The one for OCCM works; the one for CSI doesn't.

powellchristoph commented 3 months ago

Here are the Helm charts, versions and values that I use for my deployments of the openstack-cinder-csi and the openstack-cloud-controller-manager. They both share the cloud-config secret.

I'm using the Helm charts straight off the shelf without any crazy configurations.

repoURL: 'https://kubernetes.github.io/cloud-provider-openstack'
targetRevision: 2.30.0
helm:
  releaseName: cinder-csi-plugin
  values: >
    clusterID: 'dev'

    csi:
      plugin:
        nodePlugin:
          tolerations:
            - key: CriticalAddonsOnly
              operator: Exists

    secret:
      enabled: true
      name: cloud-config
chart: openstack-cinder-csi

repoURL: 'https://kubernetes.github.io/cloud-provider-openstack'
targetRevision: 2.30.2
helm:
  releaseName: openstack-cloud-controller-manager
  values: |
    cluster:
      name: dev
    nodeSelector:
      node-role.kubernetes.io/control-plane: "true"
    tolerations:
    - key: CriticalAddonsOnly
      operator: Exists
    secret:
      create: true
      name: cloud-config
    cloudConfig:
      global:
        auth-url: https://auth.example.com:5000
        region: use1
        tenant-name: dev
        application-credential-id: akeyidgoeshere
        application-credential-secret: asecretgoeshere
      loadBalancer:
        lb-version: v2
        use-octavia: true
        subnet-id: subnet-id-goes-here
        floating-network-id: floating-network-id-goes-here
        lb-provider: octavia
chart: openstack-cloud-controller-manager

radumalica commented 3 months ago

@powellchristoph maybe you could post a short how-to for deploying the customized Helm charts in the context of this repo. I am an experienced cloud infrastructure engineer and DevOps guy, but I think some instructions would help people who are less used to this kind of troubleshooting. Thank you.