hetznercloud / csi-driver

Kubernetes Container Storage Interface driver for Hetzner Cloud Volumes
MIT License

PVC fails with "existing disk format of " error #581

Open mattiashem opened 5 months ago

mattiashem commented 5 months ago

TL;DR

Warning FailedMount 38s (x3536 over 4d23h) kubelet MountVolume.SetUp failed for volume "pvc-a1e2b216-bd1f-4e3f-b54f-ebc8ce7760bd" : rpc error: code = Internal desc = failed to publish volume: unable to detect existing disk format of /dev/disk/by-id/scsi-0HC_Volume_100455302: disk /dev/disk/by-id/scsi-0HC_Volume_100455302 propably contains partitions

Expected behavior

I have 3 clusters, and mounting works fine in the other clusters.

Observed behavior

Warning FailedMount 38s (x3536 over 4d23h) kubelet MountVolume.SetUp failed for volume "pvc-a1e2b216-bd1f-4e3f-b54f-ebc8ce7760bd" : rpc error: code = Internal desc = failed to publish volume: unable to detect existing disk format of /dev/disk/by-id/scsi-0HC_Volume_100455302: disk /dev/disk/by-id/scsi-0HC_Volume_100455302 propably contains partitions

Minimal working example

I used the install instructions from the guide, then applied this PVC:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: prometheus-server1
  namespace: metrics
spec:
  storageClassName: hcloud-volumes
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50G
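
The claim itself provisions fine (see "Additional information" below); the FailedMount error only appears once a Pod mounts the volume. A quick way to watch both sides (a sketch, assuming the manifest above is saved as pvc.yaml; <pod-using-the-claim> stands for whichever Pod consumes the claim):

kubectl apply -f pvc.yaml
kubectl get pvc -n metrics prometheus-server1
kubectl describe pod -n metrics <pod-using-the-claim>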

Log output

[core@bastion storage]$ kubectl logs -f hcloud-csi-controller-597f65fc8f-9mcbz -n kube-system -c csi-attacher
I0309 18:42:01.900716       1 main.go:94] Version: v4.1.0
I0309 18:42:06.245083       1 common.go:111] Probing CSI driver for readiness
I0309 18:42:06.250370       1 controller.go:130] Starting CSI attacher
^C
[core@bastion storage]$ kubectl logs -f hcloud-csi-controller-597f65fc8f-9mcbz -n kube-system -c csi-provisioner
W0309 18:42:02.709835       1 feature_gate.go:241] Setting GA feature gate Topology=true. It will be removed in a future release.
I0309 18:42:02.709895       1 csi-provisioner.go:154] Version: v3.4.0
I0309 18:42:02.709902       1 csi-provisioner.go:177] Building kube configs for running in cluster...
I0309 18:42:05.661408       1 common.go:111] Probing CSI driver for readiness
I0309 18:42:05.666448       1 csi-provisioner.go:299] CSI driver supports PUBLISH_UNPUBLISH_VOLUME, watching VolumeAttachments
I0309 18:42:05.768951       1 controller.go:811] Starting provisioner controller csi.hetzner.cloud_hcloud-csi-controller-597f65fc8f-9mcbz_2534d57e-be30-4ed7-a386-6614373244c0!
I0309 18:42:05.769030       1 volume_store.go:97] Starting save volume queue
I0309 18:42:05.870436       1 controller.go:860] Started provisioner controller csi.hetzner.cloud_hcloud-csi-controller-597f65fc8f-9mcbz_2534d57e-be30-4ed7-a386-6614373244c0!
I0309 19:19:14.406580       1 controller.go:1337] provision "default/data-mysql-operator-0" class "hcloud-volumes": started
I0309 19:19:14.406772       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-mysql-operator-0", UID:"625aafa5-0e6f-4826-9cef-5698ed2bd148", APIVersion:"v1", ResourceVersion:"118556527", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/data-mysql-operator-0"
I0309 19:19:18.419986       1 controller.go:1442] provision "default/data-mysql-operator-0" class "hcloud-volumes": volume "pvc-625aafa5-0e6f-4826-9cef-5698ed2bd148" provisioned
I0309 19:19:18.420010       1 controller.go:1455] provision "default/data-mysql-operator-0" class "hcloud-volumes": succeeded
I0309 19:19:18.448288       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-mysql-operator-0", UID:"625aafa5-0e6f-4826-9cef-5698ed2bd148", APIVersion:"v1", ResourceVersion:"118556527", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-625aafa5-0e6f-4826-9cef-5698ed2bd148

Additional information

It looks like provisioning works; in the cloud console I can see the disk, and it appears to have no filesystem.
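
If the hcloud CLI is available, the volume can also be cross-checked against the API directly (a sketch; the volume ID 100455302 is taken from the device path in the error above):

hcloud volume list
hcloud volume describe 100455302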

mattiashem commented 5 months ago

[core@bastion storage]$ kubectl logs -f hcloud-csi-node-jcsdp -n kube-system -c hcloud-csi-driver
level=info ts=2024-03-15T13:33:11.303902051Z msg="Fetched data from metadata service" id=41964065 location=nbg1
^C
[core@bastion storage]$ kubectl logs -f hcloud-csi-node-p6jj2 -n kube-system -c hcloud-csi-driver
level=info ts=2024-03-15T13:33:09.161029164Z msg="Fetched data from metadata service" id=41963957 location=nbg1
level=error ts=2024-03-15T13:34:49.615777847Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to publish volume: unable to detect existing disk format of /dev/disk/by-id/scsi-0HC_Volume_100455302: disk /dev/disk/by-id/scsi-0HC_Volume_100455302 propably contains partitions"

fallenby-klar commented 4 months ago

I'm also running into this issue.

My manifest:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage

The error:

MountVolume.SetUp failed for volume "pvc-15b2696c-eaf0-4e79-97de-71903a597ebb" : rpc error: code = Internal desc = failed to publish volume: unable to detect existing disk format of /dev/disk/by-id/scsi-0HC_Volume_100612315: disk /dev/disk/by-id/scsi-0HC_Volume_100612315 propably contains partitions

apricote commented 4 months ago

I am unable to reproduce this with our dev setup and the Getting Started guide.

Some questions that might help to pinpoint the issue:

mattiashem commented 4 months ago
apricote commented 4 months ago

I am still unable to reproduce this. What I have done:

  1. Set up a cluster with Talos 1.7.0 using their docs. I used Packer to create the snapshots. To generate the Talos config I used this command instead, to get below the 32 KB limit on userdata:

    talosctl gen config talos-k8s-hcloud-tutorial https://$LOAD_BALANCER_IP:6443 --kubernetes-version 1.27.4 --with-docs=false --with-examples=false
  2. Followed the steps from our Getting Started on Kubernetes guide, installing the Helm chart (roughly the commands sketched below) and applying my-csi-app.
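
For reference, the Helm install from the guide is roughly the following (a sketch; <API_TOKEN> stands for a read/write API token for the project):

kubectl -n kube-system create secret generic hcloud --from-literal=token=<API_TOKEN>
helm repo add hcloud https://charts.hetzner.cloud
helm repo update hcloud
helm install hcloud-csi hcloud/hcloud-csi -n kube-system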

With that, my-csi-app started successfully and the volume was mounted.

Do you have some steps for me to reproduce this?

mattiashem commented 3 months ago

It looks like I either get the error from the API, or it binds the volumes wrong. I only have the problem with an older cluster. If I create a new cluster (even on an old Talos like 1.6), it works.

Is there some API difference in the client, or could the way the volumes are attached differ? I now have 2 clusters in the same project; it works on one but not on the other.
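
One way to compare what is attached on each cluster (a sketch; assumes the hcloud CLI points at the shared project, and the volume ID is again taken from the device path in the error):

kubectl get volumeattachments
hcloud volume describe 100455302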

I have given up and moved on.

kosh30 commented 2 months ago

Same issue

apricote commented 2 weeks ago

Some more questions. Could you post the output of the following commands:

uname -a
blkid -p -o export /dev/disk/by-id/scsi-0HC_Volume_xxxxxxxx

On Talos, which has no regular shell, you should be able to exec into the csi-driver node Pod to execute these commands, as sketched below.
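
A sketch of that, reusing a node Pod name and the container name from the logs earlier in this thread (whether blkid is available inside the image is an assumption):

kubectl exec -n kube-system hcloud-csi-node-p6jj2 -c hcloud-csi-driver -- uname -a
kubectl exec -n kube-system hcloud-csi-node-p6jj2 -c hcloud-csi-driver -- blkid -p -o export /dev/disk/by-id/scsi-0HC_Volume_xxxxxxxx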