vultr / vultr-csi

Container Storage Interface (CSI) Driver for Vultr Block Storage
Apache License 2.0

[BUG] - unix /var/lib/kubelet/plugins/block.csi.vultr.com/csi.sock not accessible (for rook.io) #88

Open · defaultbranch opened this issue 2 years ago

defaultbranch commented 2 years ago

**Describe the bug**

Using rook.io, the rook-ceph-osd-prepare-* pods fail to set up a PersistentVolumeClaim.

"describe pod" finally reports the event (warning) "MapVolume.SetUpDevice failed for volume "pvc-c869a0057b0c4904" : kubernetes.io/csi: blockMapper.stageVolumeForBlock failed to check STAGE_UNSTAGE_VOLUME capability: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/block.csi.vultr.com/csi.sock: connect: connection refused"" from the kubelet.

**To Reproduce**

Steps to reproduce the behavior (NOTE: this setup works fine on Azure AKS, with only the storageClassName adjusted):

  1. Create a fresh Kubernetes cluster (one worker node is probably sufficient for reproduction).

  2. For the basic Rook setup: from the files in https://github.com/rook/rook/tree/master/deploy/examples, run kubectl apply -f for crds.yaml, common.yaml, and operator.yaml; this creates the CRDs, the Roles, and a rook-ceph-operator deployment/pod.

  3. Run kubectl apply -f for the following CephCluster YAML:

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        # NOTE: see cluster.yaml in https://github.com/rook/rook.git for an up-to-date image version
        image: quay.io/ceph/ceph:v17.2.1
        allowUnsupported: false
      dataDirHostPath: /var/lib/rook
      skipUpgradeChecks: false
      continueUpgradeAfterChecksEvenIfNotHealthy: false
      waitTimeoutForHealthyOSDInMinutes: 10
      mon:
        count: 3
        allowMultiplePerNode: false
      mgr:
        count: 2
        allowMultiplePerNode: false
        modules:
          - name: pg_autoscaler
            enabled: true
      dashboard:
        enabled: true
        ssl: true
      storage:
        storageClassDeviceSets:
          - name: set1
            # NOTE: change this to the number of nodes that should host an OSD
            count: 1
            portable: false
            tuneDeviceClass: false
            encrypted: false
            volumeClaimTemplates:
              - metadata:
                  name: data
                spec:
                  storageClassName: vultr-block-storage-hdd
                  accessModes:
                    - ReadWriteOnce
                  # NOTE: rook seems to expect a raw, unmounted device "volumeMode: Block"
                  volumeMode: Block
                  resources:
                    requests:
                      storage: 40Gi
  4. See the events in kubectl -n rook-ceph describe pod rook-ceph-osd-prepare-* once the pod(s) get stuck.

**Expected behavior**

The rook-ceph-osd-prepare-* pods should disappear after a short time, and corresponding rook-ceph-osd-* pods (without -prepare-) should remain in their place.

**Additional context**

I was using VKE with Kubernetes 1.23.x.

ddymko commented 2 years ago

@defaultbranch

> NOTE: rook seems to expect a raw, unmounted device "volumeMode: Block"
>
> volumeMode: Block

IIRC, vultr-csi doesn't support a raw, unmounted device:

https://github.com/vultr/vultr-csi/blob/master/driver/mounter.go#L113
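For context, the CSI spec models this in the VolumeCapability that the kubelet sends with each node RPC. Below is a minimal sketch (not vultr-csi code) of the branch a node plugin takes, using the standard container-storage-interface Go bindings; the helper name describeCapability is hypothetical:

    // Minimal sketch (not vultr-csi code): how a CSI node plugin tells a
    // raw-block request apart from a filesystem request. The access type of a
    // CSI VolumeCapability is either Block or Mount; a driver that only
    // implements the Mount path (format + mount) will fail PVCs that use
    // volumeMode: Block.
    package driver

    import (
        "fmt"

        csi "github.com/container-storage-interface/spec/lib/go/csi"
    )

    // describeCapability is a hypothetical helper showing the branch a driver
    // takes in NodeStageVolume/NodePublishVolume.
    func describeCapability(cap *csi.VolumeCapability) string {
        switch {
        case cap.GetBlock() != nil:
            // volumeMode: Block -- expose the raw device node at the
            // target path; do not format it.
            return "raw block volume"
        case cap.GetMount() != nil:
            // volumeMode: Filesystem -- format (if needed) and mount.
            return fmt.Sprintf("filesystem volume (fstype=%s)", cap.GetMount().GetFsType())
        default:
            return "unknown access type"
        }
    }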

kaznak commented 8 months ago

Hi, I have encountered the same situation, and I have a couple of questions about this matter.

  1. Do you plan to support volumeMode: Block in the future?
  2. Is it possible to support volumeMode: Block by changing this CSI implementation?

Thank you in advance.

cuppett commented 8 months ago

> I have a couple of questions about this matter.
>
> 1. Do you plan to support volumeMode: Block in the future?
> 2. Is it possible to support volumeMode: Block by changing this CSI implementation?

Unofficially: I'd expect this should be possible (with changes). When you attach a block device to a Vultr instance normally, you get the raw device and can do whatever you like with it (create LVM volumes, basic filesystems, use LUKS, etc.).
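To make that concrete: on the node side, supporting volumeMode: Block mostly means bind-mounting the attached device node onto the target path instead of formatting and mounting it. Here is a rough, hypothetical sketch (assuming golang.org/x/sys/unix; the function name publishBlockVolume and its error handling are illustrative only, not a vultr-csi patch):

    // Hypothetical sketch: for volumeMode: Block the node plugin bind-mounts
    // the attached device node (e.g. /dev/vdb) onto the target path it
    // receives in NodePublishVolume, rather than formatting the device.
    package driver

    import (
        "fmt"
        "os"

        "golang.org/x/sys/unix"
    )

    // publishBlockVolume is an illustrative name; error handling is minimal.
    func publishBlockVolume(devicePath, targetPath string) error {
        // For a block volume the mount target must be an ordinary file,
        // not a directory.
        f, err := os.OpenFile(targetPath, os.O_CREATE, 0o660)
        if err != nil {
            return fmt.Errorf("create target file %s: %w", targetPath, err)
        }
        f.Close()

        // Bind-mount the device node onto the target file so the workload
        // sees the raw, unformatted device.
        if err := unix.Mount(devicePath, targetPath, "", unix.MS_BIND, ""); err != nil {
            return fmt.Errorf("bind mount %s onto %s: %w", devicePath, targetPath, err)
        }
        return nil
    }

A real change would also need the driver to advertise raw-block support in its capabilities and to skip the filesystem staging step for block volumes, so treat this as a starting point rather than a patch.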