minio / directpv

Kubernetes CSI driver for Direct Attached Storage
https://directpv.io
GNU Affero General Public License v3.0

Minio tenant does not schedule volumes on available disks #924

Closed · qjoly closed this issue 2 months ago

qjoly commented 2 months ago

Hi,

I've just noticed that the tenant often tries to allocate volumes on drives that don't have enough free space.

Here are the values I use in Helm:

  tenant:
    name: "base-tenant"
    pools:
    - servers: 6
      name: first-pool
      size: 45G
      volumesPerServer: 1 # one 45G volume per server
      storageClassName: directpv-min-io
      nodeSelector:
        node-category/direct-pv: "true"
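
With volumesPerServer: 1 and size: 45G, each of the 6 servers should request a single ~45G PVC, so every server needs a drive with at least that much free space. A quick way to check how those requests land (a sketch; minio-config is the tenant namespace visible in the provisioner logs further down):

# list the PVCs the operator created for this pool and whether they bound
kubectl -n minio-config get pvc -o wide
# compare against the free space DirectPV reports per drive
kubectl directpv list drives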

And here are my nodes:

➜  ~ k directpv list drives
┌──────────┬──────┬──────┬────────┬─────────┬─────────┬────────┐
│ NODE     │ NAME │ MAKE │ SIZE   │ FREE    │ VOLUMES │ STATUS │
├──────────┼──────┼──────┼────────┼─────────┼─────────┼────────┤
│ worker-1 │ vde  │ -    │ 50 GiB │ 50 GiB  │ 0       │ Ready  │
│ worker-1 │ vdf  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-2 │ vdb  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-2 │ vdc  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-3 │ vdb  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-3 │ vdc  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
└──────────┴──────┴──────┴────────┴─────────┴─────────┴────────┘

These nodes are the only ones with the label node-category/direct-pv=true. Most volumes are scheduled correctly, but there are always one or two that end up on the wrong disk/node (notice that drive vde on worker-1 has no volume).

Expected Behavior

The volumes should be spread across all available drives so the tenant gets its full capacity. The distribution of volumes across drives should look like this:

➜  ~ k directpv list drives
┌──────────┬──────┬──────┬────────┬─────────┬─────────┬────────┐
│ NODE     │ NAME │ MAKE │ SIZE   │ FREE    │ VOLUMES │ STATUS │
├──────────┼──────┼──────┼────────┼─────────┼─────────┼────────┤
│ worker-1 │ vde  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-1 │ vdf  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-2 │ vdb  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-2 │ vdc  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-3 │ vdb  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-3 │ vdc  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
└──────────┴──────┴──────┴────────┴─────────┴─────────┴────────┘

Current Behavior

Instead, the placement is as follows:

➜  ~ k directpv list drives
┌──────────┬──────┬──────┬────────┬─────────┬─────────┬────────┐
│ NODE     │ NAME │ MAKE │ SIZE   │ FREE    │ VOLUMES │ STATUS │
├──────────┼──────┼──────┼────────┼─────────┼─────────┼────────┤
│ worker-1 │ vde  │ -    │ 50 GiB │ 50 GiB  │ -       │ Ready  │
│ worker-1 │ vdf  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-2 │ vdb  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-2 │ vdc  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-3 │ vdb  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
│ worker-3 │ vdc  │ -    │ 50 GiB │ 8.1 GiB │ 1       │ Ready  │
└──────────┴──────┴──────┴────────┴─────────┴─────────┴────────┘

DirectPV doesn't reschedule the volumes and the tenant is completely down (it keeps trying to schedule on worker-3, but that node can't accept another volume of this size):

directpv-min-io_controller-85c45575fd-tcl7n_73e474b8-a543-464f-aecb-efaf404dd0b1  failed to provision volume with StorageClass "directpv-min-io": rpc error: code = ResourceExhausted desc = no drive found for requested topology; requested node(s): worker-3; requested size: 45000000000 bytes
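
To see why worker-3 keeps being rejected, I compare the requested size with what DirectPV reports as free on that node (a sketch; the --nodes filter and the exact PVC name depend on your install and pool):

# drives and volumes DirectPV knows about on the node the provisioner is pinned to
kubectl directpv list drives --nodes worker-3
kubectl directpv list volumes --nodes worker-3
# the pending claim and its events
kubectl -n minio-config describe pvc data1-base-tenant-first-pool-0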

Your Environment

Thank you in advance for your answer

qjoly commented 2 months ago

I searched through the logs of the csi-provisioner (the volume size may differ, as I tried with multiple server counts):

csi-provisioner I0807 13:58:52.697591       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"minio-config", Name:"data1-base-tenant-first-pool-0", UID:"d200653d-d987-40e4-bc0e-393ea9afd8e5", APIVersion:"v1", ResourceVersion:"61154", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "directpv-min-io": rpc error: code = ResourceExhausted desc = no drive found for requested topology; requested node(s): worker-3; requested size: 20000000000 bytes
csi-provisioner I0807 13:58:52.703413       1 connection.go:200] GRPC response: {}
csi-provisioner I0807 13:58:52.703488       1 connection.go:201] GRPC error: rpc error: code = ResourceExhausted desc = no drive found for requested topology; requested node(s): worker-3; requested size: 20000000000 bytes
csi-provisioner I0807 13:58:52.703504       1 controller.go:816] CreateVolume failed, supports topology = true, node selected true => may reschedule = true => state = Reschedule: rpc error: code = ResourceExhausted desc = no drive found for requested topology; requested node(s): worker-3; requested size: 20000000000 bytes
csi-provisioner I0807 13:58:52.703635       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"minio-config", Name:"data3-base-tenant-first-pool-0", UID:"69dbac03-5313-4ff2-a5f9-d2c72217a969", APIVersion:"v1", ResourceVersion:"61155", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "directpv-min-io": rpc error: code = ResourceExhausted desc = no drive found for requested topology; requested node(s): worker-3; requested size: 20000000000 bytes
csi-provisioner I0807 13:58:52.717125       1 controller.go:1484] provision "minio-config/data1-base-tenant-first-pool-0" class "directpv-min-io": volume rescheduled because: failed to provision volume with StorageClass "directpv-min-io": rpc error: code = ResourceExhausted desc = no drive found for requested topology; requested node(s): worker-3; requested size: 20000000000 bytes
csi-provisioner I0807 13:58:52.717151       1 controller.go:1071] Stop provisioning, removing PVC d200653d-d987-40e4-bc0e-393ea9afd8e5 from claims in progress
csi-provisioner I0807 13:58:52.718191       1 controller.go:1484] provision "minio-config/data3-base-tenant-first-pool-0" class "directpv-min-io": volume rescheduled because: failed to provision volume with StorageClass "directpv-min-io": rpc error: code = ResourceExhausted desc = no drive found for requested topology; requested node(s): worker-3; requested size: 20000000000 bytes
csi-provisioner I0807 13:58:52.718209       1 controller.go:1071] Stop provisioning, removing PVC 69dbac03-5313-4ff2-a5f9-d2c72217a969 from claims in progress
csi-provisioner I0807 13:58:57.095392       1 leaderelection.go:276] successfully renewed lease directpv/directpv-min-io
csi-provisioner I0807 13:59:02.108332       1 leaderelection.go:276] successfully renewed lease directpv/directpv-min-io
csi-provisioner I0807 13:59:04.689902       1 controller.go:1359] provision "minio-config/data1-base-tenant-first-pool-0" class "directpv-min-io": started
csi-provisioner I0807 13:59:04.690131       1 connection.go:193] GRPC call: /csi.v1.Controller/CreateVolume
csi-provisioner I0807 13:59:04.690144       1 connection.go:194] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"directpv.min.io/identity":"directpv-min-io","directpv.min.io/node":"worker-3","directpv.min.io/rack":"default","directpv.min.io/region":"default","directpv.min.io/zone":"default"}}],"requisite":[{"segments":{"directpv.min.io/identity":"directpv-min-io","directpv.min.io/node":"worker-3","directpv.min.io/rack":"default","directpv.min.io/region":"default","directpv.min.io/zone":"default"}}]},"capacity_range":{"required_bytes":20000000000},"name":"pvc-d200653d-d987-40e4-bc0e-393ea9afd8e5","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":1}}]}
csi-provisioner I0807 13:59:04.690573       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"minio-config", Name:"data1-base-tenant-first-pool-0", UID:"d200653d-d987-40e4-bc0e-393ea9afd8e5", APIVersion:"v1", ResourceVersion:"61206", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "minio-config/data1-base-tenant-first-pool-0"
csi-provisioner I0807 13:59:04.696827       1 controller.go:1359] provision "minio-config/data3-base-tenant-first-pool-0" class "directpv-min-io": started
csi-provisioner I0807 13:59:04.696911       1 connection.go:193] GRPC call: /csi.v1.Controller/CreateVolume

I don't know if that helps or not.
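
For what it's worth, directpv-min-io uses WaitForFirstConsumer binding, so the node the provisioner keeps requesting comes from the selected-node annotation the scheduler puts on the PVC; that seems to be why the claim stays pinned to worker-3 even though the only drive with enough free space is on worker-1. A quick way to check that annotation (a sketch, using one of the claim names above):

# which node the scheduler selected for the stuck claim
kubectl -n minio-config get pvc data1-base-tenant-first-pool-0 \
  -o jsonpath='{.metadata.annotations.volume\.kubernetes\.io/selected-node}'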

balamurugana commented 2 months ago

Volume scheduling depends on many parameters. By default, drives are selected by free space within the requested topology. You would need to provide more information (the output of kubectl get pvc -o yaml, kubectl get directpvdrives -o yaml and kubectl get directpvvolumes -o yaml) as well as the logs of the controller/node-server containers from the controller/node-server pods.
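
For example (a sketch assuming the default directpv namespace and the standard controller/node-server workload and container names; adjust the tenant namespace to yours):

kubectl -n minio-config get pvc -o yaml
kubectl get directpvdrives -o yaml
kubectl get directpvvolumes -o yaml
kubectl -n directpv logs deploy/controller -c controller
kubectl -n directpv logs ds/node-server -c node-server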

Refer to https://github.com/minio/directpv/blob/master/docs/volume-scheduling.md for how to control volume scheduling.

qjoly commented 2 months ago

After a complete reinstallation of my cluster, I can no longer reproduce the problem.

I don't know for sure, but I think I should have deleted the DirectPV data left on the nodes before reinstalling. (I've tried to recreate the same environment, but I can't reproduce this behavior anymore.)
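
For anyone landing here later, this is roughly what I mean by deleting the DirectPV content on the nodes (a sketch; /var/lib/directpv is the default data directory as far as I know, double-check before wiping anything):

# remove the driver from the cluster first
kubectl directpv uninstall
# then, on every node that ran DirectPV, clear the leftover state
sudo rm -rf /var/lib/directpv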