piraeusdatastore / linstor-csi

CSI plugin for LINSTOR
Apache License 2.0

PVC doesn't mount to pod #197

Open VadimkP opened 1 year ago

VadimkP commented 1 year ago

Hi.

I deployed linstor-csi 1.19 (I also tried 1.17) in my k8s cluster and added the workers to the LINSTOR cluster:

root@pve-3:~# linstor n l
╭────────────────────────────────────────────────────────────────╮
┊ Node        ┊ NodeType  ┊ Addresses                   ┊ State  ┊
╞════════════════════════════════════════════════════════════════╡
┊ aisk8swr01t ┊ SATELLITE ┊ 192.168.129.43:3366 (PLAIN) ┊ Online ┊
┊ aisk8swr02t ┊ SATELLITE ┊ 192.168.129.44:3366 (PLAIN) ┊ Online ┊
┊ aisk8swr03t ┊ SATELLITE ┊ 192.168.129.45:3366 (PLAIN) ┊ Online ┊
┊ pve-1       ┊ SATELLITE ┊ 192.168.129.4:3366 (PLAIN)  ┊ Online ┊
┊ pve-2       ┊ SATELLITE ┊ 192.168.129.5:3366 (PLAIN)  ┊ Online ┊
┊ pve-3       ┊ SATELLITE ┊ 192.168.129.6:3366 (PLAIN)  ┊ Online ┊
╰────────────────────────────────────────────────────────────────╯

I created a storage class and a PVC. The PVC is created, and 'linstor v l' shows it with the state UpToDate:

root@pve-3:~# linstor v l -r pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node  ┊ Resource                                 ┊ StoragePool  ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pve-1 ┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ drbdhddpool1 ┊     0 ┊    1033 ┊ /dev/drbd1033 ┊   244 KiB ┊ Unused ┊ UpToDate ┊
┊ pve-2 ┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ drbdhddpool1 ┊     0 ┊    1033 ┊ /dev/drbd1033 ┊   279 KiB ┊ Unused ┊ UpToDate ┊
┊ pve-3 ┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ drbdhddpool1 ┊     0 ┊    1033 ┊ /dev/drbd1033 ┊   110 KiB ┊ Unused ┊ UpToDate ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

But when I create a pod that uses the PVC, I have a problem. In the pod logs I see the following:

MountVolume.SetUp failed for volume "pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1" : rpc error: code = Internal desc = NodePublishVolume failed for pvc-24f506d3-7ef8-4219-901b-801cab7f6dda: failed to stat source device: stat : no such file or directory

In the second pod:

Multi-Attach error for volume "pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1" Volume is already used by pod(s) nginx-test-1-7c8699d678-972rn

I also tried creating the deployment with 1 replica and got the same mount error.

The status in LINSTOR also changed:

root@pve-3:~# linstor v l -r pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node        ┊ Resource                                 ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ aisk8swr01t ┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ DfltDisklessStorPool ┊     0 ┊    1033 ┊ None          ┊           ┊ Unused ┊ Diskless ┊
┊ pve-1       ┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ drbdhddpool1         ┊     0 ┊    1033 ┊ /dev/drbd1033 ┊   261 KiB ┊ Unused ┊ UpToDate ┊
┊ pve-2       ┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ drbdhddpool1         ┊     0 ┊    1033 ┊ /dev/drbd1033 ┊   279 KiB ┊ Unused ┊ UpToDate ┊
┊ pve-3       ┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ drbdhddpool1         ┊     0 ┊    1033 ┊ /dev/drbd1033 ┊   261 KiB ┊ Unused ┊ UpToDate ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@pve-3:~# drbdsetup status pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1
pvc-773e2201-2ef4-436e-a464-b84dd048e84e role:Secondary
  disk:UpToDate
  aisk8swr01t connection:Connecting
  pve-1 role:Secondary
    peer-disk:UpToDate
  pve-2 role:Secondary
    peer-disk:UpToDate

root@pve-3:~# linstor r l -r pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node        ┊ Port ┊ Usage  ┊ Conns                   ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ aisk8swr01t ┊ 7033 ┊        ┊                         ┊  Unknown ┊                     ┊
┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ pve-1       ┊ 7033 ┊ Unused ┊ Connecting(aisk8swr01t) ┊ UpToDate ┊ 2023-05-02 15:11:21 ┊
┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ pve-2       ┊ 7033 ┊ Unused ┊ Connecting(aisk8swr01t) ┊ UpToDate ┊ 2023-05-02 15:11:01 ┊
┊ pvc-7815c841-361f-4f7d-8ca1-a7e2dd06fea1 ┊ pve-3       ┊ 7033 ┊ Unused ┊ Connecting(aisk8swr01t) ┊ UpToDate ┊ 2023-05-02 15:11:11 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

My manifests:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: k8s-linstor-hdd
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
parameters:
  linstor.csi.linbit.com/placementCount: "3"
  linstor.csi.linbit.com/storagePool: drbdhddpool1
  linstor.csi.linbit.com/resourceGroup: drbdhdd1
  DrbdOptions/Disk/disk-flushes: "no"
  DrbdOptions/Disk/md-flushes: "no"
  DrbdOptions/Net/max-buffers: "8000"
  DrbdOptions/Net/max-epoch-size: "8000"
  DrbdOptions/Net/sndbuf-size: "2000000"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: k8s-linstor-hdd
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test-1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx12
  template:
    metadata:
      labels:
        app: nginx12
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: pvc-test
      containers:
        - image: nginx:1.12
          name: nginx
          ports:
            - containerPort: 80
          volumeMounts:
          - name: data
            mountPath: "/etc/nginx/conf.d/"

VadimkP commented 1 year ago

One more thing: if I set placementCount: "2", then after creating the PVC I see this:

┊ pvc-47dde43d-3385-4810-b181-d9d498022e9a ┊ aisk8swr01t ┊ 7033 ┊        ┊                       ┊    Unknown ┊                     ┊
┊ pvc-47dde43d-3385-4810-b181-d9d498022e9a ┊ pve-1       ┊ 7033 ┊ Unused ┊                       ┊    Unknown ┊ 2023-05-03 10:34:07 ┊
┊ pvc-47dde43d-3385-4810-b181-d9d498022e9a ┊ pve-2       ┊ 7033 ┊ Unused ┊                       ┊    Unknown ┊ 2023-05-03 10:33:53 ┊

WanzenBug commented 1 year ago

Seems that DRBD is not set up correctly on the k8s nodes. Please check "linstor node info" and verify that DRBD is actually supported on the k8s nodes. You might also verify that DRBD 9 is loaded instead of 8.4 by checking /proc/drbd on the k8s nodes.
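
The /proc/drbd check suggested above can be sketched as a small shell helper. It reads the "version:" line and reports the major version of the loaded module. The function names drbd_major and check_drbd9, and the optional file argument, are illustrative assumptions and not part of any DRBD or LINSTOR tooling.

```shell
#!/bin/sh
# Print the major version of the loaded DRBD kernel module.
# The optional argument (default: /proc/drbd) exists only so the
# helper can be tried against a saved copy of the file.
drbd_major() {
    file="${1:-/proc/drbd}"
    # The first line looks like: "version: 9.2.3 (api:2/proto:86-122)"
    sed -n 's/^version: \([0-9][0-9]*\)\..*/\1/p' "$file" | head -n 1
}

# Succeed only if the loaded module reports major version 9.
check_drbd9() {
    major="$(drbd_major "$@")"
    if [ "$major" = "9" ]; then
        echo "DRBD 9 is loaded"
    else
        echo "DRBD 9 is NOT loaded (found major version: ${major:-none})"
        return 1
    fi
}
```

Running check_drbd9 on each k8s node would flag any node where the 8.4 module is loaded instead of DRBD 9.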

VadimkP commented 1 year ago

root@pve-3:~# linstor node info
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node        ┊ Diskless ┊ LVM ┊ LVMThin ┊ ZFS/Thin ┊ File/Thin ┊ SPDK ┊ EXOS ┊ Remote SPDK ┊ Storage Spaces ┊ Storage Spaces/Thin ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ aisk8swr01t ┊ +        ┊ +   ┊ +       ┊ -        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ aisk8swr02t ┊ +        ┊ +   ┊ +       ┊ -        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ aisk8swr03t ┊ +        ┊ +   ┊ +       ┊ -        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ pve-1       ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ pve-2       ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
┊ pve-3       ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ -    ┊ +           ┊ -              ┊ -                   ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

╭─────────────────────────────────────────────────────────────────────────────────────╮
┊ Node        ┊ DRBD ┊ LUKS ┊ NVMe ┊ Cache ┊ BCache ┊ WriteCache ┊ OpenFlex ┊ Storage ┊
╞═════════════════════════════════════════════════════════════════════════════════════╡
┊ aisk8swr01t ┊ +    ┊ +    ┊ -    ┊ +     ┊ +      ┊ +          ┊ -        ┊ +       ┊
┊ aisk8swr02t ┊ +    ┊ +    ┊ -    ┊ +     ┊ +      ┊ +          ┊ -        ┊ +       ┊
┊ aisk8swr03t ┊ +    ┊ +    ┊ -    ┊ +     ┊ +      ┊ +          ┊ -        ┊ +       ┊
┊ pve-1       ┊ +    ┊ +    ┊ +    ┊ +     ┊ +      ┊ +          ┊ +        ┊ +       ┊
┊ pve-2       ┊ +    ┊ +    ┊ +    ┊ +     ┊ +      ┊ +          ┊ +        ┊ +       ┊
┊ pve-3       ┊ +    ┊ +    ┊ +    ┊ +     ┊ +      ┊ +          ┊ +        ┊ +       ┊
╰─────────────────────────────────────────────────────────────────────────────────────╯

On the k8s node:

root@aisk8swr01t:~# cat /proc/drbd
version: 9.2.3 (api:2/proto:86-122)
GIT-hash: c142ca1280c41aee1330b980544ef276330ff6ef build by root@aisk8swr01t, 2023-05-02 12:21:51
Transports (api:18): tcp (9.2.3)

The VMs and their disks are deployed on the servers that run the LINSTOR cluster. Will this setup work?

WanzenBug commented 1 year ago

It should work, but something is blocking the LINSTOR Satellite on the k8s node from setting up the DRBD device. Are there any error reports related to the resource (linstor error-reports list + linstor error-reports show <id>)?

VadimkP commented 1 year ago

root@pve-3:~# linstor error-reports show 6450D7B2-00000-000010
ERROR REPORT 6450D7B2-00000-000010

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.22.0
Build ID:                           0dd2e7d8bad7b0115e924ef66371568320898285
Build time:                         2023-04-17T11:24:18+00:00
Error time:                         2023-05-03 10:38:47
Node:                               pve-3
Peer:                               RestClient(192.168.129.45; 'linstor-csi/v1.0.0-3fbda5a7e31517786796ae4834c159caa9ad8ac7')

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         ApiRcException
Class canonical name:               com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at:                       Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #337

Error message:                      (Node: 'aisk8swr01t') Failed to adjust DRBD resource pvc-47dde43d-3385-4810-b181-d9d498022e9a

Error context:
    (Node: 'aisk8swr01t') Failed to adjust DRBD resource pvc-47dde43d-3385-4810-b181-d9d498022e9a

Asynchronous stage backtrace:

    Error has been observed at the following site(s):
        |_ checkpoint ⇢ Modify volume
    Stack trace:

Call backtrace:

    Method                                   Native Class:Line number
    handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:337

Suppressed exception 1 of 1:
===============
Category:                           RuntimeException
Class name:                         OnAssemblyException
Class canonical name:               reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at:                       Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #337

Error message:
Error has been observed at the following site(s):
    |_ checkpoint ⇢ Modify volume
Stack trace:

Error context:
    (Node: 'aisk8swr01t') Failed to adjust DRBD resource pvc-47dde43d-3385-4810-b181-d9d498022e9a

Call backtrace:

    Method                                   Native Class:Line number
    handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:337
    handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:284
    doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N      com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8357
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N      reactor.core.publisher.UnicastProcessor:286
    drain                                    N      reactor.core.publisher.UnicastProcessor:329
    onNext                                   N      reactor.core.publisher.UnicastProcessor:408
    next                                     N      reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N      reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:388
    doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N      reactor.core.scheduler.WorkerTask:84
    call                                     N      reactor.core.scheduler.WorkerTask:37
    run                                      N      java.util.concurrent.FutureTask:264
    run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.
root@pve-3:~# linstor error-reports show 64510950-96EEE-000048
ERROR REPORT 64510950-96EEE-000048

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.22.0
Build ID:                           0dd2e7d8bad7b0115e924ef66371568320898285
Build time:                         2023-04-17T11:24:18+00:00
Error time:                         2023-05-03 10:40:59
Node:                               aisk8swr01t

============================================================

Reported error:
===============

Description:
    Failed to adjust DRBD resource pvc-47dde43d-3385-4810-b181-d9d498022e9a

Category:                           LinStorException
Class name:                         ResourceException
Class canonical name:               com.linbit.linstor.core.devmgr.exceptions.ResourceException
Generated at:                       Method 'adjustDrbd', Source file 'DrbdLayer.java', Line #866

Error message:                      Failed to adjust DRBD resource pvc-47dde43d-3385-4810-b181-d9d498022e9a

Error context:
    An error occurred while processing resource 'Node: 'aisk8swr01t', Rsc: 'pvc-47dde43d-3385-4810-b181-d9d498022e9a''

ErrorContext:

Call backtrace:

    Method                                   Native Class:Line number
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:866
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:424
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:983
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:411
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:175
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:323
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1153
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:751
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:645
    run                                      N      java.lang.Thread:829

Caused by:
==========

Description:
    Execution of the external command 'drbdadm' failed.
Cause:
    The external command exited with error code 1.
Correction:
    - Check whether the external program is operating properly.
    - Check whether the command line is correct.
      Contact a system administrator or a developer if the command line is no longer valid
      for the installed version of the external program.
Additional information:
    The full command line executed was:
    drbdadm -vvv adjust pvc-47dde43d-3385-4810-b181-d9d498022e9a

    The external command sent the following output data:
    drbdsetup new-peer pvc-47dde43d-3385-4810-b181-d9d498022e9a 1 --_name=pve-1 --verify-alg=crct10dif-pclmul --sndbuf-size=2000000 --rcvbuf-size=2097152 --max-epoch-size=8000 --max-buffers=8000 --shared-secret=OqmI2d9rGXZyklVEWgIl --cram-hmac-alg=sha1

    The external command sent the following error information:
    pvc-47dde43d-3385-4810-b181-d9d498022e9a: Failure: (146) VERIFYAlgNotAvail
    additional info from kernel:
    failed to allocate crct10dif-pclmul for verify

    Command 'drbdsetup new-peer pvc-47dde43d-3385-4810-b181-d9d498022e9a 1 --_name=pve-1 --verify-alg=crct10dif-pclmul --sndbuf-size=2000000 --rcvbuf-size=2097152 --max-epoch-size=8000 --max-buffers=8000 --shared-secret=OqmI2d9rGXZyklVEWgIl --cram-hmac-alg=sha1' terminated with exit code 10
    drbdadm: new-peer pvc-47dde43d-3385-4810-b181-d9d498022e9a: skipped due to earlier error

Category:                           LinStorException
Class name:                         ExtCmdFailedException
Class canonical name:               com.linbit.extproc.ExtCmdFailedException
Generated at:                       Method 'execute', Source file 'DrbdAdm.java', Line #593

Error message:                      The external command 'drbdadm' exited with error code 1

ErrorContext:   Description: Execution of the external command 'drbdadm' failed.
  Cause:       The external command exited with error code 1.
  Correction:  - Check whether the external program is operating properly.
- Check whether the command line is correct.
  Contact a system administrator or a developer if the command line is no longer valid
  for the installed version of the external program.
  Details:     The full command line executed was:
drbdadm -vvv adjust pvc-47dde43d-3385-4810-b181-d9d498022e9a

The external command sent the following output data:
drbdsetup new-peer pvc-47dde43d-3385-4810-b181-d9d498022e9a 1 --_name=pve-1 --verify-alg=crct10dif-pclmul --sndbuf-size=2000000 --rcvbuf-size=2097152 --max-epoch-size=8000 --max-buffers=8000 --shared-secret=OqmI2d9rGXZyklVEWgIl --cram-hmac-alg=sha1

The external command sent the following error information:
pvc-47dde43d-3385-4810-b181-d9d498022e9a: Failure: (146) VERIFYAlgNotAvail
additional info from kernel:
failed to allocate crct10dif-pclmul for verify

Command 'drbdsetup new-peer pvc-47dde43d-3385-4810-b181-d9d498022e9a 1 --_name=pve-1 --verify-alg=crct10dif-pclmul --sndbuf-size=2000000 --rcvbuf-size=2097152 --max-epoch-size=8000 --max-buffers=8000 --shared-secret=OqmI2d9rGXZyklVEWgIl --cram-hmac-alg=sha1' terminated with exit code 10
drbdadm: new-peer pvc-47dde43d-3385-4810-b181-d9d498022e9a: skipped due to earlier error

Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:593
    adjust                                   N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:90
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:785
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:424
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:983
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:411
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:175
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:323
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1153
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:751
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:645
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.

WanzenBug commented 1 year ago

Seems like the issue is:

pvc-47dde43d-3385-4810-b181-d9d498022e9a: Failure: (146) VERIFYAlgNotAvail
additional info from kernel:
failed to allocate crct10dif-pclmul for verify

I believe LINSTOR should automatically select a verify algorithm that is usable by all nodes. What kind of kernels / host OS are you using? It seems like you use very different versions that might not have any algorithms in common.
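
One way to compare nodes is to look at the algorithms the kernel has registered in /proc/crypto. The helper below is a minimal sketch: it reports whether a given algorithm appears as a "name" or "driver" entry. The function name has_crypto_alg and the optional file argument are illustrative assumptions. A missing entry may only mean that the corresponding kernel module is not loaded yet, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Return success if the algorithm is listed as a "name" or "driver"
# entry in a /proc/crypto-style listing, where lines look like:
#   driver       : crct10dif-pclmul
has_crypto_alg() {
    alg="$1"
    file="${2:-/proc/crypto}"
    awk -v alg="$alg" '
        ($1 == "name" || $1 == "driver") && $3 == alg { found = 1 }
        END { exit !found }
    ' "$file"
}
```

For example, running has_crypto_alg crct10dif-pclmul on a Proxmox node and on a k8s node would show whether both kernels currently offer the driver named in the error. On a VM, that driver depends on the PCLMULQDQ CPU instruction being exposed to the guest, so grep -c pclmulqdq /proc/cpuinfo may also be worth checking.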

VadimkP commented 1 year ago

I upgraded the kernel on the k8s nodes to 5.15.104, the same version as on the LINSTOR cluster. But now the loaded DRBD module is not version 9:

root@aisk8swr01t:~# modprobe drbd
root@aisk8swr01t:~# cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 98E710E58B3041F3046305B
root@aisk8swr01t:~# apt list --installed | grep drbd
drbd-dkms/unknown,now 9.2.3-1 all [installed]
drbd-utils/unknown,now 9.23.1-1 amd64 [installed]

WanzenBug commented 1 year ago

You can try running dkms build drbd/9.2.3-1, but this should actually be done automatically when you install a new kernel :frowning_face:
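
A rough sketch of that rebuild, followed by a module reload so the new build is actually used. The wrapper names run and rebuild_drbd and the DRY_RUN switch are illustrative assumptions. The sketch also assumes that nothing is using DRBD on the node (otherwise unloading the module fails) and that the DKMS-built module takes precedence over the in-tree 8.4 module once installed.

```shell
#!/bin/sh
# Execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rebuild the DRBD DKMS module for the running kernel and reload it.
rebuild_drbd() {
    # DKMS module/version as named in the comment above.
    module="${1:-drbd/9.2.3-1}"
    run dkms build "$module" || return 1
    run dkms install "$module" || return 1
    # Unload the old module so the freshly installed one is picked up.
    run modprobe -r drbd || return 1
    run modprobe drbd || return 1
    # Should now report version 9.x.
    run cat /proc/drbd
}
```

Calling rebuild_drbd with DRY_RUN=1 set prints the commands without touching the system, which is a safe way to review the steps before running them as root.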

VadimkP commented 1 year ago

I brought the kernel and DRBD to the same version on all nodes. But when attaching the PVC to the pod, I still see the following.

kubectl describe:

MountVolume.SetUp failed for volume "pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385" : rpc error: code = Internal desc = NodePublishVolume failed for pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385: failed to stat source device: stat : no such file or directory
root@pve-3:/etc/apt# linstor v l -r pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node        ┊ Resource                                 ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ aisk8swr02t ┊ pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 ┊ DfltDisklessStorPool ┊     0 ┊    1033 ┊ None          ┊           ┊ Unused ┊ Diskless ┊
┊ pve-1       ┊ pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 ┊ drbdhddpool1         ┊     0 ┊    1033 ┊ /dev/drbd1033 ┊   261 KiB ┊ Unused ┊ UpToDate ┊
┊ pve-2       ┊ pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 ┊ drbdhddpool1         ┊     0 ┊    1033 ┊ /dev/drbd1033 ┊   279 KiB ┊ Unused ┊ UpToDate ┊
┊ pve-3       ┊ pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 ┊ DfltDisklessStorPool ┊     0 ┊    1033 ┊ /dev/drbd1033 ┊           ┊        ┊  Unknown ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

root@pve-3:/etc/apt# linstor r l -r pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node        ┊ Port ┊ Usage  ┊ Conns                   ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 ┊ aisk8swr02t ┊ 7033 ┊ Unused ┊ Ok                      ┊ Diskless ┊                     ┊
┊ pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 ┊ pve-1       ┊ 7033 ┊ Unused ┊ Connecting(aisk8swr02t) ┊ UpToDate ┊ 2023-05-03 12:24:18 ┊
┊ pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 ┊ pve-2       ┊ 7033 ┊ Unused ┊ Connecting(aisk8swr02t) ┊ UpToDate ┊ 2023-05-03 12:23:39 ┊
┊ pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 ┊ pve-3       ┊ 7033 ┊        ┊ Ok                      ┊ DELETING ┊ 2023-05-03 12:23:40 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@pve-3:/etc/apt# linstor err s 645246FB-2D634-000000
ERROR REPORT 645246FB-2D634-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.22.0
Build ID:                           0dd2e7d8bad7b0115e924ef66371568320898285
Build time:                         2023-04-17T11:24:18+00:00
Error time:                         2023-05-03 14:37:21
Node:                               aisk8swr02t

============================================================

Reported error:
===============

Description:
    Failed to adjust DRBD resource pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385

Category:                           LinStorException
Class name:                         ResourceException
Class canonical name:               com.linbit.linstor.core.devmgr.exceptions.ResourceException
Generated at:                       Method 'adjustDrbd', Source file 'DrbdLayer.java', Line #866

Error message:                      Failed to adjust DRBD resource pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385

Error context:
    An error occurred while processing resource 'Node: 'aisk8swr02t', Rsc: 'pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385''

ErrorContext:

Call backtrace:

    Method                                   Native Class:Line number
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:866
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:424
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:983
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:411
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:175
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:323
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1153
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:751
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:645
    run                                      N      java.lang.Thread:829

Caused by:
==========

Description:
    Execution of the external command 'drbdadm' failed.
Cause:
    The external command exited with error code 1.
Correction:
    - Check whether the external program is operating properly.
    - Check whether the command line is correct.
      Contact a system administrator or a developer if the command line is no longer valid
      for the installed version of the external program.
Additional information:
    The full command line executed was:
    drbdadm -vvv adjust pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385

    The external command sent the following output data:
    drbdsetup new-resource pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 3 --on-no-quorum=io-error --quorum=majority
    drbdsetup new-minor pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 1033 0 --diskless
    drbdsetup new-peer pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 1 --_name=pve-1 --verify-alg=crct10dif-pclmul --sndbuf-size=2000000 --rcvbuf-size=2097152 --max-epoch-size=8000 --max-buffers=8000 --shared-secret=NFAOQsuUlJ8HyQMAeLQp --cram-hmac-alg=sha1

    The external command sent the following error information:
    New resource pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385
    New minor 1033 (vol:0)
    pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385: Failure: (146) VERIFYAlgNotAvail
    additional info from kernel:
    failed to allocate crct10dif-pclmul for verify

    Command 'drbdsetup new-peer pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 1 --_name=pve-1 --verify-alg=crct10dif-pclmul --sndbuf-size=2000000 --rcvbuf-size=2097152 --max-epoch-size=8000 --max-buffers=8000 --shared-secret=NFAOQsuUlJ8HyQMAeLQp --cram-hmac-alg=sha1' terminated with exit code 10
    drbdadm: new-peer pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385: skipped due to earlier error

Category:                           LinStorException
Class name:                         ExtCmdFailedException
Class canonical name:               com.linbit.extproc.ExtCmdFailedException
Generated at:                       Method 'execute', Source file 'DrbdAdm.java', Line #593

Error message:                      The external command 'drbdadm' exited with error code 1

ErrorContext:   Description: Execution of the external command 'drbdadm' failed.
  Cause:       The external command exited with error code 1.
  Correction:  - Check whether the external program is operating properly.
- Check whether the command line is correct.
  Contact a system administrator or a developer if the command line is no longer valid
  for the installed version of the external program.
  Details:     The full command line executed was:
drbdadm -vvv adjust pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385

The external command sent the following output data:
drbdsetup new-resource pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 3 --on-no-quorum=io-error --quorum=majority
drbdsetup new-minor pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 1033 0 --diskless
drbdsetup new-peer pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 1 --_name=pve-1 --verify-alg=crct10dif-pclmul --sndbuf-size=2000000 --rcvbuf-size=2097152 --max-epoch-size=8000 --max-buffers=8000 --shared-secret=NFAOQsuUlJ8HyQMAeLQp --cram-hmac-alg=sha1

The external command sent the following error information:
New resource pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385
New minor 1033 (vol:0)
pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385: Failure: (146) VERIFYAlgNotAvail
additional info from kernel:
failed to allocate crct10dif-pclmul for verify

Command 'drbdsetup new-peer pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385 1 --_name=pve-1 --verify-alg=crct10dif-pclmul --sndbuf-size=2000000 --rcvbuf-size=2097152 --max-epoch-size=8000 --max-buffers=8000 --shared-secret=NFAOQsuUlJ8HyQMAeLQp --cram-hmac-alg=sha1' terminated with exit code 10
drbdadm: new-peer pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385: skipped due to earlier error

Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:593
    adjust                                   N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:90
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:785
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:424
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:983
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:411
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:175
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:323
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1153
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:751
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:645
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.
root@pve-3:/etc/apt# linstor err s 6450D7B2-00000-000093
ERROR REPORT 6450D7B2-00000-000093

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.22.0
Build ID:                           0dd2e7d8bad7b0115e924ef66371568320898285
Build time:                         2023-04-17T11:24:18+00:00
Error time:                         2023-05-03 14:37:22
Node:                               pve-3
Peer:                               RestClient(192.168.129.45; 'linstor-csi/v1.0.0-3fbda5a7e31517786796ae4834c159caa9ad8ac7')

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         ApiRcException
Class canonical name:               com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at:                       Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #337

Error message:                      (Node: 'aisk8swr02t') Failed to adjust DRBD resource pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385

Error context:
    (Node: 'aisk8swr02t') Failed to adjust DRBD resource pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385

Asynchronous stage backtrace:

    Error has been observed at the following site(s):
        |_ checkpoint ⇢ Modify volume
    Stack trace:

Call backtrace:

    Method                                   Native Class:Line number
    handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:337

Suppressed exception 1 of 1:
===============
Category:                           RuntimeException
Class name:                         OnAssemblyException
Class canonical name:               reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at:                       Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #337

Error message:
Error has been observed at the following site(s):
    |_ checkpoint ⇢ Modify volume
Stack trace:

Error context:
    (Node: 'aisk8swr02t') Failed to adjust DRBD resource pvc-3f8cb1b6-a1d6-4bb9-9bc2-dcd46fd87385

Call backtrace:

    Method                                   Native Class:Line number
    handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:337
    handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:284
    doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N      com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8357
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N      reactor.core.publisher.UnicastProcessor:286
    drain                                    N      reactor.core.publisher.UnicastProcessor:329
    onNext                                   N      reactor.core.publisher.UnicastProcessor:408
    next                                     N      reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N      reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:388
    doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N      reactor.core.scheduler.WorkerTask:84
    call                                     N      reactor.core.scheduler.WorkerTask:37
    run                                      N      java.util.concurrent.FutureTask:264
    run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.

WanzenBug commented 1 year ago

This error might be of interest to the LINSTOR server project: https://github.com/linbit/linstor-server
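
The satellite report above fails with `VERIFYAlgNotAvail` and the kernel message `failed to allocate crct10dif-pclmul for verify`, so one thing worth checking on the affected node (`aisk8swr02t`) is whether the kernel there has that algorithm registered. Below is a minimal sketch, assuming the standard Linux `/proc/crypto` layout (blank-line-separated blocks of `key : value` lines). The helper names `parse_proc_crypto` and `algorithm_available` are made up for illustration and are not part of LINSTOR or DRBD. Note that a missing entry may only mean the providing module is not currently loaded, so treat the result as a hint rather than a diagnosis.

```python
from pathlib import Path


def parse_proc_crypto(text: str) -> list[dict[str, str]]:
    """Split /proc/crypto-style text into one dict per algorithm block."""
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current algorithm block.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def algorithm_available(text: str, alg: str) -> bool:
    """Return True if alg matches the 'name' or 'driver' field of any block."""
    return any(
        alg in (entry.get("name"), entry.get("driver"))
        for entry in parse_proc_crypto(text)
    )


if __name__ == "__main__":
    alg = "crct10dif-pclmul"  # algorithm named in the error report
    proc_crypto = Path("/proc/crypto")
    if proc_crypto.exists():
        registered = algorithm_available(proc_crypto.read_text(), alg)
        print(f"{alg}: {'registered' if registered else 'not registered'}")
    else:
        print("/proc/crypto not found (not a Linux host?)")
```

Running this on a storage node where the resource is `UpToDate` and on the worker that reports the failure, then comparing the two results, would show whether the nodes differ in which verify algorithms their kernels offer.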