piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
https://piraeus.io/
Apache License 2.0

StorageException: Failed to mkfs /dev/drbd1002 #641

Open dmrub opened 7 months ago

dmrub commented 7 months ago

After installing piraeus-operator, I get the error message `StorageException: Failed to mkfs /dev/drbd1002`. Kubernetes version: v1.28.8. Piraeus operator: v2.3.0. Piraeus server: v1.25.1. LINSTOR is installed with the following satellite configuration:

---    
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: linstor-fast
spec:
  internalTLS:
    certManager:
      name: linstor-internal-ca
      kind: Issuer
  storagePools:
    - name: vg01-linstor
      lvmThinPool:
        volumeGroup: vg01
        thinPool: linstor

After installation I get a number of errors:

$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor error-reports list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Id                    ┊ Datetime            ┊ Node                                  ┊ Exception                                                                      ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ 6610156F-8EC88-000000 ┊ 2024-04-05 15:15:30 ┊ S|k8s-m2                              ┊ StorageException: Failed to mkfs /dev/drbd1002                                 ┊
┊ 66101520-00000-000000 ┊ 2024-04-05 15:15:32 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 66101520-00000-000001 ┊ 2024-04-05 15:15:35 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 66101520-00000-000002 ┊ 2024-04-05 15:15:42 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 66101589-E5863-000000 ┊ 2024-04-05 15:15:52 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000001 ┊ 2024-04-05 15:15:52 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101520-00000-000003 ┊ 2024-04-05 15:15:52 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 66101589-E5863-000001 ┊ 2024-04-05 15:15:59 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000002 ┊ 2024-04-05 15:16:04 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000002 ┊ 2024-04-05 15:16:08 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000003 ┊ 2024-04-05 15:16:08 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101520-00000-000004 ┊ 2024-04-05 15:16:09 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: (Node: 'k8s-m2') Generated resource file for resource 'pv...   ┊
┊ 66101520-00000-000005 ┊ 2024-04-05 15:16:09 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no...   ┊
┊ 6610156F-8EC88-000004 ┊ 2024-04-05 15:16:12 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000003 ┊ 2024-04-05 15:16:12 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000005 ┊ 2024-04-05 15:16:24 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000006 ┊ 2024-04-05 15:16:43 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000004 ┊ 2024-04-05 15:16:43 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000007 ┊ 2024-04-05 15:16:44 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000008 ┊ 2024-04-05 15:17:42 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000005 ┊ 2024-04-05 15:17:42 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 661015A1-A3732-000000 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m1                              ┊ SSLException: closing inbound before receiving peer's close_notify             ┊
┊ 6610156F-8EC88-000009 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m2                              ┊ SSLException: closing inbound before receiving peer's close_notify             ┊
┊ 661015A1-A3732-000001 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m1                              ┊ SSLException: closing inbound before receiving peer's close_notify             ┊
┊ 66101589-E5863-000006 ┊ 2024-04-05 15:19:00 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000010 ┊ 2024-04-05 15:19:00 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000011 ┊ 2024-04-05 15:19:01 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000007 ┊ 2024-04-05 15:19:01 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000012 ┊ 2024-04-05 15:19:27 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000008 ┊ 2024-04-05 15:19:27 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000013 ┊ 2024-04-05 15:19:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000009 ┊ 2024-04-05 15:19:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000010 ┊ 2024-04-05 15:20:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000014 ┊ 2024-04-05 15:20:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000011 ┊ 2024-04-05 15:22:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000015 ┊ 2024-04-05 15:22:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000016 ┊ 2024-04-05 15:23:14 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000017 ┊ 2024-04-05 15:27:58 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000012 ┊ 2024-04-05 15:27:58 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000018 ┊ 2024-04-05 15:37:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000013 ┊ 2024-04-05 15:37:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000014 ┊ 2024-04-05 16:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000019 ┊ 2024-04-05 16:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000020 ┊ 2024-04-05 17:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000015 ┊ 2024-04-05 17:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000021 ┊ 2024-04-05 21:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000016 ┊ 2024-04-05 21:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000022 ┊ 2024-04-06 21:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000017 ┊ 2024-04-06 21:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000018 ┊ 2024-04-07 21:07:57 ┊ S|k8s-m0                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000023 ┊ 2024-04-07 21:07:57 ┊ S|k8s-m2                              ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
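
To make the long listing above easier to scan, here is a minimal Python sketch that tallies the reports per node and exception class. It assumes the table layout shown above (cells separated by `┊`). The function name `tally_error_reports` and the shortened sample rows are made up for illustration and are not part of LINSTOR.

```python
from collections import Counter


def tally_error_reports(table_text):
    """Count (node, exception class) pairs in `linstor error-reports list` output."""
    counts = Counter()
    for line in table_text.splitlines():
        # Data rows are delimited by '┊'; border lines and blank lines yield no cells.
        cells = [c.strip() for c in line.split("┊")]
        cells = [c for c in cells if c]
        if len(cells) != 4 or cells[0] == "Id":
            continue  # skip borders, the header row, and anything unexpected
        _report_id, _when, node, exception = cells
        # The exception column looks like "StorageException: Failed to ...".
        exc_class = exception.split(":", 1)[0]
        counts[(node, exc_class)] += 1
    return counts


# Shortened sample rows copied from the listing above (hypothetical input).
sample = """\
┊ Id                    ┊ Datetime            ┊ Node     ┊ Exception ┊
┊ 6610156F-8EC88-000000 ┊ 2024-04-05 15:15:30 ┊ S|k8s-m2 ┊ StorageException: Failed to mkfs /dev/drbd1002 ┊
┊ 661015A1-A3732-000000 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m1 ┊ SSLException: closing inbound before receiving peer's close_notify ┊
┊ 6610156F-8EC88-000009 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m2 ┊ SSLException: closing inbound before receiving peer's close_notify ┊
"""

for (node, exc_class), n in sorted(tally_error_reports(sample).items()):
    print(node, exc_class, n)
```

Piping the real command output into this function should show at a glance which satellites keep reporting the same exception.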

Here are the error reports:

1. `StorageException: Failed to mkfs /dev/drbd1002`

ERROR REPORT 6610156F-8EC88-000000

============================================================

Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:15:30
Node: k8s-m2

============================================================

Reported error:

Description: Failed to mkfs /dev/drbd1002
Additional information: Command 'mkfs.ext4 -q -E nodiscard /dev/drbd1002' returned with exitcode 1.

Standard out: 

Error message: 
The file /dev/drbd1002 does not exist and no size was specified.

Category: LinStorException Class name: StorageException Class canonical name: com.linbit.linstor.storage.StorageException Generated at: Method 'checkExitCode', Source file 'ExtCmdUtils.java', Line #69

Error message: Failed to mkfs /dev/drbd1002

Error context: An error occurred while processing resource 'Node: 'k8s-m2', Rsc: 'pvc-80745669-9bf4-4776-9865-f6f419c57863''

ErrorContext: Details: Command 'mkfs.ext4 -q -E nodiscard /dev/drbd1002' returned with exitcode 1.

Standard out:

Error message: The file /dev/drbd1002 does not exist and no size was specified.

Call backtrace:

Method                                   Native Class:Line number
checkExitCode                            N      com.linbit.extproc.ExtCmdUtils:69
genericExecutor                          N      com.linbit.linstor.layer.storage.utils.Commands:103
genericExecutor                          N      com.linbit.linstor.layer.storage.utils.Commands:63
genericExecutor                          N      com.linbit.linstor.layer.storage.utils.Commands:51
makeFs                                   N      com.linbit.linstor.layer.storage.utils.MkfsUtils:96
makeExt4                                 N      com.linbit.linstor.layer.storage.utils.MkfsUtils:109
makeFileSystemOnMarked                   N      com.linbit.linstor.layer.storage.utils.MkfsUtils:222
condInitialOrSkipSync                    N      com.linbit.linstor.layer.drbd.DrbdLayer:1771
adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:889
process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run                                      N      java.lang.Thread:829

END OF ERROR REPORT.
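
The mkfs message in this report says that `/dev/drbd1002` did not exist when `mkfs.ext4` ran. As a rough way to confirm that on the affected node (or inside the satellite container), a minimal Python sketch could check what is present at that path. The helper name `describe_device` is hypothetical, and running it assumes Python is available wherever the device node is expected.

```python
import os
import stat


def describe_device(path):
    """Return a short description of what exists at `path`, if anything."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISBLK(mode):
        return "block device"
    return "exists but is not a block device"


# Device path taken from the error report above.
print(describe_device("/dev/drbd1002"))
```

If this reports "missing", the DRBD device was never created on that node, which would match the mkfs error.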

2. `Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.`

$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor error-reports show 66101520-00000-000000
ERROR REPORT 66101520-00000-000000

============================================================

Application: LINBIT® LINSTOR
Module: Controller
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:15:32
Node: linstor-controller-5f594b5b45-9lr8z
Peer: RestClient(10.244.42.135; 'linstor-csi/v1.3.0-4077ebefbe439ee2894b782aa7914b590891d2ff')

============================================================

Reported error:

Category: RuntimeException Class name: ApiRcException Class canonical name: com.linbit.linstor.core.apicallhandler.response.ApiRcException Generated at: Method 'deleteVolumeDefinitionInTransaction', Source file 'CtrlVlmDfnDeleteApiCallHandler.java', Line #179

Error message: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.

Error context: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.

Asynchronous stage backtrace:

Error has been observed at the following site(s):
    *__checkpoint ? Delete volume definition
Original Stack Trace:

Call backtrace:

Method                                   Native Class:Line number
deleteVolumeDefinitionInTransaction      N      com.linbit.linstor.core.apicallhandler.controller.CtrlVlmDfnDeleteApiCallHandler:179

Suppressed exception 1 of 1:

Category: RuntimeException Class name: OnAssemblyException Class canonical name: reactor.core.publisher.FluxOnAssembly.OnAssemblyException Generated at: Method 'deleteVolumeDefinitionInTransaction', Source file 'CtrlVlmDfnDeleteApiCallHandler.java', Line #179

Error message:
Error has been observed at the following site(s):
    *__checkpoint ⇢ Delete volume definition
Original Stack Trace:

Error context: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.

Call backtrace:

Method                                   Native Class:Line number
deleteVolumeDefinitionInTransaction      N      com.linbit.linstor.core.apicallhandler.controller.CtrlVlmDfnDeleteApiCallHandler:179
lambda$deleteVolumeDefinition$0          N      com.linbit.linstor.core.apicallhandler.controller.CtrlVlmDfnDeleteApiCallHandler:134
doInScope                                N      com.linbit.linstor.core.apicallhandler.ScopeRunner:149
lambda$fluxInScope$0                     N      com.linbit.linstor.core.apicallhandler.ScopeRunner:76
call                                     N      reactor.core.publisher.MonoCallable:72
trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:127
subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
subscribe                                N      reactor.core.publisher.Flux:8759
onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2545
onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
subscribe                                N      reactor.core.publisher.MonoJust:55
subscribe                                N      reactor.core.publisher.MonoDeferContextual:55
subscribe                                N      reactor.core.publisher.Flux:8773
onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2545
onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
subscribe                                N      reactor.core.publisher.MonoJust:55
subscribe                                N      reactor.core.publisher.MonoDeferContextual:55
subscribe                                N      reactor.core.publisher.Mono:4495
subscribeWith                            N      reactor.core.publisher.Mono:4561
subscribe                                N      reactor.core.publisher.Mono:4462
subscribe                                N      reactor.core.publisher.Mono:4398
subscribe                                N      reactor.core.publisher.Mono:4370
doFlux                                   N      com.linbit.linstor.api.rest.v1.RequestHelper:324
deleteVolumeDefinition                   N      com.linbit.linstor.api.rest.v1.VolumeDefinitions:229
invoke0                                  Y      jdk.internal.reflect.NativeMethodAccessorImpl:unknown
invoke                                   N      jdk.internal.reflect.NativeMethodAccessorImpl:62
invoke                                   N      jdk.internal.reflect.DelegatingMethodAccessorImpl:43
invoke                                   N      java.lang.reflect.Method:566
lambda$static$0                          N      org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory:52
run                                      N      org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1:146
invoke                                   N      org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher:189
doDispatch                               N      org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker:159
dispatch                                 N      org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher:93
invoke                                   N      org.glassfish.jersey.server.model.ResourceMethodInvoker:478
apply                                    N      org.glassfish.jersey.server.model.ResourceMethodInvoker:400
apply                                    N      org.glassfish.jersey.server.model.ResourceMethodInvoker:81
run                                      N      org.glassfish.jersey.server.ServerRuntime$1:256
call                                     N      org.glassfish.jersey.internal.Errors$1:248
call                                     N      org.glassfish.jersey.internal.Errors$1:244
process                                  N      org.glassfish.jersey.internal.Errors:292
process                                  N      org.glassfish.jersey.internal.Errors:274
process                                  N      org.glassfish.jersey.internal.Errors:244
runInScope                               N      org.glassfish.jersey.process.internal.RequestScope:265
process                                  N      org.glassfish.jersey.server.ServerRuntime:235
handle                                   N      org.glassfish.jersey.server.ApplicationHandler:684
service                                  N      org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer:356
run                                      N      org.glassfish.grizzly.http.server.HttpHandler$1:190
doWork                                   N      org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker:535
run                                      N      org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker:515
run                                      N      java.lang.Thread:829

END OF ERROR REPORT.

3. `Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.`

$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor error-reports show 66101589-E5863-000000
ERROR REPORT 66101589-E5863-000000

============================================================

Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:15:52
Node: k8s-m0

============================================================

Reported error:

Description: Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted Cause: Verification of resource file failed Additional information: The error reported by the runtime environment or operating system is: The external command 'drbdadm' exited with error code 10

Category: LinStorException Class name: StorageException Class canonical name: com.linbit.linstor.storage.StorageException Generated at: Method 'regenerateResFile', Source file 'DrbdLayer.java', Line #1624

Error message: Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Error context: An error occurred while processing resource 'Node: 'k8s-m0', Rsc: 'pvc-80745669-9bf4-4776-9865-f6f419c57863''

ErrorContext: Description: Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted Cause: Verification of resource file failed Details: The error reported by the runtime environment or operating system is: The external command 'drbdadm' exited with error code 10

Call backtrace:

Method                                   Native Class:Line number
regenerateResFile                        N      com.linbit.linstor.layer.drbd.DrbdLayer:1624
adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:687
process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run                                      N      java.lang.Thread:829

Caused by:

Description: Execution of the external command 'drbdadm' failed. Cause: The external command exited with error code 10. Correction:

Category: LinStorException Class name: ExtCmdFailedException Class canonical name: com.linbit.extproc.ExtCmdFailedException Generated at: Method 'execute', Source file 'DrbdAdm.java', Line #642

Error message: The external command 'drbdadm' exited with error code 10

ErrorContext: Description: Execution of the external command 'drbdadm' failed. Cause: The external command exited with error code 10. Correction: - Check whether the external program is operating properly.

The external command sent the following output data:

The external command sent the following error information:
/etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m0 { ... }: volume 0 not defined on k8s-m2
command sh-nop exited with code 10

Call backtrace:

Method                                   Native Class:Line number
execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:642
execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:625
checkResFile                             N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:492
regenerateResFile                        N      com.linbit.linstor.layer.drbd.DrbdLayer:1617
adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:687
process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run                                      N      java.lang.Thread:829

END OF ERROR REPORT.

4. `(Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.`

ERROR REPORT 66101520-00000-000004

============================================================

Application: LINBIT® LINSTOR
Module: Controller
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:16:09
Node: linstor-controller-5f594b5b45-9lr8z
Peer: RestClient(10.244.42.135; 'linstor-csi/v1.3.0-4077ebefbe439ee2894b782aa7914b590891d2ff')

============================================================

Reported error:

Category: RuntimeException Class name: ApiRcException Class canonical name: com.linbit.linstor.core.apicallhandler.response.ApiRcException Generated at: Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #346

Error message: (Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Error context: (Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Asynchronous stage backtrace:

Error has been observed at the following site(s):
    *__checkpoint ? Modify resource-definition
Original Stack Trace:

Call backtrace:

Method                                   Native Class:Line number
handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:346

Suppressed exception 1 of 1:

Category: RuntimeException Class name: OnAssemblyException Class canonical name: reactor.core.publisher.FluxOnAssembly.OnAssemblyException Generated at: Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #346

Error message:
Error has been observed at the following site(s):
    *__checkpoint ⇢ Modify resource-definition
Original Stack Trace:

Error context: (Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Call backtrace:

Method                                   Native Class:Line number
handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:346
handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:293
doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:244
lambda$doProcessMessage$4                N      com.linbit.linstor.proto.CommonMessageProcessor:229
subscribe                                N      reactor.core.publisher.FluxDefer:46
subscribe                                N      reactor.core.publisher.Flux:8773
onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:427
drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
drainFused                               N      reactor.core.publisher.SinkManyUnicast:319
drain                                    N      reactor.core.publisher.SinkManyUnicast:362
tryEmitNext                              N      reactor.core.publisher.SinkManyUnicast:237
tryEmitNext                              N      reactor.core.publisher.SinkManySerialized:100
processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:392
doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:227
lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:185
runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
call                                     N      reactor.core.scheduler.WorkerTask:84
call                                     N      reactor.core.scheduler.WorkerTask:37
run                                      N      java.util.concurrent.FutureTask:264
run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
run                                      N      java.lang.Thread:829

END OF ERROR REPORT.

5. `Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.`

ERROR REPORT 6610156F-8EC88-000004

============================================================

Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:16:12
Node: k8s-m2

============================================================

Reported error:

Description: Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted Cause: Verification of resource file failed Additional information: The error reported by the runtime environment or operating system is: The external command 'drbdadm' exited with error code 10

Category: LinStorException Class name: StorageException Class canonical name: com.linbit.linstor.storage.StorageException Generated at: Method 'regenerateResFile', Source file 'DrbdLayer.java', Line #1624

Error message: Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.

Error context: An error occurred while processing resource 'Node: 'k8s-m2', Rsc: 'pvc-80745669-9bf4-4776-9865-f6f419c57863''

ErrorContext: Description: Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted Cause: Verification of resource file failed Details: The error reported by the runtime environment or operating system is: The external command 'drbdadm' exited with error code 10

Call backtrace:

Method                                   Native Class:Line number
regenerateResFile                        N      com.linbit.linstor.layer.drbd.DrbdLayer:1624
adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:687
process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run                                      N      java.lang.Thread:829

Caused by:

Description: Execution of the external command 'drbdadm' failed. Cause: The external command exited with error code 10. Correction:

Category: LinStorException Class name: ExtCmdFailedException Class canonical name: com.linbit.extproc.ExtCmdFailedException Generated at: Method 'execute', Source file 'DrbdAdm.java', Line #642

Error message: The external command 'drbdadm' exited with error code 10

ErrorContext: Description: Execution of the external command 'drbdadm' failed. Cause: The external command exited with error code 10. Correction: - Check whether the external program is operating properly.

The external command sent the following output data:

The external command sent the following error information: /etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m2 { ... }: volume 0 missing (present on k8s-m0) command sh-nop exited with code 10

Call backtrace:

Method                                   Native Class:Line number
execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:642
execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:625
checkResFile                             N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:492
regenerateResFile                        N      com.linbit.linstor.layer.drbd.DrbdLayer:1617
adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:687
process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:432
process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run                                      N      java.lang.Thread:829

END OF ERROR REPORT.


Output of LVM's `pvs; vgs; lvs;`  on cluster nodes:

k8s-m0:

PV         VG   Fmt  Attr PSize   PFree
/dev/sda2  vg00 lvm2 a--  <99,50g <49,50g
/dev/sdb   vg01 lvm2 a--  <50,00g 516,00m

VG   #PV #LV #SN Attr   VSize   VFree
vg00   1   1   0 wz--n- <99,50g <49,50g
vg01   1   2   0 wz--n- <50,00g 516,00m

LV                                             VG   Attr       LSize  Pool    Origin Data%  Meta%  Move Log Cpy%Sync Convert
root                                           vg00 -wi-ao---- 50,00g
linstor                                        vg01 twi-aotz-- 49,39g                0,01   10,44
pvc-80745669-9bf4-4776-9865-f6f419c57863_00000 vg01 Vwi-a-tz-- 10,00g linstor        0,01

k8s-m1:

PV         VG   Fmt  Attr PSize   PFree
/dev/sda2  vg00 lvm2 a--  <99,50g <49,50g
/dev/sdb   vg01 lvm2 a--  <50,00g 516,00m

VG   #PV #LV #SN Attr   VSize   VFree
vg00   1   1   0 wz--n- <99,50g <49,50g
vg01   1   2   0 wz--n- <50,00g 516,00m

LV                                             VG   Attr       LSize  Pool    Origin Data%  Meta%  Move Log Cpy%Sync Convert
root                                           vg00 -wi-ao---- 50,00g
linstor                                        vg01 twi-aotz-- 49,39g                0,43   10,58
pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60_00000 vg01 Vwi-aotz-- 8,00g  linstor        2,68

k8s-m2:

PV         VG   Fmt  Attr PSize   PFree
/dev/sda2  vg00 lvm2 a--  <99,50g <49,50g
/dev/sdb   vg01 lvm2 a--  <50,00g 516,00m

VG   #PV #LV #SN Attr   VSize   VFree
vg00   1   1   0 wz--n- <99,50g <49,50g
vg01   1   3   0 wz--n- <50,00g 516,00m

LV                                             VG   Attr       LSize  Pool    Origin Data%  Meta%  Move Log Cpy%Sync Convert
root                                           vg00 -wi-ao---- 50,00g
linstor                                        vg01 twi-aotz-- 49,39g                0,83   10,70
pvc-a6a8ed01-2406-4614-8432-fdef2b2c7abe_00000 vg01 Vwi-aotz-- 5,00g  linstor        2,91
pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60_00000 vg01 Vwi-aotz-- 8,00g  linstor        3,28

WanzenBug commented 7 months ago

Please try to update to the latest version.

It also looks like this was not a fresh install? Otherwise, why would there be any resources?

This

    Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.

Looks like the resource (which already existed) is still in use somewhere. So something still has it mounted or similar. Clean that up first: check the resource state with linstor r l to find where it is "InUse" and unmount it there.
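As a rough sketch of that check (not part of the original comment), the rows marked InUse can be filtered out of the table that linstor r l prints. The helper name inuse_resources is made up for illustration. The column positions are an assumption based on the table layout shown in this thread, and the output is assumed to be plain and uncolored.

```shell
# Hypothetical helper: print "resource node" for every row whose Usage column is InUse.
# Reads the table printed by `linstor r l` on stdin; columns are separated by "┊".
inuse_resources() {
  awk -F'┊' '
    NF >= 8 {
      # Field 1 is the empty text before the first separator, so:
      # 2 = ResourceName, 3 = Node, 5 = Usage (assumed layout).
      name = $2; node = $3; usage = $5
      gsub(/^ +| +$/, "", name)
      gsub(/^ +| +$/, "", node)
      gsub(/^ +| +$/, "", usage)
      if (usage == "InUse") print name, node
    }'
}
```

Usage, assuming the namespace and deployment name used elsewhere in this thread: `kubectl exec -n piraeus-datastore deploy/linstor-controller -- linstor r l | inuse_resources`. Border and header rows are skipped because they either lack the separator or do not have InUse in the Usage column.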

dmrub commented 7 months ago

I will try to upgrade to the latest version, but this is a fresh install. We plan to use Linstor in production, but before that we are doing automated testing by installing a fresh Kubernetes on three VMs and then installing the Piraeus Operator via Flux CD. This installation was started on Friday evening, and this morning I checked the installation status and found the errors I describe in this issue.

The output of linstor r l:

$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor r l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m0 ┊ 7002 ┊        ┊       ┊    Unknown ┊                     ┊
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m2 ┊ 7002 ┊ InUse  ┊       ┊    Unknown ┊ 2024-04-05 15:15:27 ┊
┊ pvc-a6a8ed01-2406-4614-8432-fdef2b2c7abe ┊ k8s-m2 ┊ 7000 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2024-04-05 15:15:24 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m0 ┊ 7001 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2024-04-05 15:16:03 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m1 ┊ 7001 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2024-04-05 15:16:04 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m2 ┊ 7001 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-04-05 15:16:02 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

The PVC pvc-80745669-9bf4-4776-9865-f6f419c57863 is used by the monitoring stack, which cannot start:

$ kubectl get pvc -A | grep pvc-80745669-9bf4-4776-9865-f6f419c57863
monitoring           kube-prometheus-stack-grafana         Bound    pvc-80745669-9bf4-4776-9865-f6f419c57863   10Gi       RWO            linstor-fast                 2d17h

$ kubectl get pods -n monitoring
NAME                                                       READY   STATUS     RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0          2/2     Running    0          35h
kube-prometheus-stack-grafana-9b8785fdd-m9nkm              0/3     Init:0/1   0          2d17h
kube-prometheus-stack-kube-state-metrics-776c898f6-qbjj9   1/1     Running    0          47h
kube-prometheus-stack-operator-696cbbfbfb-sql6s            1/1     Running    0          35h
kube-prometheus-stack-prometheus-node-exporter-d96g9       1/1     Running    0          2d17h
kube-prometheus-stack-prometheus-node-exporter-dcdh7       1/1     Running    0          2d17h
kube-prometheus-stack-prometheus-node-exporter-gfblh       1/1     Running    0          2d17h
prometheus-kube-prometheus-stack-prometheus-0              2/2     Running    0          35h

WanzenBug commented 7 months ago

So it looks like 6610156F-8EC88-000000 indicates that mkfs failed because DRBD was not set up correctly. But in 66101520-00000-000000 we can see that the resource is apparently in use. This does not make much sense: it would indicate that something is keeping the resource in Primary without any actual disk.

Could you please try to run:

kubectl exec k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
kubectl exec k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863

It looks like the CSI driver later tried to create the volume again and somehow determined that the volume already exists, which led to it being bound. I would recommend deleting the PVC and PV and letting them be recreated.

dmrub commented 7 months ago

Here is the output of the commands:

$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
pvc-80745669-9bf4-4776-9865-f6f419c57863 role:Primary

$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863
resource "pvc-80745669-9bf4-4776-9865-f6f419c57863" {
    options {
        on-no-data-accessible   suspend-io;
        on-suspended-primary-outdated   force-secondary;
    }
    _this_host {
        node-id         0;
    }
}

WanzenBug commented 7 months ago

Ok, this looks like a bug in LINSTOR: it does not properly restore the resource to Secondary after the mkfs call fails. That still leaves the question of how /dev/drbd1002 can be missing at this point. I have no idea how that can happen.

To fully clean up the volume:

kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup secondary pvc-80745669-9bf4-4776-9865-f6f419c57863

Then run linstor rd d pvc-80745669-9bf4-4776-9865-f6f419c57863 and delete the PVC and PV.
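Put together, the cleanup steps above might look like the following sketch. The function names, the DRY_RUN switch, and the use of `--wait=false` are illustrative choices rather than part of the original advice. The namespace piraeus-datastore and the deploy/linstor-controller target are taken from the commands in this thread.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical convenience switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper wrapping the cleanup steps described in this thread.
cleanup_stuck_resource() {
  res="$1"     # LINSTOR resource definition; in this thread it matches the PV name
  sat="$2"     # satellite pod on the node stuck in Primary (k8s-m2 here)
  pvc_ns="$3"  # namespace of the PVC (monitoring here)
  pvc="$4"     # PVC name (kube-prometheus-stack-grafana here)
  ns="piraeus-datastore"

  # 1. Demote the leftover Primary on the affected node.
  run kubectl exec -n "$ns" "$sat" -- drbdsetup secondary "$res" || return 1
  # 2. Delete the resource definition in LINSTOR.
  run kubectl exec -n "$ns" deploy/linstor-controller -- linstor rd d "$res" || return 1
  # 3. Remove the Kubernetes objects. --wait=false avoids blocking on
  #    finalizers while a pod still references the PVC.
  run kubectl delete pvc -n "$pvc_ns" "$pvc" --wait=false
  run kubectl delete pv "$res" --wait=false
}
```

Running `DRY_RUN=1 cleanup_stuck_resource pvc-80745669-9bf4-4776-9865-f6f419c57863 k8s-m2 monitoring kube-prometheus-stack-grafana` first prints the four commands without touching the cluster, which makes it easy to review them before running them for real.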

dmrub commented 7 months ago

Your last suggestion worked, and I was able to reinstall the monitoring. What would you recommend now? Should I update to the latest version of the Piraeus Operator and create a new issue if I get a new error? What steps would help you analyze this error?

WanzenBug commented 7 months ago

Yes, please upgrade and see if it happens again. If you encounter an issue again, run

kubectl exec -it deploy/linstor-controller -- linstor sos-report create

Then copy the created file from the pod to your host and attach it to the issue.
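One possible way to do the copy step is to stream the file out through kubectl exec, as sketched below. The function name fetch_sos_report is hypothetical, and the report path is deliberately left as an argument: use whatever path the sos-report command prints. The sketch also assumes that `cat` is available in the controller container and that the controller runs in the piraeus-datastore namespace, as in the other commands in this thread.

```shell
# Hypothetical helper: stream a file from the controller pod to the local host.
# $1: path of the report inside the pod, as printed by `linstor sos-report create`
# $2: optional local file name (defaults to the base name of $1)
fetch_sos_report() {
  remote_path="$1"
  if [ -z "$remote_path" ]; then
    echo "usage: fetch_sos_report <path-in-pod> [local-file]" >&2
    return 2
  fi
  local_file="${2:-$(basename "$remote_path")}"
  # No -t/-i here, so the binary archive is passed through unmodified.
  kubectl exec -n piraeus-datastore deploy/linstor-controller -- \
    cat "$remote_path" > "$local_file"
}
```

`kubectl cp` would work as well, but it needs the concrete pod name rather than the deployment, which is why the sketch uses exec instead.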

dmrub commented 7 months ago

@WanzenBug, I am currently testing the latest version of the Piraeus Operator, v2.5.0, and so far the problem described in this issue has not reoccurred. However, I have just reproduced a problem that I described in another issue: https://github.com/LINBIT/linstor-server/issues/396 . Since I never got a response in the linstor-server project, should I recreate the issue in this (piraeus-operator) project?

WanzenBug commented 7 months ago

Yes, this is an issue more appropriate for the piraeus project.