DeviceName is None and no known workaround works

piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.

https://piraeus.io/

Apache License 2.0


DeviceName is None and no known workaround works #287

Closed: tobg closed this issue 2 years ago

tobg commented 2 years ago

With operator versions 1.7.1 and 1.8.0, the DeviceName column in linstor v l shows None.

Restarting the node (several times) does not help.
Restarting the NS Controller does not help.
Reconnecting the node with linstor does not help.
lvscan shows the device/resource as active.

It affects only one node.

WanzenBug commented 2 years ago

Can you check the LINSTOR error reports with linstor err l? If I had to guess, some DRBD command is probably timing out. You could also just create a LINSTOR SOS report, which includes the whole cluster state plus the error reports, for us to analyze:

$ kubectl exec deployment/piraeus-op-cs-controller -- linstor sos-report create
SUCCESS:
    SOS Report created on Controller: /var/log/linstor-controller/sos_2022-03-21_08-27-38.tar.gz
$ kubectl cp <piraeus-cs-controller-pod>:/var/log/linstor-controller/sos_2022-03-21_08-27-38.tar.gz sos_report.tar.gz
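The two commands above can be chained so that the timestamped path does not have to be copied by hand. This is only a sketch: sos_report_path is a hypothetical helper name (not part of linstor or kubectl), and it assumes the "SOS Report created on Controller:" output format shown above.

```shell
# Hypothetical helper: read `linstor sos-report create` output on stdin
# and print the path of the created archive.
# Assumes the "SOS Report created on Controller: <path>" line shown above.
sos_report_path() {
  sed -n 's/^.*SOS Report created on Controller: *\(.*\.tar\.gz\).*$/\1/p' | head -n 1
}

# Possible usage against a live cluster (deployment name taken from this thread;
# the pod name placeholder still has to be filled in):
#   path=$(kubectl exec deployment/piraeus-op-cs-controller -- linstor sos-report create | sos_report_path)
#   kubectl cp "<piraeus-cs-controller-pod>:${path}" sos_report.tar.gz
```

If the output format differs in your LINSTOR version, the helper prints nothing, so check that the path is non-empty before running kubectl cp.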

tobg commented 2 years ago

Hey Moritz, thank you very much. Please find the SOS report attached: sos_report.tar.gz

WanzenBug commented 2 years ago

This seems very strange to me. The issue I could find is that on node digoc.tobg.services, LINSTOR does not configure volumes for some DRBD resources. I'm not sure how this could happen. Please open an issue with the SOS report upstream at https://github.com/linbit/linstor-server

tobg commented 2 years ago

Thanks, I opened a new issue at linstor-server.

tobg commented 2 years ago

This works now as of operator version 1.8.1.