piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
https://piraeus.io/
Apache License 2.0

DRBD Connection Failure in Kubernetes Environment with External LINSTOR Controller #683

Open jludwig opened 1 month ago

jludwig commented 1 month ago

Environment

Setup Description

Description

When a resource is created for a PVC that uses the configured StorageClass, the DRBD connection between the Kubernetes worker node and the Proxmox node (which acts as the external LINSTOR controller) fails to establish. Connection attempts continue indefinitely without success.

Observed Behavior

  1. DRBD Connection Failures:

    drbd pvc-1c25b309-32b6-401a-8020-c6f093a1a966 pve-epyc-01: sock_recvmsg returned -11
    drbd pvc-1c25b309-32b6-401a-8020-c6f093a1a966 pve-epyc-01: conn( Connecting -> BrokenPipe )
    drbd pvc-1c25b309-32b6-401a-8020-c6f093a1a966 pve-epyc-01: Connection closed
    drbd pvc-1c25b309-32b6-401a-8020-c6f093a1a966 pve-epyc-01: conn( BrokenPipe -> Unconnected ) [disconnected]
  2. DRBD Status:

Kubernetes Worker Node:

   $ drbdadm status
   pvc-1c25b309-32b6-401a-8020-c6f093a1a966 role:Secondary
     disk:Diskless
     pve-epyc-01 connection:Unconnected

Proxmox Node:

   $ drbdadm status
   pm-c05fc392 role:Secondary
     disk:UpToDate
     pve-epyc-02 role:Secondary
       peer-disk:UpToDate
     pve-epyc-03 role:Secondary
       peer-disk:UpToDate
   pvc-1c25b309-32b6-401a-8020-c6f093a1a966 role:Secondary
     disk:UpToDate
     k3s-int-stage-work03 connection:Connecting

Note: On both nodes, the connections cycle between Unconnected and Connecting states.

Additional data:

   pvc-1c25b309-32b6-401a-8020-c6f093a1a966 node-id:1 role:Secondary suspended:no
     volume:0 minor:1000 disk:Diskless client:yes backing_dev:none quorum:yes
   pve-epyc-01 node-id:0 connection:Connecting role:Unknown tls:no congested:no
     volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
  3. Port 7000 Behavior:
    • On the Kubernetes worker node, port 7000 is intermittently available
    • The port is open for approximately 500 ms, then closed for about 10 seconds
    • This behavior is observed both when checking listening ports on the worker node and when attempting to connect from the Proxmox node (a probe loop like the sketch below shows the cycle)
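(For reference, a minimal sketch of a probe loop that can be used to observe this on/off cycle, assuming GNU date and `nc` are available on the Proxmox node — this was not part of the original diagnostics:)

    # Probe port 7000 on the worker and log its state (timeout 1 s per attempt).
    while true; do
        if nc -z -w 1 192.168.42.158 7000; then
            echo "$(date '+%T.%3N') open"
        else
            echo "$(date '+%T.%3N') closed"
        fi
        sleep 0.1
    done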

Configuration

  1. StorageClass:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: linstor-test
    provisioner: linstor.csi.linbit.com
    parameters:
      linstor.csi.linbit.com/storagePool: linstor_pool
      linstor.csi.linbit.com/resourceGroup: "linstor-test"
      csi.storage.k8s.io/fstype: xfs
  2. LinstorCluster:

    apiVersion: piraeus.io/v1
    kind: LinstorCluster
    metadata:
      name: linstorcluster
    spec:
      externalController:
        url: http://linstor.jludwig.win:3370
      controller:
        enabled: false
  3. LinstorSatelliteConfiguration:

    apiVersion: piraeus.io/v1
    kind: LinstorSatelliteConfiguration
    metadata:
      name: host-network
    spec:
      podTemplate:
        spec:
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
    ---
    apiVersion: piraeus.io/v1
    kind: LinstorSatelliteConfiguration
    metadata:
      name: custom-drbd-module-loader-image
    spec:
      podTemplate:
        spec:
          initContainers:
            - name: drbd-module-loader
              image: quay.io/piraeusdatastore/drbd9-bullseye
  4. DRBD Resource Configuration (k3s-int-stage-work03):

    resource "pvc-1c25b309-32b6-401a-8020-c6f093a1a966"
    {
    options
    {
        on-no-data-accessible suspend-io;
        on-suspended-primary-outdated force-secondary;
        quorum off;
    }
    
    net
    {
        cram-hmac-alg     sha1;
        shared-secret     "<redacted>";
        rr-conflict retry-connect;
        verify-alg "crct10dif";
    }
    
    on "k3s-int-stage-work03"
    {
        volume 0
        {
            disk        none;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 16384;
            }
            meta-disk   internal;
            device      minor 1000;
        }
        node-id    1;
    }
    
    on "pve-epyc-01"
    {
        volume 0
        {
            disk        /dev/drbd/this/is/not/used;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 16384;
            }
            meta-disk   internal;
            device      minor 1000;
        }
        node-id    0;
    }
    
    connection
    {
        host "k3s-int-stage-work03" address ipv4 192.168.42.158:7000;
        host "pve-epyc-01" address ipv4 192.168.42.25:7000;
    }
    }
  5. DRBD Resource Configuration (pve-epyc-01):

    resource "pvc-1c25b309-32b6-401a-8020-c6f093a1a966"
    {
    options
    {
        on-no-data-accessible suspend-io;
        on-suspended-primary-outdated force-secondary;
        quorum off;
    }
    
    net
    {
        cram-hmac-alg     sha1;
        shared-secret     "<redacted>";
        rr-conflict retry-connect;
        verify-alg "crct10dif";
    }
    
    on "pve-epyc-01"
    {
        volume 0
        {
            disk        /dev/zvol/e01-nvme-ssd/pvc-1c25b309-32b6-401a-8020-c6f093a1a966_00000;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 16384;
            }
            meta-disk   internal;
            device      minor 1000;
        }
        node-id    0;
    }
    
    on "k3s-int-stage-work03"
    {
        volume 0
        {
            disk        none;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 16384;
            }
            meta-disk   internal;
            device      minor 1000;
        }
        node-id    1;
    }
    
    connection
    {
        host "pve-epyc-01" address ipv4 192.168.42.25:7000;
        host "k3s-int-stage-work03" address ipv4 192.168.42.158:7000;
    }
    }
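(Not shown in the original report: the same resources can be inspected from the external controller side with the standard `linstor` client, which may help correlate the two views:)

    # Run on the external controller host (or anywhere that reaches port 3370):
    linstor node list        # satellites should report ONLINE
    linstor resource list    # per-node resource and disk states
    linstor volume list      # backing devices and DRBD minor numbers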

Additional Information

Troubleshooting Steps Taken

Questions

  1. Could the port's brief open period (500ms) be related to how the piraeus-operator is managing DRBD connections?
  2. Are there known issues or specific configuration requirements when using an external LINSTOR controller with piraeus-operator v2 in this setup?
  3. What additional logging or diagnostics can be enabled to provide more insight into why the DRBD connection is failing to establish?
  4. Are there any recommended troubleshooting steps specific to this k3s with external LINSTOR controller setup?
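(Regarding question 3: one low-level starting point might be to watch the DRBD event stream directly on both nodes; a minimal sketch using standard DRBD and systemd tooling:)

    # Stream DRBD state transitions live (the same feed the LINSTOR
    # satellite consumes):
    drbdsetup events2 --statistics

    # Follow kernel-side DRBD messages in real time:
    journalctl -k -f | grep -i drbd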
jludwig commented 1 month ago

SOS Report: sos_2024-07-09_23-37-58.tar.gz

WanzenBug commented 1 month ago

Have you checked the firewall settings on both nodes? It looks like the k8s node is struggling to receive anything from the Proxmox node.

jludwig commented 1 month ago
jludwig@k3s-int-stage-work03:~$ sudo iptables -L -v -n | grep -E "7000|3366|3376"
# Warning: iptables-legacy tables present, use iptables-legacy to see them
  745 42076 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:3366
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:3376
  268 14800 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 7000:7010
  547 51601 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp spt:3366
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp spt:3376
  214 11560 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport sports 7000:7010
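(Note the warning in the first line of that output: rules in the legacy tables would not appear above. Presumably they can be listed the same way:)

    sudo iptables-legacy -L -v -n | grep -E "7000|3366|3376"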

For Proxmox, I actually don't use iptables at all and just rely on OPNsense to do its job.

On OPNsense, I can look at my live firewall rules, and I see the packets being allowed back and forth to each node.

Are there any other rules I should look for/add?

Note, after updating the log level to "TRACE", I see this:

2024_07_10 16:11:20.777 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change connection name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 connection:Connecting
2024_07_10 16:11:21.287 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change path name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 local:ipv4:192.168.42.158:7000 peer:ipv4:192.168.42.25:7000 established:yes
2024_07_10 16:11:23.315 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change connection name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 connection:BrokenPipe
2024_07_10 16:11:23.348 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change path name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 local:ipv4:192.168.42.158:7000 peer:ipv4:192.168.42.25:7000 established:no
2024_07_10 16:11:23.353 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change connection name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 connection:Unconnected
2024_07_10 16:11:24.363 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change connection name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 connection:Connecting

Edit: I should mention that I only saw the packets in OPNsense when connecting across VLANs.

I ran iperf3 tests between both nodes on the same VLAN on port 7000 and got over 10 Gbit/s, jitter below 0.02 ms, and no lost packets.

I think this may be a DRBD issue and I should probably post it there instead.

WanzenBug commented 1 month ago

I ran iperf3 tests between both nodes on the same VLAN on port 7000 and got over 10 Gbit/s, jitter below 0.02 ms, and no lost packets.

Just to make extra sure, did you test it in both directions? I.e., running the iperf3 "server" once on a k8s host and once on a Proxmox host?
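For illustration, a both-directions test could look like this (hostnames and addresses taken from the report; flags are standard iperf3):

    # Direction 1: server on the k8s worker, client on the Proxmox node
    k3s-int-stage-work03$ iperf3 -s -p 7000
    pve-epyc-01$ iperf3 -c 192.168.42.158 -p 7000

    # Direction 2: swap the roles
    pve-epyc-01$ iperf3 -s -p 7000
    k3s-int-stage-work03$ iperf3 -c 192.168.42.25 -p 7000

    # Or keep one server and reverse the data flow with -R
    pve-epyc-01$ iperf3 -c 192.168.42.158 -p 7000 -R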

What is interesting is that, from the logs, the Proxmox node does not appear to see any connection attempts at all, while the k8s worker seems to "talk" to something that does not look like DRBD, hence the eventual timeout.

Other than that, yeah, might be better to open an issue on DRBD. You may want to upgrade to DRBD 9.2.10 on the k8s nodes first just to make sure, even if there does not seem to be any relevant change in the latest release.
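(A quick way to confirm which module version is actually loaded on a node before and after such an upgrade, for what it's worth:)

    # Version of the currently loaded DRBD kernel module
    cat /proc/drbd

    # Or query the module metadata directly
    modinfo drbd | grep '^version'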