piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
https://piraeus.io/
Apache License 2.0

DRBD Connection Failure in Kubernetes Environment with External LINSTOR Controller #683

Open jludwig opened 1 month ago

jludwig commented 1 month ago

Environment

Setup Description

Description

When a resource is created for a PVC that uses the configured StorageClass, the DRBD connection between the Kubernetes worker node and the Proxmox node (which acts as the external LINSTOR controller) fails to establish. Connection attempts continue indefinitely without success.

Observed Behavior

  1. DRBD Connection Failures:

    drbd pvc-1c25b309-32b6-401a-8020-c6f093a1a966 pve-epyc-01: sock_recvmsg returned -11
    drbd pvc-1c25b309-32b6-401a-8020-c6f093a1a966 pve-epyc-01: conn( Connecting -> BrokenPipe )
    drbd pvc-1c25b309-32b6-401a-8020-c6f093a1a966 pve-epyc-01: Connection closed
    drbd pvc-1c25b309-32b6-401a-8020-c6f093a1a966 pve-epyc-01: conn( BrokenPipe -> Unconnected ) [disconnected]
  2. DRBD Status:

Kubernetes Worker Node:

   $ drbdadm status
   pvc-1c25b309-32b6-401a-8020-c6f093a1a966 role:Secondary
     disk:Diskless
     pve-epyc-01 connection:Unconnected

Proxmox Node:

   $ drbdadm status
   pm-c05fc392 role:Secondary
     disk:UpToDate
     pve-epyc-02 role:Secondary
       peer-disk:UpToDate
     pve-epyc-03 role:Secondary
       peer-disk:UpToDate
   pvc-1c25b309-32b6-401a-8020-c6f093a1a966 role:Secondary
     disk:UpToDate
     k3s-int-stage-work03 connection:Connecting

Note: On both nodes, the connections cycle between Unconnected and Connecting states.

Additional data:

   pvc-1c25b309-32b6-401a-8020-c6f093a1a966 node-id:1 role:Secondary suspended:no
     volume:0 minor:1000 disk:Diskless client:yes backing_dev:none quorum:yes
   pve-epyc-01 node-id:0 connection:Connecting role:Unknown tls:no congested:no
     volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
  3. Port 7000 Behavior:
    • On the Kubernetes worker node, port 7000 is intermittently available
    • The port is open for approximately 500 ms, then closed for about 10 seconds
    • This behavior is observed both when checking listening ports on the worker node and when attempting to connect from the Proxmox node (a probe loop like the sketch below shows the cycle)
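(For reference, a minimal sketch of a probe loop that can be used to observe this on/off cycle, assuming GNU date and `nc` are available on the Proxmox node — this was not part of the original diagnostics:)

    # Probe port 7000 on the worker and log its state (timeout 1 s per attempt).
    while true; do
        if nc -z -w 1 192.168.42.158 7000; then
            echo "$(date '+%T.%3N') open"
        else
            echo "$(date '+%T.%3N') closed"
        fi
        sleep 0.1
    done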

Configuration

  1. StorageClass:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: linstor-test
    provisioner: linstor.csi.linbit.com
    parameters:
      linstor.csi.linbit.com/storagePool: linstor_pool
      linstor.csi.linbit.com/resourceGroup: "linstor-test"
      csi.storage.k8s.io/fstype: xfs
  2. LinstorCluster:

    apiVersion: piraeus.io/v1
    kind: LinstorCluster
    metadata:
      name: linstorcluster
    spec:
      externalController:
        url: http://linstor.jludwig.win:3370
      controller:
        enabled: false
  3. LinstorSatelliteConfiguration:

    apiVersion: piraeus.io/v1
    kind: LinstorSatelliteConfiguration
    metadata:
      name: host-network
    spec:
      podTemplate:
        spec:
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
    ---
    apiVersion: piraeus.io/v1
    kind: LinstorSatelliteConfiguration
    metadata:
      name: custom-drbd-module-loader-image
    spec:
      podTemplate:
        spec:
          initContainers:
            - name: drbd-module-loader
              image: quay.io/piraeusdatastore/drbd9-bullseye
  4. DRBD Resource Configuration (k3s-int-stage-work03):

    resource "pvc-1c25b309-32b6-401a-8020-c6f093a1a966"
    {
    options
    {
        on-no-data-accessible suspend-io;
        on-suspended-primary-outdated force-secondary;
        quorum off;
    }
    
    net
    {
        cram-hmac-alg     sha1;
        shared-secret     "<redacted>";
        rr-conflict retry-connect;
        verify-alg "crct10dif";
    }
    
    on "k3s-int-stage-work03"
    {
        volume 0
        {
            disk        none;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 16384;
            }
            meta-disk   internal;
            device      minor 1000;
        }
        node-id    1;
    }
    
    on "pve-epyc-01"
    {
        volume 0
        {
            disk        /dev/drbd/this/is/not/used;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 16384;
            }
            meta-disk   internal;
            device      minor 1000;
        }
        node-id    0;
    }
    
    connection
    {
        host "k3s-int-stage-work03" address ipv4 192.168.42.158:7000;
        host "pve-epyc-01" address ipv4 192.168.42.25:7000;
    }
    }
  5. DRBD Resource Configuration (pve-epyc-01):

    resource "pvc-1c25b309-32b6-401a-8020-c6f093a1a966"
    {
    options
    {
        on-no-data-accessible suspend-io;
        on-suspended-primary-outdated force-secondary;
        quorum off;
    }
    
    net
    {
        cram-hmac-alg     sha1;
        shared-secret     "<redacted>";
        rr-conflict retry-connect;
        verify-alg "crct10dif";
    }
    
    on "pve-epyc-01"
    {
        volume 0
        {
            disk        /dev/zvol/e01-nvme-ssd/pvc-1c25b309-32b6-401a-8020-c6f093a1a966_00000;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 16384;
            }
            meta-disk   internal;
            device      minor 1000;
        }
        node-id    0;
    }
    
    on "k3s-int-stage-work03"
    {
        volume 0
        {
            disk        none;
            disk
            {
                discard-zeroes-if-aligned yes;
                rs-discard-granularity 16384;
            }
            meta-disk   internal;
            device      minor 1000;
        }
        node-id    1;
    }
    
    connection
    {
        host "pve-epyc-01" address ipv4 192.168.42.25:7000;
        host "k3s-int-stage-work03" address ipv4 192.168.42.158:7000;
    }
    }
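(Not shown in the original report: the same resources can be inspected from the external controller side with the standard `linstor` client, which may help correlate the two views:)

    # Run on the external controller host (or anywhere that reaches port 3370):
    linstor node list        # satellites should report ONLINE
    linstor resource list    # per-node resource and disk states
    linstor volume list      # backing devices and DRBD minor numbers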

Additional Information

Troubleshooting Steps Taken

Questions

  1. Could the port's brief open period (500ms) be related to how the piraeus-operator is managing DRBD connections?
  2. Are there known issues or specific configuration requirements when using an external LINSTOR controller with piraeus-operator v2 in this setup?
  3. What additional logging or diagnostics can be enabled to provide more insight into why the DRBD connection is failing to establish?
  4. Are there any recommended troubleshooting steps specific to this k3s with external LINSTOR controller setup?
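(Regarding question 3: one low-level starting point might be to watch the DRBD event stream directly on both nodes; a minimal sketch using standard DRBD and systemd tooling:)

    # Stream DRBD state transitions live (the same feed the LINSTOR
    # satellite consumes):
    drbdsetup events2 --statistics

    # Follow kernel-side DRBD messages in real time:
    journalctl -k -f | grep -i drbd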
jludwig commented 1 month ago

SOS Report: sos_2024-07-09_23-37-58.tar.gz

WanzenBug commented 1 month ago

Have you checked the firewall settings on both nodes? It looks like the k8s node is struggling to receive anything from the Proxmox node.

jludwig commented 1 month ago
jludwig@k3s-int-stage-work03:~$ sudo iptables -L -v -n | grep -E "7000|3366|3376"
# Warning: iptables-legacy tables present, use iptables-legacy to see them
  745 42076 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:3366
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:3376
  268 14800 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 7000:7010
  547 51601 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp spt:3366
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp spt:3376
  214 11560 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport sports 7000:7010
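(Note the warning in the first line of that output: rules in the legacy tables would not appear above. Presumably they can be listed the same way:)

    sudo iptables-legacy -L -v -n | grep -E "7000|3366|3376"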

For Proxmox, I actually don't use iptables at all and just rely on OPNsense to do its job.

On OPNsense, I can look at my live firewall rules, and I see the packets being allowed back and forth to each node.

Are there any other rules I should look for/add?

Note, after updating the log level to "TRACE", I see this:

2024_07_10 16:11:20.777 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change connection name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 connection:Connecting
2024_07_10 16:11:21.287 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change path name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 local:ipv4:192.168.42.158:7000 peer:ipv4:192.168.42.25:7000 established:yes
2024_07_10 16:11:23.315 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change connection name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 connection:BrokenPipe
2024_07_10 16:11:23.348 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change path name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 local:ipv4:192.168.42.158:7000 peer:ipv4:192.168.42.25:7000 established:no
2024_07_10 16:11:23.353 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change connection name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 connection:Unconnected
2024_07_10 16:11:24.363 [DrbdEventService] TRACE LINSTOR/Satellite - SYSTEM - DRBD 'events2': change connection name:pvc-ac80db5a-c34b-4fe1-b8ec-333dbbf13989 peer-node-id:0 conn-name:pve-epyc-01 connection:Connecting

Edit: I should mention that I only saw the packets in OPNsense when connecting across VLANs.

I ran iperf3 tests between both nodes on the same VLAN on port 7000 and got over 10 Gbit/s, jitter below 0.02 ms, and no lost packets.

I think this may be a DRBD issue and I should probably post it there instead.

WanzenBug commented 1 month ago

I ran iperf3 tests between both nodes on the same VLAN on port 7000 and got over 10 Gbit/s, jitter below 0.02 ms, and no lost packets.

Just to make extra sure, did you test it in both directions? I.e., running the iperf3 "server" once on a k8s host and once on a Proxmox host?
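For illustration, a both-directions test could look like this (hostnames and addresses taken from the report; flags are standard iperf3):

    # Direction 1: server on the k8s worker, client on the Proxmox node
    k3s-int-stage-work03$ iperf3 -s -p 7000
    pve-epyc-01$ iperf3 -c 192.168.42.158 -p 7000

    # Direction 2: swap the roles
    pve-epyc-01$ iperf3 -s -p 7000
    k3s-int-stage-work03$ iperf3 -c 192.168.42.25 -p 7000

    # Or keep one server and reverse the data flow with -R
    pve-epyc-01$ iperf3 -c 192.168.42.158 -p 7000 -R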

What is interesting is that, from the logs, the Proxmox node does not appear to see any connection attempts at all, while the k8s worker seems to "talk" to something that does not look like DRBD, hence the eventual timeout.

Other than that, yeah, might be better to open an issue on DRBD. You may want to upgrade to DRBD 9.2.10 on the k8s nodes first just to make sure, even if there does not seem to be any relevant change in the latest release.
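(A quick way to confirm which module version is actually loaded on a node before and after such an upgrade, for what it's worth:)

    # Version of the currently loaded DRBD kernel module
    cat /proc/drbd

    # Or query the module metadata directly
    modinfo drbd | grep '^version'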