zrav opened this issue 2 weeks ago (status: Open)
08:21:26.998 [SpaceTrackingService] ERROR LINSTOR/Controller - SYSTEM - Uncaught exception in j [Report number 66CAE97D-00000-000000]
For some reason this service does not want to start. This could be related to some corrupted DB entries. You can try to clean the space tracking database:
kubectl get -oyaml spacehistory.internal.linstor.linbit.com > spacehistory.yaml
Attach it to this issue so we can debug the issue. Then run:
kubectl delete spacehistory.internal.linstor.linbit.com --all
and see if the controller starts.
Deleting the space history did indeed make the controller become ready again. File attached: spacehistory.txt
However, now the ha-controller pods are crashing:
k8s@k8scp1:~$ kubectl logs -n piraeus-datastore ha-controller-n8hm6
I0826 09:30:33.143518 1 agent.go:201] version: v1.2.1
I0826 09:30:33.143594 1 agent.go:202] node: k8sw3.example.com
I0826 09:30:33.143699 1 agent.go:228] waiting for caches to sync
I0826 09:30:33.244814 1 agent.go:230] caches synced
I0826 09:30:33.244849 1 agent.go:253] starting reconciliation
E0826 09:30:33.248622 1 run.go:74] "command failed" err="failed to execute drbdsetup status --json: exit status 20"
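The failure mode here is that the binary runs but exits non-zero, and the agent wraps that into the "failed to execute ...: exit status 20" error above. A minimal sketch of that wrapping behavior (this is not the ha-controller's actual code, and it uses `sh -c 'exit 20'` as a stand-in for `drbdsetup status --json`, since drbdsetup is only present on DRBD nodes):

```python
import subprocess

def run_status(cmd):
    """Run a status command and return its stdout, surfacing a non-zero
    exit as a 'failed to execute ...: exit status N' error, similar to
    the agent's log line above."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as e:
        raise RuntimeError(
            f"failed to execute {' '.join(cmd)}: exit status {e.returncode}"
        ) from e
    return result.stdout

# Stand-in for drbdsetup failing inside the container:
try:
    run_status(["sh", "-c", "exit 20"])
except RuntimeError as e:
    print(e)  # failed to execute sh -c exit 20: exit status 20
```

Note that the exit status comes from the command itself, not from a failure to launch it — which is why comparing the in-container invocation with a direct run on the node (below) is the natural next step.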
Running the command directly on the node produces the expected output:
root@k8sw3:~# drbdsetup status --json
[
  {
    "name": "pvc-6064ae82-e444-49ac-aaea-ea8eaccab384",
    "node-id": 2,
    "role": "Primary",
    "suspended": false,
    "suspended-user": false,
    "suspended-no-data": false,
    "suspended-fencing": false,
    "suspended-quorum": false,
    "force-io-failures": false,
    "write-ordering": "none",
    "devices": [
      {
        "volume": 0,
        "minor": 1002,
        "disk-state": "Diskless",
        "client": true,
        "quorum": true,
        "size": 65536,
        "read": 0,
        "written": 0,
        "al-writes": 0,
        "bm-writes": 0,
        "upper-pending": 0,
        "lower-pending": 0
      } ],
    "connections": [
      {
        "peer-node-id": 0,
        "name": "k8sw1.example.com",
        "connection-state": "Connected",
        "congested": false,
        "peer-role": "Secondary",
        "tls": false,
        "ap-in-flight": 0,
        "rs-in-flight": 0,
        "paths": [
          {
            "this_host": {
              "address": "10.82.0.26",
              "port": 7002,
              "family": "ipv4"
            },
            "remote_host": {
              "address": "10.82.0.23",
              "port": 7002,
              "family": "ipv4"
            },
            "established": true
          } ],
        "peer_devices": [
          {
            "volume": 0,
            "replication-state": "Established",
            "peer-disk-state": "UpToDate",
            "peer-client": false,
            "resync-suspended": "no",
            "received": 492,
            "sent": 13,
            "out-of-sync": 0,
            "pending": 0,
            "unacked": 0,
            "has-sync-details": false,
            "has-online-verify-details": false,
            "percent-in-sync": 100.00
          } ]
      },
      {
        "peer-node-id": 1,
        "name": "k8sw2.example.com",
        "connection-state": "Connected",
        "congested": false,
        "peer-role": "Secondary",
        "tls": false,
        "ap-in-flight": 0,
        "rs-in-flight": 0,
        "paths": [
          {
            "this_host": {
              "address": "10.82.0.26",
              "port": 7002,
              "family": "ipv4"
            },
            "remote_host": {
              "address": "10.82.0.24",
              "port": 7002,
              "family": "ipv4"
            },
            "established": true
          } ],
        "peer_devices": [
          {
            "volume": 0,
            "replication-state": "Established",
            "peer-disk-state": "UpToDate",
            "peer-client": false,
            "resync-suspended": "no",
            "received": 0,
            "sent": 0,
            "out-of-sync": 0,
            "pending": 0,
            "unacked": 0,
            "has-sync-details": false,
            "has-online-verify-details": false,
            "percent-in-sync": 100.00
          } ]
      } ]
  }
]
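For what it's worth, the output above looks healthy: the local device is an intentionally diskless client with quorum, and both peers are Connected/UpToDate. A short sketch of that kind of health check over the JSON (field names taken from the output above; the summarization logic is illustrative, not the ha-controller's actual reconciliation):

```python
import json

# Trimmed sample of the `drbdsetup status --json` output pasted above.
STATUS = """
[
  {
    "name": "pvc-6064ae82-e444-49ac-aaea-ea8eaccab384",
    "role": "Primary",
    "devices": [
      {"volume": 0, "disk-state": "Diskless", "client": true, "quorum": true}
    ],
    "connections": [
      {"name": "k8sw1.example.com", "connection-state": "Connected",
       "peer_devices": [{"volume": 0, "peer-disk-state": "UpToDate"}]},
      {"name": "k8sw2.example.com", "connection-state": "Connected",
       "peer_devices": [{"volume": 0, "peer-disk-state": "UpToDate"}]}
    ]
  }
]
"""

def summarize(status_json):
    """Return {resource-name: healthy} where healthy means every local
    device has quorum and every connection is Connected."""
    healthy = {}
    for res in json.loads(status_json):
        ok = all(dev["quorum"] for dev in res["devices"])
        ok = ok and all(conn["connection-state"] == "Connected"
                        for conn in res["connections"])
        healthy[res["name"]] = ok
    return healthy

print(summarize(STATUS))
# {'pvc-6064ae82-e444-49ac-aaea-ea8eaccab384': True}
```

Since the status itself parses cleanly and reports a healthy resource, the exit status 20 seen inside the pod points at an environment difference between the container and the node rather than at the DRBD state.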
Same issue here, deleting spacehistory resolved it. I can send my db if needed.
This is a k8s 1.30 test cluster on Ubuntu Jammy, with the DRBD modules installed via PPA. After upgrading from 2.5.1 to 2.5.2, the controller never becomes ready.
Happy to provide more info if told where to look.