rabbitmq / cluster-operator

RabbitMQ Cluster Kubernetes Operator
https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
Mozilla Public License 2.0
867 stars 271 forks source link

Pods stuck on terminate status #1519

Open Gsantomaggio opened 9 months ago

Gsantomaggio commented 9 months ago

Describe the bug

Two rabbitmq PODs are stuck on terminate status when during a namespace deletion.

To Reproduce

Steps to reproduce the behavior:

  1. Create a stream queue
  2. pump some message
  3. remove the leader replica with rabbitmq-streams delete_replica my_stream my_leader_node
  4. restart the leader node
  5. it should happen the bug ( it is not systematic )
apiVersion: v1
kind: Namespace
metadata:
  name: stream-clients-test
---
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: tls
  namespace: stream-clients-test
spec:
  replicas: 3
  image: rabbitmq:3.13-rc-management
  service:
    type: LoadBalancer
  tls:
    secretName: tls-secret
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 800m
      memory: 1Gi
  rabbitmq:
    additionalPlugins:
      - rabbitmq_stream
      - rabbitmq_stream_management

Expected behavior Should stop all the pods

Screenshots

Version and environment information

Additional context

Per conversation with @mkuratczyk the problem is:

queue 'BenchmarkDotNet0' in vhost '/' will become unavailable if node rabbit@tls-server-0.tls-nodes.stream-clients-test stops

rabbit_stream_coordinator will become unavailable if node rabbit@tls-server-0.tls-nodes.stream-clients-test stops
github-actions[bot] commented 7 months ago

This issue has been marked as stale due to 60 days of inactivity. Stale issues will be closed after a further 30 days of inactivity; please remove the stale label in order to prevent this occurring.