Closed Arn-Arm closed 3 years ago
Hi @Arn-Arm, it's not necessarily the use of quorum queues that's causing pod getting stuck on termination. On pod termination, cluster operator runs three commands in the preStop container lifecycle hooks: rabbitmq-upgrade await_online_quorum_plus_one
, rabbitmq-upgrade await_online_synchronized_mirror
, and rabbitmq-upgrade drain
(see the logic here: https://github.com/rabbitmq/cluster-operator/blob/main/internal/resource/statefulset.go#L679). First two commands ensure that all quorum queues and mirrored queues on this particular rabbit node have quorum (or are synced). You can run these command on your stuck pod to understand which command is preventing termination. If it's indeed the quorum queue command that's causing issues, you can check quorum status for each queue to figure out which queue is the problem by running rabbitmq-queues quorum_status
(https://www.rabbitmq.com/rabbitmq-queues.8.html#quorum_status)
A workaround to disable container preStop checks is to set a short timeout for the preStop container lifecycle hook (see: https://www.rabbitmq.com/kubernetes/operator/using-operator.html#TerminationGracePeriodSeconds). The default timeout is a week long (for data safety), you can configure the timeout to 0 second or a short time to get around it. However, this means that the checks might not be performed successfully at termination and you are at greater risk of data loss.
For your request to disable quorum queue feature flags, cluster operator does not support disabling any feature flag. I will update our documentation to make it clear. Let us know what you find. I think pods stuck at termination can and should be fixed without disabling any feature flag.
Closing because this is not an issue/bug.
Describe the bug
In order to try mitigate issue described here we want disable quorum queues completely which are enabled by default. Documentation describes that it can be done using environment variable(inside
envConfig
and/orenv
)RABBITMQ_FEATURE_FLAGS
orforced_feature_flags_on_init
configuration parameter. After trying both methods i can see that quorum_queue feature is enabled and environment variable and/or configuration are present.To make sure that feature was not disabled i have checked RabbitMQ logs and double checked with the rabbitmqcli tool.
Snippet from the logs
Rabbitmq CLI output
To Reproduce
Steps to reproduce the behavior:
Expected behavior quorum_queues feature flag is disabled.
Version and environment information