rabbitmq / cluster-operator

RabbitMQ Cluster Kubernetes Operator
https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
Mozilla Public License 2.0
881 stars 272 forks source link

Remove deprecated mirrored queue majority check #1734

Closed Zerpet closed 1 month ago

Zerpet commented 1 month ago

https://github.com/rabbitmq/cluster-operator/blob/82415be7ae2e972d8badbf099679b4764a34762c/internal/resource/statefulset.go#L616-L618

The command rabbitmq-upgrade await_online_synchronized_mirror is removed in 4.0. In order to support 4.0, we need to remove this check from the pre-stop hook, otherwise, the hook will always fail, and Kubernetes will ungracefully kill the Pod.

From: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-execution

If either a PostStart or PreStop hook fails, it kills the Container.

Zerpet commented 1 month ago

@mkuratczyk @MirahImage this issue is a bit awkward, because nothing prevents our users from putting rabbitmq 4.0 in the spec.image. Simply removing the check is possible, but then we take away that safety check from 3.x versions.

Arguably, mirror queues are a deprecated feature, and users should be moving from it. This may be an "incentive" to move out from CMQ.

What do you think?

mkuratczyk commented 1 month ago

To be honest I thought we made this command a no-op in RMQ, rather than removing it. That would make the transition much easier. We can always put it back as a no-op if we want. Alternatively, something like rabbitmq-update await_online_synchronized_mirror || true

jcamu commented 1 month ago

It means for now, we are not able to use the cluster-operator to move from 3.13.7 to 4.X RabbitMQ version ?

mkuratczyk commented 1 month ago

I don't think it's ungracefully - I still see SIGTERM received - shutting down in RabbitMQ logs, so we just don't use the maintenance mode. Not great, but not terrible either.

Zerpet commented 1 month ago

It means for now, we are not able to use the cluster-operator to move from 3.13.7 to 4.X RabbitMQ version ?

Not at all. You will be able to move from 3.13.x to 4.0.x. Everything will continue to work. By removing this check, the Pod will not halt shut down if there are unsynchronised mirror, which will cause some unavailability of CMQ. We will continue to drain the node, alleviating this because the node drain moves queue leaders to other nodes.

Zerpet commented 1 month ago

To be honest I thought we made this command a no-op in RMQ, rather than removing it. That would make the transition much easier. We can always put it back as a no-op if we want. Alternatively, something like rabbitmq-update await_online_synchronized_mirror || true

The more I think about it, the more I lean towards: mirror queues are a deprecated feature, and users should be moving from it. This may be an "incentive" to move out from CMQ

We've been overly generous with CMQ support, given plenty of notices and time to migrate to QQs.

mkuratczyk commented 1 month ago

I've already submitted a PR that fixes this while still maintaining backwards compatibility with CMQs. Given how trivial it is, I don't mind solving the problem this way https://github.com/rabbitmq/cluster-operator/pull/1740