ibm-messaging / mq-helm


Running MQ HA on more than 3 replicas #46

Closed schmiuwe closed 1 year ago

schmiuwe commented 1 year ago

Hi Callum,

since we are not able to use a PDB, I simulated this by extending the cluster to a total of 5 nodes and then scaled the MQ replicas from 3 to 5. This does not seem to work. Can you confirm that the whole MQ HA setup only works with 3 replicas, or could we run it with e.g. 5 replicas? If that worked, I could set max-surge to 1 so that only 1 node is drained at a time during a cluster upgrade, which would ensure that there is always 1 active MQ replica.

Currently, a cluster upgrade with 3 pods and max-surge set to 1 (so 4 nodes in total during the upgrade, draining only 1 node at a time) does not work. One MQ pod ends up in Pending and another in ContainerCreating; the pending one stays pending, leaving only one running pod, which cannot become active because quorum is lost. As a result I observed downtimes of around 2 minutes or more, several times during a single cluster upgrade.
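
For reference, a minimal sketch of the max-surge setting described above, assuming an AKS node pool managed with the az CLI (the resource group, cluster, and node pool names are placeholders):

```shell
# Limit the AKS node pool upgrade to surging/draining one node at a time.
# Resource group, cluster, and node pool names are placeholders.
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --max-surge 1
```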

Thank you, Uwe

callumpjackson commented 1 year ago

Hi Uwe - Native HA currently only supports 3 running containers. If you want to scale IBM MQ, you would deploy multiple queue managers, each of which could be a Native HA queue manager. I'm not sure I understand the logic above, but let me explain how a cluster upgrade would normally happen.
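
As an illustration of that scaling pattern, the sketch below deploys two independent Native HA queue managers as separate Helm releases of this chart. The release names, namespace, chart path, values file, and `--set` keys are placeholders, so check the chart's values.yaml for the exact settings:

```shell
# Two independent queue managers, each its own Native HA deployment.
# Chart path, release names, namespace, values file and keys are illustrative.
helm install qm1 ./charts/ibm-mq --namespace mq \
  --set queueManager.name=QM1 -f nativeha-values.yaml
helm install qm2 ./charts/ibm-mq --namespace mq \
  --set queueManager.name=QM2 -f nativeha-values.yaml
```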

Thanks

schmiuwe commented 1 year ago

Hi Callum,

regarding the cluster upgrade:

The Azure cluster upgrade itself runs as it should:

What does not work is MQ HA: during this procedure one pod ends up in Pending. When the next node is swapped, only one pod is left and quorum is no longer available. I have therefore observed multiple times that no active MQ pod was available during the upgrade process – which tells me that MQ does not behave properly here. The question is no longer whether another pod takes over; it is that a cluster upgrade with max-surge set to 1 also does not work with MQ.
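
To make the failure mode concrete, one way to watch it happen is to follow the pod states and ask MQ which instance is active while the nodes are being swapped. The namespace, label selector, pod name, and queue manager name below are placeholders:

```shell
# Watch the three Native HA pods while nodes are drained.
# Namespace, label selector and pod name are placeholders.
kubectl get pods -n mq -l app.kubernetes.io/instance=mq-ha -w

# From inside one running replica, check whether its queue manager
# instance is currently active (dspmq reports the queue manager status).
kubectl exec -n mq mq-ha-ibm-mq-0 -- dspmq -m QM1
```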

The last test would be to drain the nodes manually, but this is actually not what we want.
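
If we do end up testing the manual route, this is roughly what draining one node at a time would look like, waiting for all three MQ pods to be Running again before touching the next node (node names are placeholders):

```shell
# Drain one node, wait until quorum is restored (all three MQ pods
# Running again), then move on to the next node. Node name is a placeholder.
kubectl cordon aks-nodepool1-12345678-vmss000003
kubectl drain aks-nodepool1-12345678-vmss000003 \
  --ignore-daemonsets --delete-emptydir-data
```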

Did you test this case on your end already?

Thank you, Uwe

schmiuwe commented 1 year ago

Hi Callum,

I did a manual cluster upgrade today:

In this state it is not possible to continue. Did you really test this scenario already?

I am slowly coming to the point where we cannot use this setup, since we do not achieve HA and we do not want to perform blue/green deployments. If the setup does not work for cluster upgrades, we might consider stopping our whole MQ HA cloud initiative …

Thank you, Uwe

arthurbarr commented 1 year ago

I don't think I'm quite clear on the state of the queue manager in your most recent scenario.

However, I can say that having a single node pool spanning multiple zones has the side effect of creating a race condition: if the node upgrade happens faster than MQ can regain quorum, you can kill a second instance of the queue manager and cause a failure. A PodDisruptionBudget would in theory help here, but we can't currently use one with MQ Native HA without requiring manual intervention during a cluster upgrade.
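
For readers of the thread, this is roughly what the PodDisruptionBudget in question would look like. It is an illustration only; as noted above it cannot currently be used with MQ Native HA without manual intervention during a cluster upgrade, and the namespace and label selector are placeholders:

```shell
# Illustration only: a PDB that would keep at least 2 of the 3 Native HA
# pods available during voluntary disruptions such as node drains.
# Namespace and label selector are placeholders.
kubectl apply -n mq -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mq-ha-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/instance: mq-ha
EOF
```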

What I think might work, and would match the OpenShift model we have tested with, is to have a node pool (machine set in OpenShift) for each zone. Updating a node pool would then only disrupt a single zone, and there would be no race condition.
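
As a sketch of that layout on AKS (resource group, cluster, pool names, node counts, and zone numbers are placeholders), one node pool pinned to each availability zone:

```shell
# One node pool per availability zone, so upgrading a pool only
# disturbs one zone (and therefore at most one MQ replica).
# Resource group, cluster, pool names and sizes are placeholders.
for ZONE in 1 2 3; do
  az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name "mqzone${ZONE}" \
    --zones "${ZONE}" \
    --node-count 1
done
```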

callumpjackson commented 1 year ago

Hi Uwe – thanks for the additional context; the example you provided was helpful. The built-in Azure AKS node pool upgrade process does appear to be rigid, and we may need to discuss the options available. To do this effectively, I wonder whether a call would help; I will ping you via email. We will update the issue with the conclusions to assist the community.

callumpjackson commented 1 year ago

Closing this issue, as we have addressed it with a sample and documentation here.