ibm-messaging / mq-helm

Apache License 2.0

nativeha failover -> messages lost #35

Closed lesaux closed 1 year ago

lesaux commented 1 year ago

Hello,

I'm testing out a nativeha setup. To achieve this I am deploying a cluster on GKE with nativeha enabled, but multi-instance disabled.

I then used the supplied test script (sendMessage.sh) to send a few messages, which do reach my active pod, ibm-mq-0.

I then kill the sendMessage.sh script, and in turn kill the ibm-mq-0 pod. ibm-mq-1 becomes active after a few seconds.

If I run the getMessage.sh script, I get no messages at all. It seems the messages weren't replicated.

QMNAME(ibmmq)                                             STATUS(Running) DEFAULT(yes) STANDBY(Not permitted) INSTNAME(Installation1) INSTPATH(/opt/mqm) INSTVER(9.3.1.0) ROLE(Active) INSTANCE(ibm-mq-1) INSYNC(yes) QUORUM(3/3)
 INSTANCE(ibm-mq-1) ROLE(Active) REPLADDR(ibm-mq-replica-1) CONNACTV(yes) INSYNC(yes) BACKLOG(0) CONNINST(yes) ALTDATE(2023-02-16) ALTTIME(22.23.35)
 INSTANCE(ibm-mq-2) ROLE(Replica) REPLADDR(ibm-mq-replica-2) CONNACTV(yes) INSYNC(yes) BACKLOG(0) CONNINST(yes) ALTDATE(2023-02-16) ALTTIME(22.23.35)
 INSTANCE(ibm-mq-0) ROLE(Replica) REPLADDR(ibm-mq-replica-0) CONNACTV(yes) INSYNC(yes) BACKLOG(0) CONNINST(yes) ALTDATE(2023-02-16) ALTTIME(22.23.35)

The dspmq output seems to indicate that the cluster is running fine. BACKLOG(0) never changes as far as I can see, but my messages were lost. If I fail back to ibm-mq-0, they are still not there.

Am I doing something wrong in my tests?
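
For reference, the per-instance check on the dspmq output above can be scripted. This is only a minimal sketch: the function name replicas_caught_up is hypothetical, and it assumes the status text is piped in on stdin in the same layout as shown above. A passing check only says that the replicas report INSYNC(yes) and BACKLOG(0); it does not by itself show that any particular message was replicated.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every per-instance line read from stdin
# reports INSYNC(yes) and BACKLOG(0).
replicas_caught_up() {
    awk '
        # Per-instance lines start with optional whitespace, then INSTANCE(.
        /^[ \t]*INSTANCE\(/ {
            n++
            if ($0 !~ /INSYNC\(yes\)/ || $0 !~ /BACKLOG\(0\)/) bad++
        }
        # Fail when no instance lines were seen or any instance is behind.
        END { exit !(n > 0 && bad == 0) }
    '
}

# Demo using two instance lines abbreviated from the output above.
if replicas_caught_up <<'EOF'
 INSTANCE(ibm-mq-1) ROLE(Active) INSYNC(yes) BACKLOG(0)
 INSTANCE(ibm-mq-2) ROLE(Replica) INSYNC(yes) BACKLOG(0)
EOF
then
    echo "all instances in sync"
else
    echo "at least one instance is behind"
fi
```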

callumpjackson commented 1 year ago

Hi, sorry for the confusion here. You are seeing this behavior because sendMessage.sh sends non-persistent messages: it relies on the IBM MQ sample shipped at /opt/mqm/samp/bin/amqsphac. Non-persistent messages are not expected to survive a queue manager failover.

You can see non-persistence being explicitly set within the sample source code: md.Persistence = MQPER_NOT_PERSISTENT;

Quite understandably, you may want to test persistent messages; if you use the amqsputc sample to put a message, you will see the behavior you expect. The supplied script does not use amqsputc because it only sends a single message, which does not demonstrate reconnection and the speed of failover nicely.

callumpjackson commented 1 year ago

Closing - feel free to re-open if you feel the above is not clear.

lesaux commented 1 year ago

Thank you @callumpjackson. Using amqsputc instead does indeed work!

 sh sendMessage.sh 
Starting amqsputc ibmmq
Sample AMQSPUT0 start
target queue is APPQ
message1
hellohello
message3

Sample AMQSPUT0 end

kubectl get -n ibm-mq pods
NAME       READY   STATUS    RESTARTS   AGE
ibm-mq-0   1/1     Running   0          3m14s
ibm-mq-1   0/1     Running   0          3m14s
ibm-mq-2   0/1     Running   0          3m14s

kubectl delete -n ibm-mq pod ibm-mq-0
pod "ibm-mq-0" deleted

kubectl get -n ibm-mq pods
NAME       READY   STATUS              RESTARTS   AGE
ibm-mq-0   0/1     ContainerCreating   0          2s
ibm-mq-1   0/1     Running             0          3m31s
ibm-mq-2   0/1     Running             0          3m31s

kubectl get -n ibm-mq pods
NAME       READY   STATUS              RESTARTS   AGE
ibm-mq-0   0/1     ContainerCreating   0          5s
ibm-mq-1   1/1     Running             0          3m34s
ibm-mq-2   0/1     Running             0          3m34s

 sh getMessage.sh 
Starting amqsghac secureapphelm
Sample AMQSGHAC start
message <message1>
message <hellohello>
message <message3>
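
The repeated kubectl get calls above can be replaced by a small filter that names the active instance. This is a rough sketch, assuming (as the listings in this thread show) that only the active instance reports READY 1/1; the function name active_pod is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every pod whose READY column is 1/1.
active_pod() {
    # NR > 1 skips the header row; $1 is NAME and $2 is READY.
    awk 'NR > 1 && $2 == "1/1" { print $1 }'
}

# Demo using the post-failover listing from this thread.
active_pod <<'EOF'
NAME       READY   STATUS              RESTARTS   AGE
ibm-mq-0   0/1     ContainerCreating   0          5s
ibm-mq-1   1/1     Running             0          3m34s
ibm-mq-2   0/1     Running             0          3m34s
EOF
# prints: ibm-mq-1
```

Against a live cluster the same filter would be fed from the command used above: kubectl get -n ibm-mq pods | active_pod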