ibm-messaging / mq-helm

Apache License 2.0

AMQ3230E rrcE_SSL_BAD_KEYFILE_LABEL (407) on AKS #37

Closed JBarbes closed 1 year ago

JBarbes commented 1 year ago

Hi - I'm setting up a nativeHA queue manager on AKS. The pods fail during startup with the following error, and I can't figure out what is going wrong.

I'm using chart v4.0.0 and MQ v9.3.1.1 from the developer image.

AMQ3230E: Native HA network connection attempt from network address '172.20.88.197' failed. [CommentInsert1(172.20.88.197), CommentInsert2(secureapphelm), CommentInsert3(rrcE_SSL_BAD_KEYFILE_LABEL (407) (????) (ha) ())]

Any help appreciated. Rgds, John Barbesgaard

callumpjackson commented 1 year ago

The error suggests that the key store used by the pods for Native HA replication has been incorrectly created. It should be created by the container on startup from the Kubernetes secrets referenced in the helm configuration.

Can you provide the helm options that you are using?
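As a quick local sanity check, the key and certificate that go into that secret can be validated before installing the chart. A minimal sketch, assuming the file names and the `helmsecure` secret name used by the sample in this thread:

```shell
# Generate a throwaway key/cert pair like the sample scripts do (illustrative
# subject name), then verify they actually belong together before creating
# the Kubernetes secret the container builds its keystore from:
openssl req -x509 -newkey rsa:2048 -nodes -keyout tls.key -out tls.crt \
  -days 365 -subj "/CN=secureapphelm"

# The public key derived from the private key must match the one embedded in
# the certificate; a mismatched pair is one way the keystore build can fail:
key_pub=$(openssl pkey -in tls.key -pubout)
crt_pub=$(openssl x509 -in tls.crt -pubkey -noout)
[ "$key_pub" = "$crt_pub" ] && echo "key and certificate match"

# Then load them into the secret referenced by the helm values, e.g.:
# kubectl create secret generic helmsecure --from-file=tls.key --from-file=tls.crt
```

If the two public keys differ, the secret contents are the first thing to fix before looking anywhere else.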

JBarbes commented 1 year ago

Hi Callum

At this point I still use defaults and overrides from the sample, except that I point to a different image file.

secureapp_nativeha.yaml.txt values.yaml.txt

callumpjackson commented 1 year ago

Hi - I've tried to recreate but everything appears to work for me (at the same levels). I've documented the steps here to see if you can see any difference:

$ ./createAKSCluster.sh MQTest MQCluster westus
<VARIOUS STANDARD OUTPUT FROM AZURE>

$ ./install.sh
Context "MQCluster" modified.
configmap/helmsecure created
secret/helmsecure created
NAME: secureapphelm
LAST DEPLOYED: Tue Feb 28 10:41:55 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Get the MQ Console URL by running these commands:
  export CONSOLE_PORT=$(kubectl get services secureapphelm-ibm-mq-web -n default -o jsonpath="{.spec.ports[?(@.port=="9443")].nodePort}")
  export CONSOLE_IP=$(kubectl get nodes -o jsonpath='{..addresses[1].address}' | awk '{print $1}')
  echo https://$CONSOLE_IP:$CONSOLE_PORT/ibmmq/console

Get the load balancer exposed MQ Console URL by running these commands:
  export CONSOLE_PORT=9443
  export CONSOLE_IP=$(kubectl get services secureapphelm-ibm-mq-loadbalancer -n default -o jsonpath="{..hostname}")$(kubectl get services secureapphelm-ibm-mq-loadbalancer -n default -o jsonpath="{..ip}")
  echo https://$CONSOLE_IP:$CONSOLE_PORT/ibmmq/console
The MQ connection information for clients inside the cluster is as follows:
  secureapphelm-ibm-mq:1414

$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
secureapphelm-ibm-mq-0   1/1     Running   0          3m44s
secureapphelm-ibm-mq-1   0/1     Running   0          3m44s
secureapphelm-ibm-mq-2   0/1     Running   0          3m44s

$ oc logs secureapphelm-ibm-mq-0
2023-02-28T10:42:59.223Z CPU architecture: amd64
2023-02-28T10:42:59.223Z Linux kernel version: 5.4.0-1103-azure
2023-02-28T10:42:59.223Z Container runtime: kube
2023-02-28T10:42:59.223Z Base image: Red Hat Enterprise Linux 8.6 (Ootpa)
2023-02-28T10:42:59.223Z Running as user ID 1001 with primary group 0, and supplementary groups 0
2023-02-28T10:42:59.223Z Capabilities: none
2023-02-28T10:42:59.223Z seccomp enforcing mode: disabled
2023-02-28T10:42:59.223Z Process security attributes: cri-containerd.apparmor.d (enforce)
2023-02-28T10:42:59.223Z Detected 'ext4' volume mounted to /mnt/mqm
2023-02-28T10:42:59.368Z Using queue manager name: secureapphelm
2023-02-28T10:42:59.377Z Created directory structure under /var/mqm
2023-02-28T10:42:59.377Z Image created: 2022-10-10T15:04:48+00:00
2023-02-28T10:42:59.377Z Image tag: ibm-mqadvanced-server-dev:9.3.1.0-r1.20221010145937.4a9fd29-amd64
2023-02-28T10:42:59.445Z MQ version: 9.3.1.0
2023-02-28T10:42:59.445Z MQ level: p931-L220929.1
2023-02-28T10:42:59.445Z MQ license: Developer
2023-02-28T10:43:02.354Z Starting web server
2023-02-28T10:43:03.641Z Creating queue manager secureapphelm
2023-02-28T10:43:14.949Z Created queue manager
2023-02-28T10:43:15.135Z Removing existing ServiceComponent configuration
2023-02-28T10:43:15.136Z Starting queue manager
2023-02-28T10:43:04.646Z AMQ6287I: IBM MQ V9.3.1.0 (p931-L220929.1). [CommentInsert1(Linux 5.4.0-1103-azure (Cloud Pak Linux (x86-64 platform) 64-bit)), CommentInsert2(/opt/mqm (Installation1)), CommentInsert3(9.3.1.0 (p931-L220929.1))]
2023-02-28T10:43:04.645Z AMQ5051I: The queue manager task 'LOG-FORMAT' has started. [ArithInsert2(1), CommentInsert1(LOG-FORMAT)]
2023-02-28T10:43:04.650Z AMQ5051I: The queue manager task 'LOGGER-IO' has started. [ArithInsert2(1), CommentInsert1(LOGGER-IO)]
2023-02-28T10:43:04.841Z AMQ5051I: The queue manager task 'CHECKPOINT' has started. [ArithInsert2(1), CommentInsert1(CHECKPOINT)]
2023-02-28T10:43:04.938Z AMQ7467I: The oldest log file required to start queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:04.938Z AMQ7468I: The oldest log file required to perform media recovery of queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:04.939Z AMQ5041I: The queue manager task 'CHECKPOINT' has ended. [CommentInsert1(CHECKPOINT)]
2023-02-28T10:43:04.942Z AMQ5041I: The queue manager task 'LOG-FORMAT' has ended. [CommentInsert1(LOG-FORMAT)]
2023-02-28T10:43:04.945Z AMQ5041I: The queue manager task 'LOGGER-IO' has ended. [CommentInsert1(LOGGER-IO)]
2023-02-28T10:43:06.240Z AMQ5051I: The queue manager task 'LOG-FORMAT' has started. [ArithInsert2(1), CommentInsert1(LOG-FORMAT)]
2023-02-28T10:43:06.242Z AMQ5051I: The queue manager task 'LOGGER-IO' has started. [ArithInsert2(1), CommentInsert1(LOGGER-IO)]
2023-02-28T10:43:06.256Z AMQ7229I: 5 log records accessed on queue manager 'secureapphelm' during the log replay phase. [ArithInsert1(5), CommentInsert1(secureapphelm)]
2023-02-28T10:43:06.256Z AMQ7230I: Log replay for queue manager 'secureapphelm' complete. [ArithInsert1(5), CommentInsert1(secureapphelm)]
2023-02-28T10:43:06.257Z AMQ5051I: The queue manager task 'CHECKPOINT' has started. [ArithInsert2(1), CommentInsert1(CHECKPOINT)]
2023-02-28T10:43:06.262Z AMQ7231I: 0 log records accessed on queue manager 'secureapphelm' during the recovery phase. [CommentInsert1(secureapphelm)]
2023-02-28T10:43:06.263Z AMQ7232I: Transaction manager state recovered for queue manager 'secureapphelm'. [CommentInsert1(secureapphelm)]
2023-02-28T10:43:06.341Z AMQ7467I: The oldest log file required to start queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:06.341Z AMQ7468I: The oldest log file required to perform media recovery of queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:06.342Z AMQ7233I: 0 out of 0 in-flight transactions resolved for queue manager 'secureapphelm'. [CommentInsert1(secureapphelm)]
2023-02-28T10:43:06.350Z AMQ7467I: The oldest log file required to start queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:06.351Z AMQ7468I: The oldest log file required to perform media recovery of queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:06.547Z AMQ5037I: The queue manager task 'ERROR-LOG' has started. [ArithInsert2(1), CommentInsert1(ERROR-LOG)]
2023-02-28T10:43:06.549Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(1), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:06.549Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(2), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:06.549Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(3), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:06.549Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(4), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:06.549Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(5), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:06.549Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(6), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:06.549Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(7), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:06.549Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(8), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:14.337Z AMQ8048I: Default objects statistics : 84 created. 0 replaced. 0 failed. [ArithInsert1(84), CommentInsert1(0)]
2023-02-28T10:43:14.349Z AMQ8003I: IBM MQ queue manager 'secureapphelm' started using V9.3.1.0. [CommentInsert1(9.3.1.0), CommentInsert3(secureapphelm)]
2023-02-28T10:43:14.443Z AMQ5041I: The queue manager task 'APP-SIGNAL' has ended. [CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:14.444Z AMQ5041I: The queue manager task 'APP-SIGNAL' has ended. [CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:14.447Z AMQ5041I: The queue manager task 'ERROR-LOG' has ended. [CommentInsert1(ERROR-LOG)]
2023-02-28T10:43:14.451Z AMQ5041I: The queue manager task 'CHECKPOINT' has ended. [CommentInsert1(CHECKPOINT)]
2023-02-28T10:43:14.741Z AMQ7467I: The oldest log file required to start queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:14.742Z AMQ7468I: The oldest log file required to perform media recovery of queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:14.752Z AMQ5041I: The queue manager task 'LOG-FORMAT' has ended. [CommentInsert1(LOG-FORMAT)]
2023-02-28T10:43:14.752Z AMQ5041I: The queue manager task 'LOGGER-IO' has ended. [CommentInsert1(LOGGER-IO)]
2023-02-28T10:43:14.835Z AMQ8004I: IBM MQ queue manager 'secureapphelm' ended. [CommentInsert3(secureapphelm)]
2023-02-28T10:43:15.237Z AMQ6206I: Command strmqm was issued. [CommentInsert1(strmqm), CommentInsert2(strmqm -x secureapphelm)]
2023-02-28T10:43:15.239Z AMQ5782I: File system containing '/mnt/mqm/data/qmgrs/secureapphelm' is 3.36% used, 1881MB free. [ArithInsert1(3.36), CommentInsert1(/mnt/mqm/data/qmgrs/secureapphelm), CommentInsert2(1946), CommentInsert3(1881)]
2023-02-28T10:43:15.240Z AMQ5782I: File system containing '/mnt/mqm/data/log/secureapphelm' is 2.53% used, 1897MB free. [ArithInsert1(2.53), CommentInsert1(/mnt/mqm/data/log/secureapphelm), CommentInsert2(1946), CommentInsert3(1897)]
2023-02-28T10:43:15.253Z AMQ5775I: Successfully applied automatic configuration INI definitions. [CommentInsert1(INI)]
2023-02-28T10:43:15.652Z Initializing MQ Advanced for Developers custom authentication service
2023-02-28T10:43:15.653Z mqhtpass: MQStart options=Primary qmgr=secureapphelm
2023-02-28T10:43:16.244Z mqhtpass: MQStart options=Secondary qmgr=secureapphelm
2023-02-28T10:43:16.549Z AMQ5051I: The queue manager task 'LOG-FORMAT' has started. [ArithInsert2(1), CommentInsert1(LOG-FORMAT)]
2023-02-28T10:43:16.552Z AMQ5051I: The queue manager task 'LOGGER-IO' has started. [ArithInsert2(1), CommentInsert1(LOGGER-IO)]
2023-02-28T10:43:16.654Z AMQ5051I: The queue manager task 'NATIVE-HA' has started. [ArithInsert2(1), CommentInsert1(NATIVE-HA)]
2023-02-28T10:43:17.842Z Started replica queue manager
2023-02-28T10:43:17.741Z AMQ7814I: IBM MQ queue manager running as replica instance 'secureapphelm-ibm-mq-0'. [CommentInsert2(secureapphelm-ibm-mq-0), CommentInsert3(secureapphelm)]
2023-02-28T10:43:17.835Z AMQ9722W: Plain text communication is enabled.
2023-02-28T10:43:17.854Z AMQ3212I: Native HA outbound secure connection established to 'secureapphelm-ibm-mq-1'. [CommentInsert1(secureapphelm-ibm-mq-1), CommentInsert2(secureapphelm-ibm-mq-replica-1(9414)), CommentInsert3(TLS_CHACHA20_POLY1305_SHA256)]
2023-02-28T10:43:17.855Z AMQ3235I: Native HA instance 'secureapphelm-ibm-mq-0' is not connected to enough other instances to start the process of selecting the active instance. [ArithInsert2(2), CommentInsert1(secureapphelm-ibm-mq-0), CommentInsert2(secureapphelm), CommentInsert3(Bootstrapping)]
2023-02-28T10:43:18.044Z AMQ3214I: Native HA inbound secure connection accepted from 'secureapphelm-ibm-mq-1'. [CommentInsert1(secureapphelm-ibm-mq-1), CommentInsert2(10.244.0.8), CommentInsert3(TLS_CHACHA20_POLY1305_SHA256)]
2023-02-28T10:43:18.835Z AMQ3208E: Native HA network connection to 'secureapphelm-ibm-mq-2' could not be established. [CommentInsert1(secureapphelm-ibm-mq-2), CommentInsert2(secureapphelm-ibm-mq-replica-2(9414)), CommentInsert3(rrcE_HOST_NOT_AVAILABLE - Remote host not available, retry later. (111) (0x6F) (secureapphelm-ibm-mq-replica-2 (9414)) (TCP/IP) (????))]
2023-02-28T10:43:29.935Z AMQ3208E: Native HA network connection to 'secureapphelm-ibm-mq-2' could not be established. [CommentInsert1(secureapphelm-ibm-mq-2), CommentInsert2(secureapphelm-ibm-mq-replica-2(9414)), CommentInsert3(rrcE_HOST_NOT_AVAILABLE - Remote host not available, retry later. (111) (0x6F) (secureapphelm-ibm-mq-replica-2 (9414)) (TCP/IP) (????))]
2023-02-28T10:43:39.955Z AMQ3208E: Native HA network connection to 'secureapphelm-ibm-mq-2' could not be established. [CommentInsert1(secureapphelm-ibm-mq-2), CommentInsert2(secureapphelm-ibm-mq-replica-2(9414)), CommentInsert3(rrcE_HOST_NOT_AVAILABLE - Remote host not available, retry later. (111) (0x6F) (secureapphelm-ibm-mq-replica-2 (9414)) (TCP/IP) (????))]
2023-02-28T10:43:50.047Z AMQ3208E: Native HA network connection to 'secureapphelm-ibm-mq-2' could not be established. [CommentInsert1(secureapphelm-ibm-mq-2), CommentInsert2(secureapphelm-ibm-mq-replica-2(9414)), CommentInsert3(rrcE_HOST_NOT_AVAILABLE - Remote host not available, retry later. (111) (0x6F) (secureapphelm-ibm-mq-replica-2 (9414)) (TCP/IP) (????))]
2023-02-28T10:43:54.444Z Started web server
2023-02-28T10:43:54.735Z AMQ3214I: Native HA inbound secure connection accepted from 'secureapphelm-ibm-mq-2'. [CommentInsert1(secureapphelm-ibm-mq-2), CommentInsert2(10.244.1.4), CommentInsert3(TLS_CHACHA20_POLY1305_SHA256)]
2023-02-28T10:43:54.938Z AMQ3212I: Native HA outbound secure connection established to 'secureapphelm-ibm-mq-2'. [CommentInsert1(secureapphelm-ibm-mq-2), CommentInsert2(secureapphelm-ibm-mq-replica-2(9414)), CommentInsert3(TLS_CHACHA20_POLY1305_SHA256)]
2023-02-28T10:43:55.150Z AMQ3215I: The local Native HA instance 'secureapphelm-ibm-mq-0' is now the active instance of queue manager 'secureapphelm'. [ArithInsert1(1), CommentInsert1(secureapphelm-ibm-mq-0), CommentInsert2(secureapphelm)]
2023-02-28T10:43:55.235Z AMQ3237I: Native HA instance 'secureapphelm-ibm-mq-1' is starting to rebase from extent S0000000.LOG using log data from the active instance. [ArithInsert1(583), ArithInsert2(1), CommentInsert1(secureapphelm-ibm-mq-1), CommentInsert2(secureapphelm-ibm-mq-0), CommentInsert3(S0000000.LOG)]
2023-02-28T10:43:55.335Z AMQ3237I: Native HA instance 'secureapphelm-ibm-mq-2' is starting to rebase from extent S0000000.LOG using log data from the active instance. [ArithInsert1(583), ArithInsert2(1), CommentInsert1(secureapphelm-ibm-mq-2), CommentInsert2(secureapphelm-ibm-mq-0), CommentInsert3(S0000000.LOG)]
2023-02-28T10:43:55.535Z AMQ7816I: IBM MQ queue manager 'secureapphelm' active instance 'secureapphelm-ibm-mq-0' has a quorum of synchronised replicas available. [CommentInsert2(secureapphelm-ibm-mq-0), CommentInsert3(secureapphelm)]
2023-02-28T10:43:55.652Z AMQ7229I: 7 log records accessed on queue manager 'secureapphelm' during the log replay phase. [ArithInsert1(7), CommentInsert1(secureapphelm)]
2023-02-28T10:43:55.652Z AMQ7230I: Log replay for queue manager 'secureapphelm' complete. [ArithInsert1(7), CommentInsert1(secureapphelm)]
2023-02-28T10:43:55.653Z AMQ5051I: The queue manager task 'CHECKPOINT' has started. [ArithInsert2(1), CommentInsert1(CHECKPOINT)]
2023-02-28T10:43:55.657Z AMQ7231I: 0 log records accessed on queue manager 'secureapphelm' during the recovery phase. [CommentInsert1(secureapphelm)]
2023-02-28T10:43:55.657Z AMQ7232I: Transaction manager state recovered for queue manager 'secureapphelm'. [CommentInsert1(secureapphelm)]
2023-02-28T10:43:56.747Z AMQ7467I: The oldest log file required to start queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:56.748Z AMQ7468I: The oldest log file required to perform media recovery of queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:56.748Z AMQ7233I: 0 out of 0 in-flight transactions resolved for queue manager 'secureapphelm'. [CommentInsert1(secureapphelm)]
2023-02-28T10:43:57.048Z mqhtpass: MQStart options=Secondary qmgr=secureapphelm
2023-02-28T10:43:56.959Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(1), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:57.035Z AMQ5037I: The queue manager task 'ERROR-LOG' has started. [ArithInsert2(1), CommentInsert1(ERROR-LOG)]
2023-02-28T10:43:57.035Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(5), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:57.035Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(2), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:57.035Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(3), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:57.035Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(4), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:57.036Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(6), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:57.037Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(7), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:57.038Z AMQ5037I: The queue manager task 'APP-SIGNAL' has started. [ArithInsert2(8), CommentInsert1(APP-SIGNAL)]
2023-02-28T10:43:57.646Z mqhtpass: MQStart options=Secondary qmgr=secureapphelm
2023-02-28T10:43:57.661Z mqhtpass: MQStart options=Secondary qmgr=secureapphelm
2023-02-28T10:43:57.749Z mqhtpass: MQStart options=Secondary qmgr=secureapphelm
2023-02-28T10:43:57.752Z mqhtpass: mqhtpass_authenticate_user without CSP user set.  effectiveuid=mqm      env=0, callertype=1, type=0, accttoken=2025185812 applidentitydata=2025185844
2023-02-28T10:43:57.752Z mqhtpass: mqhtpass_authenticate_user without CSP user set.  effectiveuid=mqm      env=3, callertype=1, type=0, accttoken=1947689492 applidentitydata=1947689524
2023-02-28T10:43:57.636Z AMQ3238I: Native HA instance 'secureapphelm-ibm-mq-1' completed a log rebase. [ArithInsert2(1), CommentInsert1(secureapphelm-ibm-mq-1)]
2023-02-28T10:43:57.640Z AMQ7467I: The oldest log file required to start queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:57.640Z AMQ7468I: The oldest log file required to perform media recovery of queue manager secureapphelm is S0000000.LOG. [CommentInsert1(secureapphelm), CommentInsert2(S0000000.LOG)]
2023-02-28T10:43:57.649Z AMQ8003I: IBM MQ queue manager 'secureapphelm' started using V9.3.1.0. [CommentInsert1(9.3.1.0), CommentInsert3(secureapphelm)]
2023-02-28T10:43:57.663Z AMQ5051I: The queue manager task 'DUR-SUBS-MGR' has started. [ArithInsert2(1), CommentInsert1(DUR-SUBS-MGR)]
2023-02-28T10:43:57.663Z AMQ9410I: Repository manager started.
2023-02-28T10:43:57.674Z AMQ5051I: The queue manager task 'TOPIC-TREE' has started. [ArithInsert2(1), CommentInsert1(TOPIC-TREE)]
2023-02-28T10:43:57.688Z AMQ5051I: The queue manager task 'IQM-COMMS-MANAGER' has started. [ArithInsert2(1), CommentInsert1(IQM-COMMS-MANAGER)]
2023-02-28T10:43:57.705Z AMQ5024I: The command server has started. ProcessId(773). [ArithInsert1(773), CommentInsert1(SYSTEM.CMDSERVER.1)]
2023-02-28T10:43:57.706Z AMQ5022I: The channel initiator has started. ProcessId(774). [ArithInsert1(774), CommentInsert1(SYSTEM.CHANNEL.INITQ)]
2023-02-28T10:43:57.707Z AMQ5051I: The queue manager task 'AUTOCONFIG' has started. [ArithInsert2(1), CommentInsert1(AUTOCONFIG)]
2023-02-28T10:43:57.752Z AMQ8942I: Starting to process automatic MQSC configuration script.
2023-02-28T10:43:57.756Z AMQ8024I: IBM MQ channel initiator started. [CommentInsert1(SYSTEM.CHANNEL.INITQ)]
2023-02-28T10:43:58.358Z Starting metrics gathering
2023-02-28T10:43:58.185Z mqhtpass: MQStart options=Secondary qmgr=secureapphelm
2023-02-28T10:43:58.267Z mqhtpass: MQStart options=Secondary qmgr=secureapphelm
2023-02-28T10:43:58.360Z mqhtpass: mqhtpass_authenticate_user without CSP user set.  effectiveuid=mqm      env=0, callertype=1, type=0, accttoken=1757340180 applidentitydata=1757340212
2023-02-28T10:43:58.163Z AMQ8939I: Automatic MQSC configuration script has completed, and contained 33 command(s), of which 0 had errors. [ArithInsert1(33), CommentInsert1(0)]
2023-02-28T10:43:58.164Z AMQ5051I: The queue manager task 'MEDIA-IMAGES' has started. [ArithInsert2(1), CommentInsert1(MEDIA-IMAGES)]
2023-02-28T10:43:58.165Z AMQ5052I: The queue manager task 'QPUBSUB-QUEUE-NLCACHE' has started. [ArithInsert2(1), CommentInsert1(QPUBSUB-QUEUE-NLCACHE)]
2023-02-28T10:43:58.165Z AMQ5026I: The listener 'SYSTEM.LISTENER.TCP.1' has started. ProcessId(804). [ArithInsert1(804), CommentInsert1(SYSTEM.LISTENER.TCP.1)]
2023-02-28T10:43:58.165Z AMQ5052I: The queue manager task 'QPUBSUB-CTRLR' has started. [ArithInsert2(1), CommentInsert1(QPUBSUB-CTRLR)]
2023-02-28T10:43:58.167Z AMQ5052I: The queue manager task 'QPUBSUB-SUBPT-NLCACHE' has started. [ArithInsert2(1), CommentInsert1(QPUBSUB-SUBPT-NLCACHE)]
2023-02-28T10:43:58.168Z AMQ5052I: The queue manager task 'PUBSUB-DAEMON' has started. [ArithInsert2(1), CommentInsert1(PUBSUB-DAEMON)]
2023-02-28T10:43:58.168Z AMQ5975I: 'IBM MQ Distributed Pub/Sub Controller' has started. [CommentInsert1(IBM MQ Distributed Pub/Sub Controller)]
2023-02-28T10:43:58.170Z AMQ5975I: 'IBM MQ Distributed Pub/Sub Fan Out Task' has started. [CommentInsert1(IBM MQ Distributed Pub/Sub Fan Out Task)]
2023-02-28T10:43:58.171Z AMQ5975I: 'IBM MQ Distributed Pub/Sub Command Task' has started. [CommentInsert1(IBM MQ Distributed Pub/Sub Command Task)]
2023-02-28T10:43:58.173Z AMQ5975I: 'IBM MQ Distributed Pub/Sub Publish Task' has started. [CommentInsert1(IBM MQ Distributed Pub/Sub Publish Task)]
2023-02-28T10:43:58.174Z AMQ5037I: The queue manager task 'DEFERRED-MSG' has started. [ArithInsert2(1), CommentInsert1(DEFERRED-MSG)]
2023-02-28T10:43:58.174Z AMQ5037I: The queue manager task 'DEFERRED_DELIVERY' has started. [ArithInsert2(1), CommentInsert1(DEFERRED_DELIVERY)]
2023-02-28T10:43:58.175Z AMQ5037I: The queue manager task 'STATISTICS' has started. [ArithInsert2(1), CommentInsert1(STATISTICS)]
2023-02-28T10:43:58.176Z AMQ5037I: The queue manager task 'MARKINTSCAN' has started. [ArithInsert2(1), CommentInsert1(MARKINTSCAN)]
2023-02-28T10:43:58.241Z AMQ5051I: The queue manager task 'ACTVTRC' has started. [ArithInsert2(1), CommentInsert1(ACTVTRC)]
2023-02-28T10:43:58.242Z AMQ5051I: The queue manager task 'MULTICAST' has started. [ArithInsert2(1), CommentInsert1(MULTICAST)]
2023-02-28T10:43:58.244Z AMQ5051I: The queue manager task 'RESOURCE_MONITOR' has started. [ArithInsert2(1), CommentInsert1(RESOURCE_MONITOR)]
2023-02-28T10:43:58.245Z AMQ5051I: The queue manager task 'LOGGEREV' has started. [ArithInsert2(1), CommentInsert1(LOGGEREV)]
2023-02-28T10:43:58.250Z AMQ5051I: The queue manager task 'ASYNCQ' has started. [ArithInsert2(1), CommentInsert1(ASYNCQ)]
2023-02-28T10:43:58.250Z AMQ5051I: The queue manager task 'EXPIRER' has started. [ArithInsert2(1), CommentInsert1(EXPIRER)]
2023-02-28T10:43:58.250Z AMQ5051I: The queue manager task 'PRESERVED-Q' has started. [ArithInsert2(1), CommentInsert1(PRESERVED-Q)]
2023-02-28T10:43:58.251Z AMQ5051I: The queue manager task 'Q-DELETION' has started. [ArithInsert2(1), CommentInsert1(Q-DELETION)]
2023-02-28T10:43:58.308Z AMQ5806I: Queued Publish/Subscribe Daemon started for queue manager secureapphelm. [CommentInsert1(secureapphelm)]
2023-02-28T10:44:00.907Z AMQ3238I: Native HA instance 'secureapphelm-ibm-mq-2' completed a log rebase. [ArithInsert2(1), CommentInsert1(secureapphelm-ibm-mq-2)]
2023-02-28T10:44:16.241Z AMQ5041I: The queue manager task 'AUTOCONFIG' has ended. [CommentInsert1(AUTOCONFIG)]

I checked the content of your files and agree the only difference is the image being pulled. This leaves two options for where the error is likely to be located:

  1. Something is wrong with the image you are pulling. One way to verify would be to pull the image directly from IBM.
  2. There is some network policy, or similar capability, causing issues in your AKS environment. Did you use the sample to deploy the AKS environment, or was it set up separately? It would be interesting to see whether you hit the same issue on a freshly deployed AKS cluster.

JBarbes commented 1 year ago

Hi Callum - thank you for looking into this.

I have tried to create a standard queue manager with the same Helm chart and the same image. This starts up without errors (haven't tried to use it yet, though).

But I think you are right that it is an AKS policy issue. I'm no AKS expert yet, but I understand that in this scenario the cluster is private and can't easily be reached from outside the customer network. Any idea what I should be looking at to address this in relation to Native HA?

As the standard queue manager didn't fail, I got a chance to see that the load balancer did not start. It failed with the following error:

message: >- Error syncing load balancer: failed to ensure load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 403, RawError: {"error":{"code":"RequestDisallowedByPolicy","target":"kubernetes","message":"Resource 'kubernetes' was disallowed by policy. Policy identifiers: '[{\"policyAssignment\":{\"name\":\"Load balancer - Public IP is not allowed\",\"id\":\"/providers/Microsoft.Management/managementGroups/dsv/providers/Microsoft.Authorization/policyAssignments/af96c2e041ff4e7899b1323f\"},\"policyDefinition\":{\"name\":\"Load balancer - Public IP is not allowed\",\"id\":\"/providers/Microsoft.Management/managementGroups/dsv/providers/Microsoft.Authorization/policyDefinitions/225306fd-fae8-4028-8e48-fa0134f84cd2\"}}]'.","additionalInfo":[{"type":"PolicyViolation","info":{"evaluationDetails":{"evaluatedExpressions":[{"result":"True","expressionKind":"Field","expression":"type","path":"type","expressionValue":"Microsoft.Network/loadBalancers","targetValue":"Microsoft.Network/loadBalancers","operator":"Equals"},{"result":"True","expressionKind":"Field","expression":"Microsoft.Network/loadBalancers/frontendIPConfigurations[].publicIPAddress","path":"properties.frontendIPConfigurations[].properties.publicIPAddress","expressionValue":"**","targetValue":"True","operator":"Exists"}]},"policyDefinitionId":"/providers/Microsoft.Management/managementGroups/dsv/providers/Microsoft.Authorization/policyDefinitions/225306fd-fae8-4028-8e48-fa0134f84cd2","policyDefinitionName":"225306fd-fae8-4028-8e48-fa0134f84cd2","policyDefinitionDisplayName":"Load balancer - Public IP is not allowed","policyDefinitionEffect":"Deny","policyAssignmentId":"/providers/Microsoft.Management/managementGroups/dsv/providers/Microsoft.Authorization/policyAssignments/af96c2e041ff4e7899b1323f","policyAssignmentName":"af96c2e041ff4e7899b1323f","policyAssignmentDisplayName":"Load balancer - Public IP is not allowed","policyAssignmentScope":"/providers/Microsoft.Management/managementGroups/dsv","policyAssignmentParameters":{}}}]}}

callumpjackson commented 1 year ago

I would start by looking at Kubernetes Network Policies. There is a nice document on this here.
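If a restrictive default policy is in place, something along the lines of the following illustrative NetworkPolicy would permit the replication traffic between the queue manager pods (the instance label is an assumption based on the release name in this thread; the replication port 9414 comes from the log output above):

```yaml
# Illustrative sketch only - label selectors must match what the chart
# actually sets on the pods in your deployment.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nativeha-replication
  namespace: default
spec:
  # Apply to the queue manager pods of this release (assumed label):
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: secureapphelm
  policyTypes:
    - Ingress
  ingress:
    # Allow Native HA replication traffic from the sibling pods:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/instance: secureapphelm
      ports:
        - protocol: TCP
          port: 9414
```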

JBarbes commented 1 year ago

Callum, I suddenly see that Native HA uses the public address (172.x) and not the internal cluster IP (100.x). I think that is wrong, as the cluster is private and only very few things are open. How can I change that?

callumpjackson commented 1 year ago

Native HA uses environment variables to control the network location for the internal data replication. You can see these being set in the StatefulSet here. These should be set to the ClusterIP service names, which are defined here. It's unclear to me how these could be using a public address, as they are only ClusterIP entries. Where did you see the public address being used?
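For reference, each per-instance replication Service the chart creates is a plain ClusterIP entry roughly like this (a sketch: the name pattern and port 9414 match the log output above, while the selector label is an assumption about how the chart targets individual StatefulSet pods):

```yaml
# Illustrative sketch of one replica Service - one exists per pod so each
# instance has a stable in-cluster address for replication traffic.
apiVersion: v1
kind: Service
metadata:
  name: secureapphelm-ibm-mq-replica-0
  namespace: default
spec:
  type: ClusterIP
  # Assumed selector: target exactly one pod of the StatefulSet.
  selector:
    statefulset.kubernetes.io/pod-name: secureapphelm-ibm-mq-0
  ports:
    - name: ha
      port: 9414
      protocol: TCP
```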

JBarbes commented 1 year ago

I see it in the error message:

AMQ3230E: Native HA network connection attempt from network address '172.20.88.197' failed. [CommentInsert1(172.20.88.197), CommentInsert2(secureapphelm), CommentInsert3(rrcE_SSL_BAD_KEYFILE_LABEL (407) (????) (ha) ())]


JBarbes commented 1 year ago

I think the Services are defined correctly with ClusterIP. But why do the pods try to connect over the 172 address?

spec: ports:

callumpjackson commented 1 year ago

I think the IP address is not the underlying issue. Both the 172 and 100 ranges are private IP address ranges, and the difference is due to DNS lookup in the Kubernetes cluster. The underlying issue is that the key store being created by the container is not correct. This code is within the container repo. The particular piece of code creating the key store has changed over the last few months, so I did re-verify that the latest available build, ibm-mqadvanced-server-dev:9.3.2.0-r1.20230209173815.53790e1-amd64, works successfully. My best idea at the moment is that the container image you are using, within your private repo, is broken for some reason. I would suggest that you import the latest or re-build based on the 9.3.2 branch.
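The range claim is easy to sanity-check: 172.20.88.197 falls inside 172.16.0.0/12, one of the RFC 1918 private ranges, so it is not a public address. A quick shell check:

```shell
# Classify an address against the three RFC 1918 private ranges
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) using glob patterns:
ip=172.20.88.197
case "$ip" in
  10.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*|192.168.*)
    echo "$ip is an RFC 1918 private address" ;;
  *)
    echo "$ip is not in an RFC 1918 range" ;;
esac
```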

JBarbes commented 1 year ago

Thank you for your support, Callum. With the new v9.3.2 dev images it works like a charm :)

callumpjackson commented 1 year ago

Great - closing issue.