ibm-messaging / mq-container

Container images for IBM® MQ
Apache License 2.0

SSL handshake failure when sending/receiving after IBM MQ container is started from a stopped state #571

Open ctslone opened 2 months ago

ctslone commented 2 months ago

I am currently using the icr.io/ibm-messaging/mq:9.3.2.0-r2 IBM MQ Docker image as part of a test suite. The container is defined in a docker-compose YAML file on disk and is created with the docker compose up --wait command, which waits on the following health check:

healthcheck:
  test: ["CMD-SHELL", "chkmqready && chkmqhealthy"]
  interval: 5s
  timeout: 3s
  retries: 100

This health check passes, the test suite runs, and all tests pass. At the end of the suite, however, the container is stopped so that the next run of the suite can start it again from its stopped/exited state. I have confirmed that the container stops with an exit status code of 0.
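A minimal sketch of how the clean-stop check described above could be scripted, assuming the docker CLI is on the PATH. The function names and the container name passed in are hypothetical; docker inspect with a --format template is the only real command used.

```shell
#!/usr/bin/env bash
# Print the exit code Docker recorded for a container.
# "$1" is the container name or ID (hypothetical; use the name from your compose project).
container_exit_code() {
  docker inspect --format '{{.State.ExitCode}}' "$1"
}

# Succeed only if the container is in the "exited" state with exit code 0.
stopped_cleanly() {
  local status code
  status="$(docker inspect --format '{{.State.Status}}' "$1")" || return 1
  code="$(container_exit_code "$1")" || return 1
  [ "$status" = "exited" ] && [ "$code" = "0" ]
}
```

For example, stopped_cleanly my-mq-container && echo "clean stop" could run at the end of the suite, where my-mq-container stands in for the real container name.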

When the suite runs again after it has already created the container, it checks for the existing container on disk. If the container already exists (in a stopped/exited state), it is started with the docker compose up --no-recreate --wait command. The issue is that on this run (after starting from a stopped state), the tests that perform a test connection or send/receive to/from a queue fail with an SSL handshake error:

Cannot conclude ssl handshake. Cause: Software caused connection abort: recv failed.
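For reference, the create-or-reuse logic described above might look roughly like the following. This is a sketch, not the actual test harness: the function name is hypothetical, and it assumes Docker Compose v2 and that it runs in the directory containing the compose file.

```shell
#!/usr/bin/env bash
# Start the project's existing containers if any exist (including stopped ones);
# otherwise create them. Both paths wait for the health check to pass.
start_or_create() {
  # "ps --all --quiet" prints the IDs of all containers for the project,
  # including stopped ones; empty output means nothing has been created yet.
  if [ -n "$(docker compose ps --all --quiet)" ]; then
    docker compose up --no-recreate --wait
  else
    docker compose up --wait
  fi
}
```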

I have looked in the queue manager logs located at /var/mqm/qmgrs/QMGRNAME/errors/AMQERR01.LOG and see the following error messages in the logs:

AMQ9209E: Connection to host '_gateway (X)' for channel 'DEV.APP.SVRCONN' closed.

EXPLANATION: An error occurred receiving data from '_gateway (X)' over TCP/IP. The connection to the remote host has unexpectedly terminated.

...and...

AMQ9999E: Channel 'DEV.APP.SVRCONN' to host 'X' ended abnormally.

EXPLANATION: The channel program running under process ID 444 for channel 'DEV.APP.SVRCONN' ended abnormally. The host name is 'X'; in some cases the host name cannot be determined and so is shown as '????'.
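One way to compare the error log from a fresh start with the log after a restart is to reduce each to its distinct error codes. The sketch below assumes error message IDs have the form AMQ, four digits, then E, as in the two messages quoted above; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read an MQ error log on stdin and print each distinct error (E) message ID once,
# for example AMQ9209E. Other severities (such as I or W) are ignored.
mq_error_codes() {
  grep -oE 'AMQ[0-9]{4}E' | sort -u
}
```

A possible invocation, with the container name as a placeholder: docker exec my-mq-container cat /var/mqm/qmgrs/QMGRNAME/errors/AMQERR01.LOG | mq_error_codes. Running it once after a fresh start and once after a restart shows whether the restart introduces any new codes.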

I've spent a bunch of time trying to figure out why the container can run the tests just fine when spinning up brand new but fails when starting from a stopped state, but haven't had any success and figured I would ask here.

Is there anything special that needs to be done when stopping this container? Or anything special that needs to be done when starting it from a stopped state?

vgavinash commented 2 months ago

Please see if the following pdf from IBM support can provide any insights about the issue occurring - https://www.ibm.com/support/pages/system/files/inline-files/IBM%20MQ%20error%20AMQ9209E%20Connection%20to%20host%20x%20for%20channel%20y%20closed_0.pdf

ctslone commented 2 months ago

Hi @vgavinash, I had reviewed that document before opening this issue. Unfortunately, nothing in it helped me resolve the issue. The error doesn't seem to be directly related to starting the container from a stopped state, because I see the same errors in the log file when starting the container from scratch, but something about starting the container from the stopped state results in connections being closed.

Any other ideas here would be appreciated.

vgavinash commented 1 month ago

Hi @ctslone, I assume the healthcheck does an ssh on the running container and runs the chkmqready and chkmqhealthy commands inside the mq-container. If that is the case, can you try, without docker compose, building the container image directly using this procedure: https://github.com/ibm-messaging/mq-container/blob/master/docs/building.md#building-a-developer-image.

Then stop and start the container using the docker stop / docker start commands, and run your healthcheck routine each time.
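The stop/start experiment suggested here could be scripted along these lines. This is only a sketch: the function name, the container name argument, and the POLL_INTERVAL variable are hypothetical, while chkmqready and chkmqhealthy are the commands from the compose healthcheck quoted earlier in the thread, run inside the container with docker exec.

```shell
#!/usr/bin/env bash
# Stop and restart a container, then poll the readiness and health checks.
# "$1" is the container name (hypothetical); "$2" is the number of attempts
# (default 100, matching "retries" in the compose healthcheck).
restart_and_check() {
  local name="$1" attempts="${2:-100}" i=0
  docker stop "$name" >/dev/null || return 1
  docker start "$name" >/dev/null || return 1
  while [ "$i" -lt "$attempts" ]; do
    # Run the same checks as the compose healthcheck, inside the container.
    if docker exec "$name" sh -c 'chkmqready && chkmqhealthy'; then
      echo "ready after $((i + 1)) attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-5}"   # default 5s, matching "interval" in the healthcheck
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}
```

If the checks pass after a restart but clients still fail the TLS handshake, that would suggest the health check alone does not cover the failing path.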