Open kalekhin opened 3 years ago
The problem goes away if I give all of the processor cores to the Linux virtual machine where Docker is installed, or if I set the paravirtualization interface for this machine to Hyper-V.
With the first solution, the CPU load is already at 100% from the java process in the mq container, and restarting the container only has a chance of solving the problem. The second solution looks more stable; so far there have been no problems.
I am not sure that either solution is stable.
Same problem here :( I'm running the container on Ubuntu 20.04 LTS, AMD Ryzen 7 5800, Lenovo Legion 5 Pro.
I have a similar situation: 100% CPU load from gsk8capicmd_64. My hardware/software setup: AMD Ryzen 7 5800H, Windows 10.0.19042.1237 with WSL 2 (kernel version 5.10.16), Docker 20.10.8 build 3967b7d
AMD Ryzen 7 5800H (Lenovo Legion 5) Fedora 36 (kernel 5.17.12-300.fc36.x86_64), Docker Desktop 4.9.0 (docker 20.10.16)
Same issue: /opt/mqm/gskit8/bin/gsk8capicmd_64 -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw JtM27EP7L2LH -stash
loads the CPU to 100% and takes anywhere from 6 to 40 minutes (it varies randomly)
@arthurbarr could you escalate this? This is a productivity killer for developers working on AMD Ryzen setups. I guess the original issue comes from some GSKit8 bug.
Hi @andreysaksonov , @mihmig , @jfmatheusg , @kalekhin .
Arthur has asked me to look into this. The gsk8capicmd is owned by a separate internal IBM team from IBM MQ, but I can raise a support ticket with them to ask them to take a look. To help them diagnose the issue, they are likely to want a trace of it.
Please could you run the same commands as before that caused the 100% CPU issue, with the -trace <file> option. For example, taking @andreysaksonov's command, I would run: /opt/mqm/gskit8/bin/gsk8capicmd_64 -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw JtM27EP7L2LH -stash -trace /tmp/trace.output
Please then send me the file generated; in the previous example I would need /tmp/trace.output.
You can send the file either by attaching it to a comment here or directly via email to parrobe@uk.ibm.com.
Could you also let me know what version of MQ you are using? This is best done via the dspmqver command, and you can also tell me directly which version of GSKit you are using via dspmqver -p 65
In the meantime I'll get the ball rolling with GSKit and hopefully we can get to the bottom of this.
@parrobe
docker rm ibmmq && docker run -e LICENSE=accept -e DEBUG=true -e MQ_QMGR_NAME=QM1 -p 9443:9443 --name ibmmq icr.io/ibm-messaging/mq:latest
❯ docker exec -it ibmmq /bin/bash
bash-4.4$ ps auxwwf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1001 49 0.1 0.0 35120 4284 pts/0 Ss 14:45 0:00 /bin/bash
1001 55 0.0 0.0 47616 3544 pts/0 R+ 14:45 0:00 \_ ps auxwwf
1001 1 0.5 0.1 1290456 14472 ? Ssl 14:45 0:00 runmqserver -nologruntime -dev
1001 43 0.0 0.0 34988 4068 ? S 14:45 0:00 /bin/sh /opt/mqm/bin/runmqakm -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw zxDekBysHQ7S -stash
1001 48 100 0.0 46816 11652 ? R 14:45 0:15 \_ /opt/mqm/gskit8/bin/gsk8capicmd_64 -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw zxDekBysHQ7S -stash
bash-4.4$ /bin/sh /opt/mqm/bin/runmqakm -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw zxDekBysHQ7S -stash -trace /tmp/zxDekBysHQ7S.trace
bash-4.4$ date
Tue Jun 7 14:46:05 UTC 2022
bash-4.4$ /bin/sh /opt/mqm/bin/runmqakm -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw zxDekBysHQ7S -stash -trace /tmp/zxDekBysHQ7S.trace
CTGSK3036W The output file "/run/runmqserver/tls/key.kdb" already exists.
bash-4.4$ date
Tue Jun 7 14:46:22 UTC 2022
bash-4.4$ exit
exit
❯ docker cp ibmmq:/tmp/zxDekBysHQ7S.trace .
❯ docker exec -it ibmmq /bin/bash
bash-4.4$ dspmqver
Name: IBM MQ
Version: 9.2.5.0
Level: p925-L220207-CSU01-L220405.DE
BuildType: IKAP - (Production)
Platform: IBM MQ for Linux (x86-64 platform)
Mode: 64-bit
O/S: Linux 5.10.104-linuxkit
O/S Details: Red Hat Enterprise Linux 8.6 (Ootpa)
InstName: Installation1
InstDesc: IBM MQ V9.2.5.0 (Unzipped)
Primary: N/A
InstPath: /opt/mqm
DataPath: /mnt/mqm/data
MaxCmdLevel: 925
LicenseType: Developer
bash-4.4$ dspmqver -p 65
Name: IBM MQ
Version: 9.2.5.0
Level: p925-L220207-CSU01-L220405.DE
BuildType: IKAP - (Production)
Platform: IBM MQ for Linux (x86-64 platform)
Mode: 64-bit
O/S: Linux 5.10.104-linuxkit
O/S Details: Red Hat Enterprise Linux 8.6 (Ootpa)
InstName: Installation1
InstDesc: IBM MQ V9.2.5.0 (Unzipped)
Primary: N/A
InstPath: /opt/mqm
DataPath: /mnt/mqm/data
MaxCmdLevel: 925
LicenseType: Developer
AMQ8250I: The 32-bit GSKit component is not installed.
Name: IBM Global Security Kit for IBM MQ
Version: 8.0.55.26
BuildType: Production
Mode: 64-bit
bash-4.4$
As you can see, the drama of the situation is that the command hangs only when it is spawned by runmqserver -nologruntime -dev; when I run it from a new shell in the container instead, it does not hang. I have attached the trace file anyway.
Thanks @andreysaksonov - I've passed these details onto the GSKit team. I will let you know when they have responded.
Hi @andreysaksonov - We've heard back from GSKit now. They suspect this is an issue they have seen with some AMD processors, where their RNG module hangs due to a difference in the AMD chip's clock. They have asked if we can retry with the following environment variable set as a workaround, to see if the issue resolves:
Please set ICC_SHIFT=3 when creating your container so it is present for the container startup. Please run the trace again if the issue has not resolved.
WARNING: For anyone reading this issue and solution who is seeing a similar issue, please do not set the environment variable unless specifically advised to do so. While the variable may resolve the issue, it can negatively impact performance or functionality if the issue it is meant to resolve is not the cause of your particular problem.
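For readers following along, here is a minimal sketch of what "present for the container startup" could look like. It assumes the same image, queue manager name, port, and container name as the reproduction command earlier in this thread, and simply adds -e ICC_SHIFT=3. The start_mq function name and the DOCKER override variable are made up for illustration, and the warning above still applies.

```shell
# Sketch only: start the MQ container with ICC_SHIFT=3 in its environment.
# Assumption: image, queue manager name, port, and container name are taken
# from the reproduction command earlier in this thread.
# DOCKER is a hypothetical override for the container CLI; it defaults to docker.
start_mq() {
  "${DOCKER:-docker}" run \
    -e LICENSE=accept \
    -e MQ_QMGR_NAME=QM1 \
    -e ICC_SHIFT=3 \
    -p 9443:9443 \
    --name ibmmq \
    icr.io/ibm-messaging/mq:latest
}
```

Calling start_mq starts the container in the foreground. Because the variable is passed with -e at creation time, it is already set when runmqserver spawns the key database command.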
Yes, it solves the issue, thanks. I will leave link to original GSKit bug: https://www.ibm.com/support/pages/apar/IJ28497
We've started hitting this in App Connect in containers too.
When considering the comment above from parrobe (that basically says "you must set this to fix the problem, but must not set it if you don't see the problem"), we've identified an interesting problem when running in a cluster. Initially, one might think that it's safe to set ICC_SHIFT on a cluster, and so we could add a config parameter for an operand so that users can selectively set ICC_SHIFT.
But the cluster may have a mix of hardware for the workers. Some may use AMD chips susceptible to this problem, some may use Intel chips. This would mean that whether the user should set the config parameter depends on which worker a Pod lands on, and that Pod placement is, by default, random.
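To make the mixed-hardware point concrete, here is a rough sketch of how one could check which CPU vendor a given worker (or the container running on it) reports. It reads the vendor_id field from /proc/cpuinfo on Linux x86 hosts. The function names are made up for illustration. Vendor alone is only a coarse signal, since the thread says only some AMD processors are affected, so this is not a reliable test for whether ICC_SHIFT is needed.

```shell
# Print the CPU vendor string reported by the host, for example
# AuthenticAMD or GenuineIntel. The optional argument is a cpuinfo-style
# file; it defaults to /proc/cpuinfo.
cpu_vendor() {
  awk -F': *' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: succeeds only when the vendor string is AuthenticAMD.
# This does NOT prove the host is affected by the hang.
is_amd_host() {
  [ "$(cpu_vendor "$1")" = "AuthenticAMD" ]
}
```

Running cpu_vendor inside Pods scheduled on different workers would show whether the cluster mixes vendors, which is the situation where a single cluster-wide setting becomes awkward.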
I understand that there has been an update (fixpack) to the IBM GSKit component that addresses this CPU-specific issue, so you could contact IBM support to get the updated GSKit and apply it to your estate, or instead wait for the next fixpack update to your MQ component(s) to include the updated GSKit. Hopefully that would then mean a permanent fix without needing workarounds like ICC_SHIFT=3
Hi @imavo - IBM MQ aims to update all our third-party component versions to the latest with each released version of IBM MQ. Unfortunately, there are delays in taking the latest GSKit version due to issues found through our regression testing. We will update as soon as we are able.
Hello!
When my laptop is connected to a power source, the gsk8capicmd_64 call runs forever, and one thread uses 100% of the processor. The problem disappears when the laptop is running on battery. I don't understand why or how this is related.
Reproduced on ROG Zephyrus G15 GA503 GA503QM-HN094 (AMD Ryzen 7 5800HS with Radeon). Env: Win 10 Pro 20H2 build 19042.1110 with all updates as of 17/07/2021. Docker Toolbox v19.03.1 is running on VirtualBox 6.1.22 r144080 (Qt5.6.2) with the extension package 6.1.22 r144080.
Just run:
And the bug will happen.
https://github.com/ibm-messaging/mq-container/blob/4580cecf4973107dff184e8cbbcf9ac7f5b4e7df/internal/keystore/keystore.go#L192