cc @PeterF778 in case you have any ideas
@xwgao Are there any error messages produced by the agent? They go to stderr by default, which WebSphere then redirects to a file.
If there are no errors, please try using -Dotel.javaagent.debug=true to get more verbose logging.
This is probably not causing this problem, but your rule WebSphere:feature=wasJmsServer,type=Queue,name=* has a wildcard and matches multiple MBeans, so you should report name as a metric attribute. Otherwise the values for multiple MBeans will clash and make your metric non-deterministic.
@PeterF778 How can I report name as a metric attribute? Can I use the below configuration rule to do that?
---
rules:
  - bean: WebSphere:feature=wasJmsServer,type=Queue,name=*
    metricAttribute:
      name: param(name)
    mapping:
      Depth:
        metric: my.own.jms.queue.depth
        type: updowncounter
        desc: The JMS message queue depth
        unit: "1"
  - bean: java.lang:type=Threading
    mapping:
      ThreadCount:
        metric: my.own.jvm.thread.count
        type: updowncounter
        desc: The current number of threads
        unit: "1"
Actually, when I removed the first rule for bean WebSphere:feature=wasJmsServer,type=Queue,name=*, the result is the same: after applying the jmx-metrics java agent, I can not find the queue beans from the bean list retrieved using the jmxterm tool and I can not see the metrics my_own_jms_queue_depth generated for the queue depth of my liberty queue beans.
And I've added -Dotel.javaagent.debug=true and the verbose logs are given in the attached file verbose.log. There are some error messages (the lines starting with [err]).
verbose.log
@xwgao instead of Depth try using MaxQueueDepth. It seems that metrics with value 0 don't show up in the prometheus output (but you can see them if you turn on logging in the collector). Using MaxQueueDepth should make this easier to debug for you; change it back after you have verified it works. Try:
MaxQueueDepth:
  metric: my.own.jms.queue.depth
  type: updowncounter
  desc: The JMS message queue depth
  unit: "1"
  metricAttribute:
    name: param(name)
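If you want to verify the values in the collector log, a minimal sketch of enabling the logging exporter for the metrics pipeline could look like the fragment below; it assumes your collector already has an otlp receiver (and whatever other exporters, such as prometheus, you keep alongside):

exporters:
  logging:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [logging]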
@xwgao Yes, your updated configuration looks right (as well as the configuration proposed by @laurit).
From the symptoms, it looks like the MBean responsible for the WebSphere metrics somehow got crippled when queried for matching ObjectNames. It is strange that it did not produce any error messages or exceptions - at least I do not see any in the attached log file. BTW, the lines starting with [err] indicate captured stderr entries, not necessarily actual errors.
Are there any other log files produced by your server?
@PeterF778 I tried with the configuration proposed by @laurit. The result is the same: after applying the jmx-metrics java agent, I can not find the queue beans from the bean list retrieved using the jmxterm tool and I can not see the metrics my_own_jms_queue_depth generated.
Actually, as I stated, when I removed the first rule for bean WebSphere:feature=wasJmsServer,type=Queue,name=*, the result is the same.
I found another log file messages.log generated by my liberty server, but it looks the same as the one I sent to you already.
messages.log
verbose.20231108.log
I found the following messages in the log files; is this related to the issue?
[otel.javaagent 2023-11-07 07:18:36:488 +0000] [main] DEBUG io.opentelemetry.javaagent.tooling.SafeServiceLoader - Unable to load instrumentation class: JVMCFRE199E bad major version 61.0 of class=io/opentelemetry/javaagent/shaded/instrumentation/javaagent/runtimemetrics/java17/Java17RuntimeMetricsInstaller, the maximum supported major version is 55.0; offset=6
...
[otel.javaagent 2023-11-07 07:18:37:308 +0000] [main] DEBUG io.opentelemetry.javaagent.tooling.SafeServiceLoader - Unable to load instrumentation class: JVMCFRE199E bad major version 61.0 of class=io/opentelemetry/javaagent/instrumentation/spring/jms/v6_0/SpringJmsInstrumentationModule, the maximum supported major version is 55.0; offset=6
[otel.javaagent 2023-11-07 07:18:37:349 +0000] [main] DEBUG io.opentelemetry.javaagent.tooling.SafeServiceLoader - Unable to load instrumentation class: JVMCFRE199E bad major version 61.0 of class=io/opentelemetry/javaagent/instrumentation/spring/webmvc/v6_0/SpringWebMvcInstrumentationModule, the maximum supported major version is 55.0; offset=6
Unable to load instrumentation class: JVMCFRE199E bad major version 61.0 is expected. Some instrumentations target Java 17; loading them is expected to fail on JDK 11.
To help you we need to reproduce this issue, and we need your help to do that. I tried with wlp-jakartaee10-23.0.0.10 with default configuration and can see the metric for the default queue. You'll have to narrow your issue down: try with a fresh liberty install with default configuration and see if it works then. If it does not, try with OpenJDK instead of OpenJ9. If you manage to find the combination that fails, describe it with enough detail to allow others to reproduce it.
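For a quick local check, something along these lines might work; this is only a sketch, the image name and the local paths are placeholders:

docker run --rm -p 9080:9080 \
  -v /path/to/opentelemetry-javaagent.jar:/otel/opentelemetry-javaagent.jar \
  -v /path/to/config.yaml:/otel/config.yaml \
  -e JAVA_TOOL_OPTIONS="-javaagent:/otel/opentelemetry-javaagent.jar -Dotel.jmx.config=/otel/config.yaml -Dotel.javaagent.debug=true" \
  <liberty-image>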
@laurit @PeterF778 I tried upgrading my WebSphere Liberty to 23.0.0.10 and Java to 17. Although the error Unable to load instrumentation class: JVMCFRE199E bad major version 61.0 was gone, the result was the same: after applying the jmx-metrics java agent, I still can not see the metrics my_own_jms_queue_depth generated. Attached is the verbose log of my Liberty server after upgrading.
verbose.20231114.log
In addition, my Liberty JMS server is running inside a docker container which uses the base image ibmcom/websphere-liberty:23.0.0.10-kernel-java17-openj9-ubi.
@xwgao could you provide instructions for reproducing this? Perhaps you could provide a dockerfile that would allow building an image that reproduces the issue?
This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 7 days. It will be closed automatically if there is no response from the author within 7 additional days from this comment.
@laurit The instructions for reproducing this issue are as follows. In an OpenShift 4.12.36 cluster, create the ConfigMap below:
kind: ConfigMap
apiVersion: v1
metadata:
  name: otel-jmx-config
  namespace: my-namespace
data:
  config.yaml: |
    ---
    rules:
      - bean: WebSphere:feature=wasJmsServer,type=Queue,name=*
        metricAttribute:
          name: param(name)
        mapping:
          Depth:
            metric: my.own.jms.queue.depth
            type: updowncounter
            desc: The JMS message queue depth
            unit: "1"
      - bean: java.lang:type=Threading
        mapping:
          ThreadCount:
            metric: my.own.jvm.thread.count
            type: updowncounter
            desc: The current number of threads
            unit: "1"
Then create the StatefulSet for the Liberty JMS server with the annotation instrumentation.opentelemetry.io/inject-java: instrumentation as below.
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: wlp-jmsserver-statefulset
  namespace: my-namespace
  ...
spec:
  replicas: 1
  ...
  template:
    ...
    annotations:
      instrumentation.opentelemetry.io/inject-java: instrumentation
    ...
    spec:
      ...
      containers:
        - name: jmsserver
          command:
            - /bin/bash
            - '-c'
            - '--'
          env:
            ...
            - name: JAVA_TOOL_OPTIONS
              value: ' -Dotel.jmx.config=/config/manage/oteljmx/config.yaml -Dotel.javaagent.debug=true'
          ports:
            - containerPort: 9080
              protocol: TCP
            - containerPort: 9443
              protocol: TCP
            - containerPort: 7276
              protocol: TCP
            - containerPort: 7286
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            ...
            - name: otel-jmx-config
              mountPath: /config/manage/oteljmx
            ...
          args:
            - >-
              echo copy additional jvm options is exists; /config/fips-check.sh;
              cp
              /opt/ibm/wlp/usr/servers/defaultServer/manage/jvmoptions/jvm.options
              /opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/.;
              echo generateKeyStore...; openssl pkcs12 -export -in
              /etc/ssl/certs/internal-manage-tls/tls.crt -inkey
              /etc/ssl/certs/internal-manage-tls/tls.key -out ${java_keystore}
              -password pass:${java_keystore_password}; /tmp/startwlp.sh;
            ...
      volumes:
        ...
        - name: otel-jmx-config
          configMap:
            name: otel-jmx-config
            defaultMode: 420
        ...
...
The image used in my StatefulSet can be built with the Dockerfile contained in the attached wlp-jms-image.zip file. You can unzip the attached zip file and then run docker build -t <image_tag> . within the extracted folder. Since the base image is ibmcom/websphere-liberty:23.0.0.10-kernel-java17-openj9-ubi, you need to docker login docker.io before you run the docker build command to build the image.
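For example, from the extracted folder:

docker login docker.io
docker build -t <image_tag> .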
wlp-jms-image.zip
Please feel free to let me know if you run into any problems or need any further information. Thanks a lot.
@xwgao is OpenShift really needed? Would it still reproduce if you just built the image and copied the agent into it? Having a simple reproducer that requires minimal effort to set up greatly increases the chances of someone trying it out.
@laurit Yes, it should still be reproducible if you just build the image and copy the agent into it. Thanks.
@laurit Are you able to reproduce it by building the image and copying the agent jar into it? Please feel free to let me know if you meet any problems or need any help. Thanks a lot.
@laurit @PeterF778 Is there any update on this issue? Thanks a lot.
@xwgao can you create a smaller example that can be run locally in docker?
@trask Which example do you need? You can start the WebSphere Liberty server via the /tmp/startwlp.sh script (already contained in the docker image) when running locally in docker.
@trask @laurit Is there any update on this issue? Thanks a lot.
@xwgao Try using JVM_ARGS instead of JAVA_TOOL_OPTIONS, or add -Djavax.management.builder.initial=com.ibm.ws.kernel.boot.jmx.internal.PlatformMBeanServerBuilder to the jvm arguments.
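For example, in the StatefulSet env entry shown earlier, the second option could look roughly like this (a sketch, not a tested configuration):

- name: JAVA_TOOL_OPTIONS
  value: >-
    -Djavax.management.builder.initial=com.ibm.ws.kernel.boot.jmx.internal.PlatformMBeanServerBuilder
    -Dotel.jmx.config=/config/manage/oteljmx/config.yaml
    -Dotel.javaagent.debug=true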
The issue can be reproduced without the otel agent by adding -Dcom.sun.management.jmxremote to JAVA_TOOL_OPTIONS. This property is set in the ws-javaagent.jar agent: https://github.com/OpenLiberty/open-liberty/blob/f961357954b2610a98f1f4b5b6d5d416eecd1ea0/dev/com.ibm.ws.kernel.instrument.agent/src/com/ibm/ws/kernel/instrument/BootstrapAgent.java#L123
I think the issue here is that when our agent is in JAVA_TOOL_OPTIONS it runs before ws-javaagent.jar.
@laurit It works after adding -Djavax.management.builder.initial=com.ibm.ws.kernel.boot.jmx.internal.PlatformMBeanServerBuilder to JAVA_TOOL_OPTIONS. Thanks a lot!
I can get my queue depth metrics my_own_jms_queue_depth generated by following @laurit's comment above. This issue is resolved now, so I am closing it.
Thank you, @laurit , for taking the time and effort to investigate this ticket.
Describe the bug
I hit a problem when using jmx-metrics java agent in my Liberty JMS server to capture the queue depth metrics. I created a configuration rule yaml file as below:
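---
rules:
  - bean: WebSphere:feature=wasJmsServer,type=Queue,name=*
    metricAttribute:
      name: param(name)
    mapping:
      Depth:
        metric: my.own.jms.queue.depth
        type: updowncounter
        desc: The JMS message queue depth
        unit: "1"
  - bean: java.lang:type=Threading
    mapping:
      ThreadCount:
        metric: my.own.jvm.thread.count
        type: updowncounter
        desc: The current number of threads
        unit: "1"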
And specify the env var JAVA_TOOL_OPTIONS to be -Dotel.jmx.config=/config/manage/oteljmx/config.yaml -javaagent:path/to/opentelemetry-javaagent.jar. Then after my liberty JMS server started, I can not find the queue beans from the bean list (retrieved using the jmxterm tool). However, before applying the JMX metrics java agent, I can find the queue beans from the bean list via the jmxterm tool as below.
In addition, for the other configuration rule defined for mapping the attribute ThreadCount of bean java.lang:type=Threading to the metric my.own.jvm.thread.count, it works well. I can see the metrics my_own_jvm_thread_count when calling the prometheus endpoint of my OTEL collector as below. But I can not see the metrics my_own_jms_queue_depth generated for the queue depth of my liberty queue beans.
Steps to reproduce
Specify the env var JAVA_TOOL_OPTIONS to be -Dotel.jmx.config=/config/manage/oteljmx/config.yaml -javaagent:path/to/opentelemetry-javaagent.jar.
In addition, for the other configuration rule defined for mapping the attribute ThreadCount of bean java.lang:type=Threading to the metric my.own.jvm.thread.count, it works well. I can see the metrics my_own_jvm_thread_count when calling the prometheus endpoint of my OTEL collector as below. But I can not see the metrics my_own_jms_queue_depth generated for the queue depth of my liberty queue beans.
Expected behavior
After applying the jmx-metrics java agent, I can see the metrics my_own_jms_queue_depth generated for the queue depth of my liberty queue beans.
Actual behavior
After applying the jmx-metrics java agent, I can not see the metrics my_own_jms_queue_depth generated for the queue depth of my liberty queue beans.
Javaagent or library instrumentation version
1.30.0
Environment
JDK: WebSphere Application Server 23.0.0.8/wlp-1.0.80.cl230820230807-0401 on Eclipse OpenJ9 VM openj9-0.40.0, version 11.0.20+8 (en_US)
OS: Linux 4.18.0-372.73.1.el8_6.x86_64
Additional context
No response