Open pepperkick opened 1 year ago
Usually when openjdk segmentation faults it generates a hs_err_pid<pid>.log
file with details of the crash in the current directory of the path set with -XX:ErrorFile
. See if you can find this crash log. For running application with security manager see https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/advanced-configuration-options.md#running-application-with-security-manager it may help agains the AccessControlException
s.
Usually when openjdk segmentation faults it generates a
hs_err_pid<pid>.log
file with details of the crash in the current directory of the path set with-XX:ErrorFile
. See if you can find this crash log. For running application with security manager see https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/advanced-configuration-options.md#running-application-with-security-manager it may help agains theAccessControlException
s.
Appreciate the quick reply. I tried setting the variable OTEL_JAVAAGENT_EXPERIMENTAL_SECURITY_MANAGER_SUPPORT_ENABLED
but still faced the Segmentation Fault
error. Maybe this is missing in v1.23 so will check later with newer version.
I am not sure but I do not see any hs_err log files. I tried setting -XX:ErrorFile=/tmp/error.log
option and made sure it is propagated in pod logs but no error.log
file is generated so not sure if the error is getting captured or not.
Will try out more things meanwhile and update if I find more info. Any other input is also appreciated.
You are correct, that flag is missing in 1.23.0
.
Based on the current information there isn't anything we can do. Usually when jvm crashes it is a bug in the jvm, you may wish to try a different jvm version (e.g. jdk 11) and see if it works better. In case of a jvm crash common advice is to disable the JIT compiler with -Xint
and see if this changes the situation. You could also enable core dumps and see if that reveals clues about the nature of the crash. You could also add more debug logging to your application and try to determine whether it crashes in some specific place, perhaps it is possible to create a small program that reproduces the issue that way.
Thanks for the additional suggestions.
Trying different JDK version unfortunately is not an option currently since the application was created with java 8 and was never upgraded or tested with new java version.
I will try disabling JIT and see what happens. Also as per suggestion I will try to build a simple application which reproduces the issue. I have started going through the code base for this.
I did try dumping core but could not find anything meaningful pointing to the JVM crash, that said I am not really an expert on Java related debugging stuff.
@pepperkick Do you know if the proprietary dep.jar
is doing anything with native code, like JNI? Have you also opened an issue with the team that creates dep.jar
to see if it is something they can look into?
@pepperkick Do you know if the proprietary
dep.jar
is doing anything with native code, like JNI? Have you also opened an issue with the team that createsdep.jar
to see if it is something they can look into?
As far as I am aware it does not. Will check with the team when I can. All are busy with release cycle so unable to get much info.
Also I want to understand why adding the appd agent fixes the Segmentation Fault? Is appd agent known to fix some issues similar to this?
Also I want to understand why adding the appd agent fixes the Segmentation Fault? Is appd agent known to fix some issues similar to this?
you'd need to ask the appd folks, but it wouldn't be surprising to me if some unexpected interaction with appd avoids the JVM crash (see heisenbug 😄)
Also I want to understand why adding the appd agent fixes the Segmentation Fault? Is appd agent known to fix some issues similar to this?
you'd need to ask the appd folks, but it wouldn't be surprising to me if some unexpected interaction with appd avoids the JVM crash (see heisenbug 😄)
Yeah, I agree :P Wanted to confirm if something is already known around this. Seems like it is an edge case. I have not managed to reproduce the issue separately yet. It will take some time I think.
You could try with a fastdebug build from https://downloads.corretto.aws/#/downloads?build=nightly&version=8 Fastdebug builds contain extra assertions that are removed from release builds, perhaps this will provide a clue about why the crash happens.
Describe the bug
Hi Community,
I work in a team whose goal is to enable observability for all the company products which are mostly written in java.
We have an application which runs in Kuberentes as a deployment. The application is deployed using Tomcat server. We injected OpenTelemetry agent via annotation to the application deployment manifest. The agent is successfully injected with the application and we are able to see the basic traces and logs.
During a task it downloads certain jars during runtime and loads them. One of those downloaded jar (proprietary, lets call it
dep.jar
) has functionality to load security dynamically and then execute a thread pool. The security feature indep.jar
can be enabled or disabled via env variable.Now, These are the observations
If otel agent is injected and
dep.jar
has security feature is disabled then everything works fine.If otel agent is injected and
dep.jar
has security feature is enabled then the application crashes with the following errorNo stack trace or any error log printed prior to this (except few logs from
dep.jar
which helped pinpoint to security feature)If otel and appd agents both are injected and
dep.jar
has security feature is enabled then the application prints the following error but does not crashWe want to understand why injecting appd agent along with otel agent fixes the
Segmentation fault
issue. We also suppressed certain libraries (listed below) which fixes theAccessControlException
issue but only when appd agent is injected. If appd agent is not injected and only otel agent then it fails withSegmentation fault
issue regardless of suppression.Following libraries were suppressed
Steps to reproduce
Currently not available since proprietary application is used, will try to figure out a easier way to reproduce this
Expected behavior
Expected behavior is that the application will work when only otel agent is injected and libraries are suppressed without appd agent injected.
Actual behavior
Actual behavior is that application crashes with
Segmentation fault
when otel agent is injected and without appd agent injected.Javaagent or library instrumentation version
1.23.0
Environment
JDK: Picked up JAVA_TOOL_OPTIONS: -javaagent:/otel-auto-instrumentation/javaagent.jar [otel.javaagent 2023-08-22 13:49:36:433 +0000] [main] INFO io.opentelemetry.javaagent.tooling.VersionLogger - opentelemetry-javaagent - version: 1.29.0 22-Aug-2023 13:49:44.273 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version name: Apache Tomcat/8.5.81 22-Aug-2023 13:49:44.282 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built: Jun 8 2022 21:30:15 UTC 22-Aug-2023 13:49:44.283 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version number: 8.5.81.0 22-Aug-2023 13:49:44.283 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name: Linux 22-Aug-2023 13:49:44.283 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version: 4.18.0-425.10.1.el8_7.x86_64 22-Aug-2023 13:49:44.283 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture: amd64 22-Aug-2023 13:49:44.283 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home: /usr/lib/jvm/java-1.8.0-amazon-corretto/jre 22-Aug-2023 13:49:44.283 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version: 1.8.0_382-b05 22-Aug-2023 13:49:44.284 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor: Amazon.com Inc. 22-Aug-2023 13:49:44.284 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE: /opt/apache-tomcat-8.5.81 22-Aug-2023 13:49:44.284 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_HOME: /opt/apache-tomcat-8.5.81 22-Aug-2023 13:49:44.284 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -javaagent:/otel-auto-instrumentation/javaagent.jar 22-Aug-2023 13:49:44.284 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.config.file=/opt/webserver/conf/logging.properties 22-Aug-2023 13:49:44.285 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 22-Aug-2023 13:49:44.285 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -XX:+UseStringDeduplication 22-Aug-2023 13:49:44.285 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -XX:MaxRAMPercentage=70.0 22-Aug-2023 13:49:44.285 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -XX:MaxMetaspaceSize=4096m 22-Aug-2023 13:49:44.285 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -XX:+UseParallelGC 22-Aug-2023 13:49:44.285 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -XX:ParallelGCThreads=1 22-Aug-2023 13:49:44.285 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -XX:MaxGCPauseMillis=400 22-Aug-2023 13:49:44.285 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Xms1g 22-Aug-2023 13:49:44.286 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -XX:ReservedCodeCacheSize=512m 22-Aug-2023 13:49:44.286 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -XX:+UseCodeCacheFlushing 22-Aug-2023 13:49:44.286 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djdk.tls.ephemeralDHKeySize=2048 22-Aug-2023 13:49:44.286 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.protocol.handler.pkgs=org.apache.catalina.webresources 22-Aug-2023 13:49:44.286 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 22-Aug-2023 13:49:44.286 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.security.egd=file:/dev/./urandom 22-Aug-2023 13:49:44.286 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dignore.endorsed.dirs= 22-Aug-2023 13:49:44.287 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.base=/opt/webserver 22-Aug-2023 13:49:44.287 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.home=/opt/webserver 22-Aug-2023 13:49:44.287 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.io.tmpdir=/opt/webserver/temp
Java Version: openjdk version "1.8.0_382" OpenJDK Runtime Environment Corretto-8.382.05.1 (build 1.8.0_382-b05) OpenJDK 64-Bit Server VM Corretto-8.382.05.1 (build 25.382-b05, mixed mode)
OS: RHEL 8 amazoncorretto:8
Additional context
No response