Open alelsg opened 8 years ago
Seems we have the same issue in Jenkins: JENKINS-39388. No symbols from the requester to say for sure, but the pattern is very similar: https://issues.jenkins-ci.org/secure/attachment/34653/hs_err.txt
The auto-unpack/load feature of JNA was intended to make self-contained distributions easier. In situations where you have an installation that is in regular use (like this one), it's generally better and more efficient to install the shared library in a known location and add it to the system library load path.
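A minimal sketch of that setup, assuming JNA's documented system properties `jna.boot.library.path` and `jna.nounpack`. The directory `/opt/jna/native` is a hypothetical install location, not something from this report, and the properties have to be set before the first JNA class is initialized (in Tomcat they would more likely be passed as `-D` options at startup).

```java
public class JnaBootConfig {
    // Hypothetical directory where the jnidispatch native library was installed by hand.
    static final String JNA_NATIVE_DIR = "/opt/jna/native";

    // Must run before com.sun.jna.Native is first touched, otherwise JNA has
    // already unpacked and loaded its temporary copy.
    static void configure() {
        // Directory JNA searches for its own native library (jnidispatch).
        System.setProperty("jna.boot.library.path", JNA_NATIVE_DIR);
        // Ask JNA not to unpack the native library from jna.jar into a temp file.
        System.setProperty("jna.nounpack", "true");
    }

    public static void main(String[] args) {
        configure();
        System.out.println("jna.boot.library.path=" + System.getProperty("jna.boot.library.path"));
        System.out.println("jna.nounpack=" + System.getProperty("jna.nounpack"));
    }
}
```

With this in place no jnaNNNN.tmp file should be created under Tomcat's temp directory, so there is no temporary library for a finalizer to close later.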
Hi All,
We have implemented a war file that provides a web API for customers. It is a middleware application between our main C++ library that processes queries (libonetick.dll from here on) and client applications. The .war file gets loaded by Tomcat. Once we move this war into Tomcat's webapps directory, our customers are able to query libonetick.dll using curl, a browser, etc.
The Java API classes and functions called from the war file's source code are generated by SWIG (the war redirects calls from Java to C++ using JNI).
You can imagine the sequence of library loads as follows:
First we load jomd.dll (which contains the SWIG-generated C++ source code) with a System.loadLibrary("jomd"); call in a static block of the jomd_20160320121237JNI class, then jomd.dll loads libonetick.dll as a dependency.
So to load the correct libonetick.dll we need the correct PATH and LD_LIBRARY_PATH set before initializing libonetick objects in Java. We also have to provide a mechanism to upgrade the war file without restarting Tomcat.
The war file has a configuration file that points to the new distribution's bin directory, which is the directory that jomd.dll and the new libonetick.dll should be loaded from. Each time we redeploy the war we change this config file to point to the new distribution's bin directory.
To load libonetick.dll from the correct location during an upgrade, we have to call native C functions programmatically to update the PATH and LD_LIBRARY_PATH environment variables each time we redeploy the web API.
We use the libc library on Linux and msvcrt on Windows to expose the C interface in Java and set the new environment before loading jomd.dll and libonetick.dll.
Here is the code we use to get c++ API functions and call them to set PATH and LD_LIBRARY_PATH.
interface CLibraryInstance extends Library {
}
All the crashes we observed occur during dlsym calls. Here is the stack:
Thread 1 (process 3054):
0 0x0000003335630155 in raise () from /lib64/libc.so.6
1 0x0000003335631bf0 in abort () from /lib64/libc.so.6
2 0x00002b09f919aac5 in os::abort ()
from /vol2/omdshare/Linux_RHEL5_x86_64/tools/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so
3 0x00002b09f92fa137 in VMError::report_and_die ()
from /vol2/omdshare/Linux_RHEL5_x86_64/tools/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so
4 0x00002b09f919e5e0 in JVM_handle_linux_signal ()
from /vol2/omdshare/Linux_RHEL5_x86_64/tools/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so
5 <signal handler called>
6 0x0000003334608db5 in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
7 0x0000003334609252 in _dl_lookup_symbol_x ()
from /lib64/ld-linux-x86-64.so.2
8 0x0000003335706734 in do_sym () from /lib64/libc.so.6
9 0x0000003335e01104 in dlsym_doit () from /lib64/libdl.so.2
10 0x000000333460ce56 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
11 0x0000003335e0150d in _dlerror_run () from /lib64/libdl.so.2
12 0x0000003335e010ba in dlsym () from /lib64/libdl.so.2
13 0x00002b09f9196f6d in os::dll_lookup ()
from /vol2/omdshare/Linux_RHEL5_x86_64/tools/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so
14 0x00002aaaaacf4a6e in Java_java_lang_ClassLoader_00024NativeLibrary_find
from /vol2/omdshare/Linux_RHEL5_x86_64/tools/jdk1.7.0_25/jre/lib/amd64/libjava.so
15 0x00002aaaab378d8e in ?? ()
16 0x0000000000000000 in ?? ()
(gdb)
Java frames look like the following
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j java.lang.ClassLoader$NativeLibrary.find(Ljava/lang/String;)J+0
j java.lang.ClassLoader.findNative(Ljava/lang/ClassLoader;Ljava/lang/String;)J+49
v ~StubRoutines::call_stub
j com.sun.jna.Native.close(J)V+0
j com.sun.jna.NativeLibrary.dispose()V+85
j com.sun.jna.NativeLibrary.finalize()V+1
v ~StubRoutines::call_stub
J java.lang.ref.Finalizer.invokeFinalizeMethod(Ljava/lang/Object;)V
J java.lang.ref.Finalizer.access$100(Ljava/lang/ref/Finalizer;)V
J java.lang.ref.Finalizer$FinalizerThread.run()V
v ~StubRoutines::call_stub
I compiled wrappers for the dlsym and dlopen functions into a separate dll that did some extra logging.
From the dlsym wrapper logs I found the problematic call:
0x40a9db4000000000 Call 479 to Java_com_sun_jna_Native_close function with handle=0x2aaab8cc4f50 on handle 0x2aaab8cc4f50
The code tries to look up the Java_com_sun_jna_Native_close function on handle 0x2aaab8cc4f50, which had already been invalidated.
From the dlopen wrapper logs I found the name of the problematic dll (the one that handle 0x2aaab8cc4f50 belongs to):
...
0x40194c4100000000 Call 51: /home/build/aleksg/dev/testruns/20160618231559/webapi_test.small/apache-tomcat-7.0.52/temp/jna-94094958/jna7683664748922841283.tmp=0x2aaab8cc4f50
...
By eliminating code little by little, I found that this tmp dll gets loaded into Tomcat after the Native.loadLibrary call mentioned above (in the source code I sent you).
In successful runs, if Java_com_sun_jna_Native_close is called, it is called before JNI_OnUnload is called for the tmp dll. Here is the debug output from a successful run (last few lines only):
Call:483 to Java_com_sun_jna_Native_close function
Call:484 to JNI_OnUnload function
Call:485 to Java_com_omd_jomd_jomd_120160519120533JNI_delete_1StreamingCallbackWrapperBase function
Call:486 to Java_com_omd_jomd_jomd_120160519120533JNI_delete_1NameValueMap function
In crashed runs the order of JNI_OnUnload and Java_com_sun_jna_Native_close was reversed: in every crashed case Java_com_sun_jna_Native_close was called after JNI_OnUnload had already been called for the tmp dll.
Call:486 to JNI_OnUnload function
Call:487 to Java_com_sun_jna_Native_close function
Finished Call:487 to Java_com_sun_jna_Native_close function РЇ;ёЄ*: undefined symbol: Java_com_sun_jna_Native_close handle was invalid
Call:488 to Java_com_sun_jna_Native_close__J function
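The ordering check described above can be sketched as a small log scanner. This is only an illustration: the class and method names are hypothetical, and it assumes one wrapper log entry per line in the "Call:N to <symbol> function" form shown in the excerpts.

```java
import java.util.Arrays;
import java.util.List;

public class UnloadOrderCheck {
    // Returns true if a lookup of Java_com_sun_jna_Native_close appears
    // after a JNI_OnUnload lookup, which is the order seen in crashed runs.
    static boolean closeAfterUnload(List<String> logLines) {
        boolean unloadSeen = false;
        for (String line : logLines) {
            if (line.contains("to JNI_OnUnload function")) {
                unloadSeen = true;
            } else if (unloadSeen && line.contains("to Java_com_sun_jna_Native_close")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Lines taken from the successful-run excerpt.
        List<String> ok = Arrays.asList(
            "Call:483 to Java_com_sun_jna_Native_close function",
            "Call:484 to JNI_OnUnload function");
        // Lines taken from the crashed-run excerpt.
        List<String> crashed = Arrays.asList(
            "Call:486 to JNI_OnUnload function",
            "Call:487 to Java_com_sun_jna_Native_close function");
        System.out.println("successful run flagged: " + closeAfterUnload(ok));
        System.out.println("crashed run flagged: " + closeAfterUnload(crashed));
    }
}
```

The scanner does not track which library each JNI_OnUnload belongs to, so on a log with several unloaded libraries it can only narrow down candidates, not prove the cause.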
To summarize
1. The problem is sporadic.
2. It occurs when the garbage collector's finalizer thread tries to close the handle of the tmp dll created in the Tomcat temp folder after the Native.loadLibrary((Platform.isWindows() ? "msvcrt" : "c"), ...) call, when JNI_OnUnload has already been called for this tmp lib.
3. We have observed the crash only on Linux, not on Windows.
JDK version: jdk1.7.0_25
Operating system: CentOS release 5.2 (Final)
Tomcat version: apache-tomcat-7.0.52
Kernel version: 2.6.18-92.el5xen