tada / pljava

PL/Java is a free add-on module that brings Java™ Stored Procedures, Triggers, Functions, Aggregates, Operators, Types, etc., to the PostgreSQL™ backend.
http://tada.github.io/pljava/
Other
238 stars 77 forks source link

JVM crash with java.nio.charset.UnsupportedCharsetException: X-PGSQL_ASCII on pljava init. #416

Closed yazun closed 1 year ago

yazun commented 1 year ago

Hello @jcflack!

We are trying to deploy on Java17 this time using the standard LOAD mechanism (till now we used extracted DDL for schema init). As before (described in https://github.com/tada/pljava/issues/347), we are using SQL_ASCII encoded DB and on the initialisation via LOAD we a gifted with VM crash, does it ring a bell?

client_encoding                           | SQL_ASCII
server_encoding                           | SQL_ASCII

From https://tada.github.io/pljava/use/charsets.html we assumed this should not be a problem (and had not been before java17)?

WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f1f63c8439e, pid=27113, tid=27113
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.5+8 (17.0.5+8) (build 17.0.5+8)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.5+8 (17.0.5+8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x8b939e]  jni_CallStaticObjectMethodV+0xae
#
# Core dump will be written. Default location: /DATA_PRIMARY/cores/%E_27113_%t.core
#
# An error report file with more information is saved as:
# /ZSSD/xz24/app/coord/hs_err_pid27113.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
java.nio.charset.UnsupportedCharsetException: X-PGSQL_ASCII
    at java.base/java.nio.charset.Charset.forName(Charset.java:528)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f1f63c8439e, pid=28005, tid=28005
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.5+8 (17.0.5+8) (build 17.0.5+8)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.5+8 (17.0.5+8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x8b939e]  jni_CallStaticObjectMethodV+0xae
#
# Core dump will be written. Default location: /DATA_PRIMARY/cores/%E_28005_%t.core
#
# An error report file with more information is saved as:
# /ZSSD/xz24/app/coord/hs_err_pid28005.log
#
# If you would like to submit a bug report, please visit:

Also attaching the hs log. hs_err_pid34733.log

Thanks!

yazun commented 1 year ago

When changed the LANG to C, the crash is still there, but without a clear indicator this time. The core dumped still shows a String.c as a culprit though:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: xdba surveys 192.168.168.145(9706) CREATE FUNCTION        '.
Program terminated with signal 6, Aborted.
#0  0x00007fda87671387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-54.el7_9.x86_64 libcom_err-1.42.9-17.el7.x86_64 libselinux-2.5-15.el7.x86_64 libxml2-2.9.1-6.el7.4.x86_64 nspr-4.34.0-3.1.el7_9.x86_64 nss-3.79.0-4.el7_9.x86_64 nss-softokn-freebl-3.79.0-4.el7_9.x86_64 nss-util-3.79.0-1.el7_9.x86_64 openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 temurin-17-jdk-17.0.5.0.0.8-1.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x00007fda87671387 in raise () from /lib64/libc.so.6
#1  0x00007fda87672a78 in abort () from /lib64/libc.so.6
#2  0x00007fd6ea071489 in os::abort(bool, void*, void const*) [clone .cold] () from /usr/lib/jvm/temurin-17-jdk/lib/server/libjvm.so
#3  0x00007fd6eacf78fc in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) () from /usr/lib/jvm/temurin-17-jdk/lib/server/libjvm.so
#4  0x00007fd6eacf831b in VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...) () from /usr/lib/jvm/temurin-17-jdk/lib/server/libjvm.so
#5  0x00007fd6eacf834e in VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*) () from /usr/lib/jvm/temurin-17-jdk/lib/server/libjvm.so
#6  0x00007fd6eaba1153 in JVM_handle_linux_signal () from /usr/lib/jvm/temurin-17-jdk/lib/server/libjvm.so
#7  <signal handler called>
#8  0x00007fd6ea6d739e in jni_CallStaticObjectMethodV () from /usr/lib/jvm/temurin-17-jdk/lib/server/libjvm.so
#9  0x00007fd6eb175d7f in JNI_callStaticObjectMethodLockedV (clazz=<optimized out>, methodID=<optimized out>, args=args@entry=0x7ffcf1e25af0) at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/JNICalls.c:797
#10 0x00007fd6eb175e2c in JNI_callStaticObjectMethodLocked (clazz=<optimized out>, methodID=<optimized out>) at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/JNICalls.c:788
#11 0x00007fd6eb1769f5 in String_appendJavaString (javaString=<optimized out>, buf=0x7ffcf1e25c00) at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/type/String.c:293
#12 String_appendJavaString (buf=0x7ffcf1e25c00, javaString=<optimized out>) at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/type/String.c:287
#13 0x00007fd6eb176ceb in elogExceptionMessage (env=env@entry=0x16fe518, exh=0x192c5f8, logLevel=logLevel@entry=20) at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/JNICalls.c:159
#14 0x00007fd6eb173e11 in endCallMonitorHeld (env=env@entry=0x16fe518) at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/JNICalls.c:286
#15 0x00007fd6eb175d8a in JNI_callStaticObjectMethodLockedV (clazz=<optimized out>, methodID=<optimized out>, args=args@entry=0x7ffcf1e25c80) at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/JNICalls.c:798
#16 0x00007fd6eb175e2c in JNI_callStaticObjectMethodLocked (clazz=clazz@entry=0x192c5c8, methodID=methodID@entry=0x19ceac8) at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/JNICalls.c:788
#17 0x00007fd6eb17bc58 in String_initialize_codec () at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/type/String.c:436
#18 String_initialize () at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/type/String.c:382
#19 0x00007fd6eb17d667 in Type_initialize () at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/type/Type.c:1015
#20 0x00007fd6eb1865ad in initPLJavaClasses () at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/Backend.c:1088
#21 initsequencer (is=<optimized out>, tolerant=1 '\001') at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/Backend.c:717
#22 0x0000000000b54aa3 in internal_load_library.lto_priv.0 () at utils/fmgr/dfmgr.c:281
#23 0x0000000000b570f3 in load_external_function () at utils/fmgr/dfmgr.c:105
#24 0x0000000000624759 in fmgr_c_validator (fcinfo=<optimized out>) at catalog/pg_proc.c:825
#25 0x0000000000b57900 in OidFunctionCall1Coll (functionId=<optimized out>, collation=0, arg1=59601) at utils/fmgr/fmgr.c:1433
#26 0x0000000000624281 in ProcedureCreate () at catalog/pg_proc.c:728
#27 0x00000000006c97dd in CreateFunction () at utils/init/miscinit.c:307
#28 0x0000000000c1db4b in ProcessUtilitySlow.constprop.0 (pstate=0x14f4b88, pstmt=0x15a29d0, queryString=0x15a19e8 "CREATE OR REPLACE FUNCTION sqlj.java_call_handler() RETURNS pg_catalog.language_handler AS e'libpljava-so-1.6-SNAPSHOT' LANGUAGE C", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, sentToRemote=0 '\000', completionTag=0x7ffcf1e28bd0 "",
    dest=<optimized out>) at tcop/utility.c:4726
#29 0x00000000009c3a2f in standard_ProcessUtility () at tcop/utility.c:3555
#30 0x00000000009c71f1 in ProcessUtility (completionTag=0x7ffcf1e28bd0 "", sentToRemote=0 '\000', dest=0x15a2c90, queryEnv=<optimized out>, params=<optimized out>, context=<optimized out>, queryString=<optimized out>, pstmt=0x15a29d0) at tcop/pquery.c:3046
#31 PortalRunUtility () at tcop/pquery.c:1999
#32 0x00000000009c7363 in PortalRunMulti (portal=portal@entry=0x159c788, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x15a2c90, altdest=altdest@entry=0x15a2c90, completionTag=completionTag@entry=0x7ffcf1e28bd0 "") at tcop/pquery.c:2191
#33 0x00000000009c7f03 in PortalRun (portal=0x159c788, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x15a2c90, altdest=0x15a2c90, completionTag=0x7ffcf1e28bd0 "") at tcop/pquery.c:1388
#34 0x00000000009b967b in exec_simple_query.lto_priv.0 () at tcop/postgres.c:1608
#35 0x00000000009ca302 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at tcop/postgres.c:5565
#36 0x0000000000907baf in BackendRun (port=0x14f4f60) at postmaster/postmaster.c:5032
#37 BackendStartup (port=0x14f4f60) at postmaster/postmaster.c:4704
#38 ServerLoop () at postmaster/postmaster.c:1961
#39 0x00000000009089c9 in PostmasterMain () at postmaster/postmaster.c:1569
#40 0x0000000000500816 in main (argc=5, argv=0x14c9930) at main/main.c:233

Attaching the new hs log as well hs_err_pid9320.log

yazun commented 1 year ago

And we see codec is still a problem: https://github.com/tada/pljava/blob/master/pljava-so/src/main/c/type/String.c#L382 specifically https://github.com/tada/pljava/blob/master/pljava-so/src/main/c/type/String.c#L293

yazun commented 1 year ago

I guess we might need a non-optimized version to see more:

Breakpoint 1, String_appendJavaString (buf=0x7ffef5f1dc40, javaString=0x16acb08) at /ZSSD/xz24/source/pljava16stable.git/pljava-so/src/main/c/type/String.c:287
287     void String_appendJavaString(StringInfoData* buf, jstring javaString)
(gdb) p buf
$2 = (StringInfoData *) 0x7ffef5f1dc40
(gdb) p *buf
$3 = {data = 0x1436c98 "", len = 0, maxlen = 1024, cursor = 0}
(gdb) p javaString
$4 = (struct _jobject *) 0x16acb08
(gdb) p *javaString
$5 = <incomplete type>
(gdb) l
282             }
283
284             return result;
285     }
286
287     void String_appendJavaString(StringInfoData* buf, jstring javaString)
288     {
289             if ( 0 == javaString )
290                     return;
291             if ( ! s_two_step_conversion )
(gdb)
292             {
293                     jobject charbuf = JNI_callStaticObjectMethodLocked(s_CharBuffer_class,
294                             s_CharBuffer_wrap, javaString);
295                     appendCharBuffer(buf, charbuf);
296                     JNI_deleteLocalRef(charbuf);
297             }
298             else
299             {
300                     char* dbEnc = String_createNTS(javaString);
301                     if ( 0 == dbEnc ) /* this can happen if a JNI call fails */
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
0x00007f47d487239e in jni_CallStaticObjectMethodV () from /usr/lib/jvm/temurin-17-jdk/lib/server/libjvm.so
jcflack commented 1 year ago

Gee, I thought I would never reproduce that.

The needed clues were there in your reports all along:

WARNING: A command line option has enabled the Security Manager ... java.nio.charset.UnsupportedCharsetException: X-PGSQL_ASCII

and in those hs_err logs:

Event: 0.139 loading class java/security/AccessControlException ... Event: 0.142 loading class java/nio/charset/UnsupportedCharsetException ... jvm_args: -Xmx512M --add-modules=jdk.unsupported -Djava.security.manager= ...

It appears that, in moving to Java 17, you must have added -Djava.security.manager= in pljava.vmoptions. But that is not quite right: the thing to add (and need to add it doesn't really arrive until Java 18) is -Djava.security.manager=allow (and the difference is, =allow means that PL/Java is allowed to enable the security manager, but the bare = tells Java to enable the security manager before PL/Java even loads). And that explains the warning about the command line option that enabled the security manager.

PL/Java can't start with the security manager enabled early like that. There are things in early setup that the security manager would forbid, and then PL/Java itself enables the security manager after those things have been done.

What happened here is that, during that early setup, the security manager prevented PL/Java from creating the X-PGSQL_ASCII charset provider. That explains the UnsupportedCharsetException, and the exception was raised during the codec initialization, and the crash happened while trying to use the incompletely-initialized codec to report the exception that occurred while initializing it.

It does look like there is a bug in my order of initialization that I could fix so as to avoid the crash in that situation, and get a proper exception message instead. (The message would still only say UnsupportedCharsetException: X-PGSQL_ASCII, so it still would be puzzling, but at least it would not be a crash.)

However, the cause of the situation was the -Djava.security.manager= in the pljava.vmoptions, and the situation can be avoided either by changing it to -Djava.security.manager=allow or (for Java 17) omitting that setting entirely.

Starting with Java 18, omitting the setting is no longer an option, and it has to be -Djava.security.manager=allow.

jcflack commented 1 year ago

Believed resolved in 1.6.5.