tada / pljava

PL/Java is a free add-on module that brings Java™ Stored Procedures, Triggers, Functions, Aggregates, Operators, Types, etc., to the PostgreSQL™ backend.
http://tada.github.io/pljava/

PL/Java crash on CentOS latest kernel #128

Closed akshunj closed 5 years ago

akshunj commented 7 years ago

hs_err_pid2820.zip

Hi,

After pulling the latest CentOS kernel through yum (2.6.32-696.3.2.el6.x86_64) pljava crashes on any attempt to use it. I am able to deploy various versions of pljava including the latest 1.6.0-snapshot without any problem. I tried rebuilding against the latest updates to postgres and java, but the issue persists. I am wondering if anyone else has observed this behavior? If I roll back to the previous kernel the issue goes away. I attached the output from the crash.

Thanks.

jcflack commented 7 years ago

Hi,

That's the first I've heard of it. I take it this is an oldish CentOS release, to be using the 2.6.32 kernel?

Usually when Java crashes, somewhere in the crash message it will give you the name and path of a file, something like hs_err_pid<pid>.log, that has more complete information on what happened.

Would you be willing to attach that file (after first skimming through it to be sure nothing sensitive from your operation shows up in the data it includes)?

It sure sounds like something went sideways in preparing the -696.3.2 kernel changes, but the hs_err might give more info on exactly what.

Or, have you checked whether yum also has an update to your Java, that might work with whatever was changed in the kernel?

-Chap

akshunj commented 7 years ago

Hi Chap,

The hs_err is attached to the original post. I am using the latest JDK from Oracle. (not OpenJDK)

jcflack commented 7 years ago

My apologies. I replied to an email notification of the post, which didn't include the attachment.

-Chap

akshunj commented 7 years ago

Oh, my bad, I did not realize.

jcflack commented 7 years ago

It certainly seems as if the new kernel changed something, but at the moment I have no clear idea what. I might try changing a few things, just to see whether anything avoids triggering whatever happens in this new kernel. Probably none of these will make a difference.

How about starting PL/Java in a fresh session after doing an explicit

SET pljava.vmoptions TO '';

Looking at the VM options that are set, I get the impression they may have been set from some time ago and earlier Java or PL/Java versions. I would be interested to see if starting over fresh with empty pljava.vmoptions will make any difference.

I am also curious about all of the Java-related directories added both to PATH and to LD_LIBRARY_PATH. Are those there because something else in your environment needs them? PL/Java doesn't. I wonder what would happen in a new session with a plain vanilla PATH and no LD_LIBRARY_PATH. Of course that is more disruptive than just SET pljava.vmoptions in a new session; the usual way would be to stop postgres and restart it after setting a vanilla PATH and unsetting LD_LIBRARY_PATH.

Come to think of it, there might be a nondisruptive way. Start a new session, use something like PL/Perl to unset PATH and LD_LIBRARY_PATH in that backend's environment, then call a PL/Java function.

Again, not expecting either attempt to work a miracle, but might gather some information.

By the way, what is returned by

\sf sqlj.java_call_handler

?

-Chap

... the actual signal being raised:

si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007ffd145b3f80

has caught an attempt to access memory in an unmapped region just south of the stack:

7ffd144bd000-7ffd144c0000 ---p 00000000 00:00 0 
7ffd145bd000-7ffd145c0000 rw-p 00000000 00:00 0                          [stack]
7ffd145df000-7ffd145e0000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Vendors have recently been hardening kernels against possible attacks that involve accesses near the gap between the stack and other stuff, so the new kernel may well have rearranged some furniture in that area. The surprise is that Java would be doing something that the new arrangement would trip up. Have you checked for a newer build of the Oracle JDK?
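To make that reading of the map concrete, here is a minimal Java sketch that checks where the faulting address falls relative to the listed mappings. The class and method names (FaultLocator, isMapped, bytesBelowStack) are hypothetical helpers for illustration, not part of PL/Java or the JDK, and the [stack] line is written with its lower bound first, as the maps format requires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of PL/Java or the JDK.
final class FaultLocator {

    /** One line of a maps-style listing: the range [start, end) and its label. */
    static final class Mapping {
        final long start;
        final long end;
        final String label;

        Mapping(long start, long end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }

        // Addresses near the top of the address space exceed Long.MAX_VALUE,
        // so compare them as unsigned values.
        boolean contains(long addr) {
            return Long.compareUnsigned(addr, start) >= 0
                && Long.compareUnsigned(addr, end) < 0;
        }
    }

    /** Parses "start-end perms offset dev inode [label]". */
    static Mapping parse(String line) {
        String[] fields = line.trim().split("\\s+");
        String[] range = fields[0].split("-");
        long start = Long.parseUnsignedLong(range[0], 16);
        long end = Long.parseUnsignedLong(range[1], 16);
        String label = fields.length > 5 ? fields[5] : "";
        return new Mapping(start, end, label);
    }

    static List<Mapping> parseAll(String... lines) {
        List<Mapping> maps = new ArrayList<>();
        for (String line : lines) {
            maps.add(parse(line));
        }
        return maps;
    }

    /** True if any listed mapping covers addr. */
    static boolean isMapped(List<Mapping> maps, long addr) {
        for (Mapping m : maps) {
            if (m.contains(addr)) {
                return true;
            }
        }
        return false;
    }

    /** Bytes from addr up to the start of [stack], or -1 if addr is not below it. */
    static long bytesBelowStack(List<Mapping> maps, long addr) {
        for (Mapping m : maps) {
            if (m.label.equals("[stack]") && Long.compareUnsigned(addr, m.start) < 0) {
                return m.start - addr;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Mapping> maps = parseAll(
            "7ffd144bd000-7ffd144c0000 ---p 00000000 00:00 0",
            "7ffd145bd000-7ffd145c0000 rw-p 00000000 00:00 0 [stack]",
            "7ffd145df000-7ffd145e0000 r-xp 00000000 00:00 0 [vdso]",
            "ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]");
        long siAddr = 0x00007ffd145b3f80L; // si_addr from the crash report
        System.out.println("mapped: " + isMapped(maps, siAddr));
        System.out.println("bytes below [stack]: " + bytesBelowStack(maps, siAddr));
    }
}
```

With those inputs the address is not covered by any listed mapping and sits 0x9080 (36992) bytes below the start of the [stack] mapping, which matches the "just south of the stack" description.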

jcflack commented 7 years ago

Found this: https://access.redhat.com/solutions/3091371

akshunj commented 7 years ago

Chap, thanks I'll have to check this out!

jcflack commented 7 years ago

I can't access the Red Hat solutions link, but other reports of the issue online suggest adding -Xss2M (or larger) to the VM options to make the per-thread stack at least 2 MB in size. What the new kernel does apparently is to increase the size of the "Stack Guard" region below the stack in a way that Java blunders into if the initial stack size isn't big enough.

According to the docs, this option is a hard stack size setting, not a minimum. Not only will the stack begin at that size, it also can't grow. So if whatever PL/Java is being used for might require more than 2 MB of stack, the option may need to be increased further to avoid stack overflow errors.

I assume this is an interim solution, and Oracle will eventually release a Java update that doesn't blunder into the stack guard, and then -Xss won't have to be explicitly set.

In your specific case, I would still be interested in the output from

\sf sqlj.java_call_handler

and in trying to simplify your pljava.vmoptions settings ... maybe starting with a simple

SET pljava.vmoptions TO '-Xss2M';

seeing if that works, then maybe adding back other tuning options as you need them, referring to the PL/Java VM options page for ideas. Turning on class data sharing is likely a win.
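One way to confirm that the option actually reached the JVM is to inspect the arguments the running JVM reports. The following is a minimal sketch, not a PL/Java API: XssCheck and stackSizeBytes are hypothetical names, and it assumes the JVM exposes its startup options through RuntimeMXBean.getInputArguments(), which a JVM embedded in a PostgreSQL backend may or may not populate the same way a command-line launch does.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration; not part of PL/Java.
final class XssCheck {

    /**
     * Returns the thread stack size in bytes requested by an -Xss option,
     * or -1 if no such option is present. Keeps the last -Xss seen
     * (an assumption about precedence). Throws NumberFormatException
     * if the value is malformed.
     */
    static long stackSizeBytes(List<String> vmArgs) {
        long result = -1;
        for (String arg : vmArgs) {
            if (!arg.startsWith("-Xss") || arg.length() == 4) {
                continue;
            }
            String value = arg.substring(4);
            char suffix = Character.toLowerCase(value.charAt(value.length() - 1));
            long multiplier = 1;
            if (suffix == 'k') {
                multiplier = 1024L;
            } else if (suffix == 'm') {
                multiplier = 1024L * 1024;
            } else if (suffix == 'g') {
                multiplier = 1024L * 1024 * 1024;
            }
            // Strip the unit suffix, if any, before parsing the number.
            String digits = multiplier == 1
                ? value
                : value.substring(0, value.length() - 1);
            result = Long.parseLong(digits) * multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> vmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long size = stackSizeBytes(vmArgs);
        System.out.println(size < 0 ? "-Xss not set" : "-Xss = " + size + " bytes");
    }
}
```

Run standalone with java -Xss2M XssCheck, this should report 2097152 bytes. The same check could be wrapped in a PL/Java function to see what the backend's JVM was started with, though that wiring is left out here.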

akshunj commented 7 years ago

Thanks Chap, the workaround in the RHEL article you linked does indeed fix the problem. I guess we'll have to wait for a permanent fix in the JDK.

jcflack commented 7 years ago

Thanks for the confirmation. If you were able to see the whole RHEL article, did it have any other suggestions, or just the -Xss2M option I saw in other posts I was able to read?

What does your

\sf sqlj.java_call_handler

say, by the way?

akshunj commented 7 years ago

Yes I'll paste the article in a follow up reply. I didn't try \sf but I can try later.

akshunj commented 7 years ago

I'm not sure how this cut and paste will look on the mailing list, so my apologies if it's rubbish:

JVM crashes after updating to kernel with patch for Stack Guard flaw.
SOLUTION UNVERIFIED - Updated Yesterday at 11:18 AM - English
https://access.redhat.com/solutions/3091371

Environment

Issue

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f0d190f6ec3, pid=17221, tid=0x00007f0d2be12740
#
# Problematic frame:
# j  java.lang.Object.<init>()V+0

Resolution

The current workaround is to increase the Thread stack size of the JVM using -Xss2m. This will require you to restart the JVM.

Research is being performed on a permanent solution.

akshunj commented 7 years ago

Hi Chap,

I ran \sf sqlj.java_call_handler and get the following output:

myPgDb01=# \sf sqlj.java_call_handler
CREATE OR REPLACE FUNCTION sqlj.java_call_handler()
 RETURNS language_handler
 LANGUAGE c
AS 'pljava', $function$java_call_handler$function$

jcflack commented 7 years ago

Hi,

I had a sneaking suspicion. It appears that, while you reported building/installing several different PL/Java versions, the configuration you've got inside PostgreSQL is somehow partially updated and doesn't reflect that. Starting with 1.5.0, the dynamic library has been named with a version, so the output would look something like this in the expected case:

CREATE OR REPLACE FUNCTION sqlj.java_call_handler()
 RETURNS language_handler
 LANGUAGE c
AS 'libpljava-so-1.5.1-BETA1', $function$java_call_handler$function$

Have you been using CREATE EXTENSION, or an older, pre-9.1 installation approach? What does

\dx pljava

say? For that matter, what does

SELECT * FROM pg_available_extension_versions WHERE name = 'pljava';

say?

akshunj commented 7 years ago

I think at the moment this particular DB is using an older version, installed à la the deploy.jar method. I used the CREATE EXTENSION method earlier:

MyPgDb01=# \dx pljava
List of installed extensions
 Name | Version | Schema | Description
------+---------+--------+-------------
(0 rows)

MyPgDb01=# SELECT * FROM pg_available_extension_versions WHERE name = 'pljava';
  name  |    version     | installed | superuser | relocatable | schema | requires | comment
--------+----------------+-----------+-----------+-------------+--------+----------+--------------------------------------------------------------
 pljava | 1.6.0-SNAPSHOT | f         | t         | f           | sqlj   |          | PL/Java procedural language (https://tada.github.io/pljava/)
(1 row)

jcflack commented 7 years ago

Got it. So, I'd suggest going to a nice extension installation of either 1.5.0 (if you like a stable, final release) or 1.5.1-BETA1 (if you like beta testing). You should be able to just build your choice of those, run the self-installer jar, and see it show up in pg_available_extension_versions, and then (making sure you're in a fresh session where no PL/Java code has run yet), run

CREATE EXTENSION pljava VERSION '1.5.0' FROM unpackaged;

(or '1.5.1-BETA1' if you prefer), and it should preserve all your existing PL/Java stuff and bring it all to a consistent, extension-packaged, released version. You should be able to see with \dx and \sf that it happened.

After that's done, it should be possible to look at pruning those Java-related entries in PATH and LD_LIBRARY_PATH that I suspect are vestiges of your old Deployer installation; current PL/Java works without them. (But maybe they are there for something else you're using.)

-Chap

akshunj commented 7 years ago

Spot on, used to need that library path to do the old make install back in the day. It's survived hundreds of vm template builds :)

jcflack commented 5 years ago

Java builds that do not require the -Xss hack have been out for many months now. Closing.