jpype-project / jpype

JPype is a cross-language bridge to allow Python programs full access to Java class libraries.
http://www.jpype.org
Apache License 2.0

SIGSEGV on termination #842

Open ShaheedHaque opened 4 years ago

ShaheedHaque commented 4 years ago

As per #720, to get 1.0.2 working for us, I delayed JPype initialisation until it is actually needed. On process exit, it seems that if the initialisation code is never actually called, the process exits with a SIGSEGV:

Fatal Python error: Segmentation fault

Thread 0x00007f84e4194740 (most recent call first):
  File "/usr/local/lib/python3.8/dist-packages/jpype/_core.py", line 321 in _JTerminate

And here is line 321:

# In order to shutdown cleanly we need the reference queue stopped
# otherwise, we can experience a crash if a Java thread is waiting
# for the GIL.
def _JTerminate():
    try:
        _jpype.shutdown()  # <<< line 321
    except RuntimeError:
        pass

The only reference to jpype that this program could have made is an import as part of its transitive fanout:

import jpype

and it won't have invoked startJVM().
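For readers unfamiliar with the delayed-initialisation pattern described above, here is a minimal sketch of one way to do it. The `LazyStarter` class and its method names are hypothetical and are not part of JPype; the sketch takes the "is it started?" check and the start call as parameters so that it does not assume anything about how the reporter's code is actually structured.

```python
import threading


class LazyStarter:
    """Run a start-up callable at most once, on first use (hypothetical helper)."""

    def __init__(self, is_started, start):
        self._is_started = is_started  # callable returning True once the runtime is up
        self._start = start            # callable that brings the runtime up
        self._lock = threading.Lock()

    def ensure_started(self):
        """Start the runtime if needed; return True only if this call started it."""
        if self._is_started():
            return False
        with self._lock:
            # Re-check under the lock in case another thread started it first.
            if self._is_started():
                return False
            self._start()
            return True
```

With JPype this might be wired up as `jvm = LazyStarter(jpype.isJVMStarted, jpype.startJVM)`, with `jvm.ensure_started()` called just before the first use of any Java class. A process that imports the module but never calls `ensure_started()` is then in the state described in this report: `jpype` imported, `startJVM()` never invoked.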

Thrameos commented 4 years ago

Thanks for the report. I will look into this as soon as my IT folks get my development machine repaired. What is the value of “isJVMStarted()”? Does adding an if statement for it (in Python or in pyjp_module) fix the issue?

ShaheedHaque commented 4 years ago

First, I have noticed that the SIGSEGV is not easy to reproduce, happening as rarely as perhaps 1 time in 200.

Second, I've added a print() of “isJVMStarted()”, and not seen the failure after a handful of runs.

Will report back after gathering more data.

ShaheedHaque commented 4 years ago

OK, it failed just now even with the “isJVMStarted()” in place on my Ubuntu setup:

Fatal Python error: Segmentation fault

Thread 0x00007fbca3605740 (most recent call first):
  File "/usr/local/lib/python3.8/dist-packages/jpype/_core.py", line 322 in _JTerminate  <<< line number changed because of inserted isJVMStarted().

Again, it must have run without issue several hundred times before this point.

Interestingly, my macOS-based colleague is regularly seeing what we think is the same issue, and he was able to extract a crash log: hs_err_pid53983.log. This repro is from the cycling of Celery workers, with the SIGSEGV at process exit (or at least we assume so, since it has no discernible effect on the operation of the system). Note: he does not have the inserted isJVMStarted().

ShaheedHaque commented 4 years ago

And here is a curious thing... I just ran my usual test script, and it seemed to exit twice, like this:

...
========== 3 failed, 362 passed, 1261 warnings in 6067.84s (1:41:07) ===========
isJVMStarted=================== True
isJVMStarted=================== False

So, _JTerminate() was called twice: once it thought the JVM had started, and once it did not.

Thrameos commented 4 years ago

Any chance you can get this to replicate on a reduced version of the code? It still seems like something in the JVM is crashing, so either we are creating the JVM twice as a result of a fork, or there is a terminate and restart. The guard code is supposed to prevent starting twice, but perhaps if you can replicate a miss and fix it we can finally resolve this.
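One way to test the fork hypothesis is to tag each exit-time message with the process ID, so that two "exits" can be attributed either to one process or to a parent and a forked child. The sketch below is a diagnostic only, not a fix; `describe_exit` and `install_exit_report` are hypothetical names, and the only assumption about JPype is that `jpype.isJVMStarted` is passed in as the `is_started` callable.

```python
import atexit
import os


def describe_exit(owner_pid, current_pid, jvm_started):
    """Build a one-line exit report that shows which process is exiting."""
    role = "owner" if current_pid == owner_pid else "forked child"
    return f"pid={current_pid} ({role}) isJVMStarted={jvm_started}"


def install_exit_report(is_started, emit=print):
    """Register an atexit hook that reports the PID and JVM state at exit."""
    owner_pid = os.getpid()  # captured once, in the process that registers the hook

    def _report():
        emit(describe_exit(owner_pid, os.getpid(), is_started()))

    atexit.register(_report)
    return _report
```

Calling `install_exit_report(jpype.isJVMStarted)` once at start-up would then print one line per exiting process. If the two lines seen earlier carry different PIDs, the second exit comes from a child that inherited the hook through a fork; if they carry the same PID, the handler really did run twice in one process.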

ShaheedHaque commented 4 years ago

I tried before, but there is a lot of stuff, and when I trimmed too far it stopped failing. Now that we have a slightly different problem, I'll try once again. I'll report back with any results.

mariusvniekerk commented 4 years ago

What are the ramifications of doing the following?

def _JTerminate():
    try:
        if _jpype.isStarted():
            _jpype.shutdown()
    except RuntimeError:
        pass

Thrameos commented 4 years ago

It should make sure that Java closes properly when Python exits. Before we did not guarantee that Java files were properly closed nor that Java threads had terminated. If you call shutdown manually then isStarted will be false and it will just operate as normal. If Python exits without closing Java we perform the Java shutdown first. If there are non-daemon threads then it will wait for them to terminate.

ShaheedHaque commented 4 years ago

What are the ramifications of doing the following?

def _JTerminate():
    try:
        if _jpype.isStarted():
            _jpype.shutdown()
    except RuntimeError:
        pass

Based on my experiment adding calls to isJVMStarted(), it is not clear to me that this would make any difference, because the SEGV can occur even when the test returns False, as in this example:

=========================== short test summary info ============================
FAILED test/test_suite74gb_franecki.py::TestPeoplesPension::test_100_complete_use_cases[SubmitEnrolmentsAndContributions_]
FAILED test/test_suite90_live.py::TestLiveA::test_400_check_log_files____ - A...
========== 2 failed, 363 passed, 1242 warnings in 6168.39s (1:42:48) ===========
Fatal Python error: Segmentation fault

Thread 0x00007f6260ee6740 (most recent call first):
  File "/usr/local/lib/python3.8/dist-packages/jpype/_core.py", line 322 in _JTerminate
isJVMStarted=================== False

Thrameos commented 3 years ago

Can you look over #937 to see if an option fixes this issue?

nayana-prashanth commented 3 years ago

Hi. I am using JPype 1.2.0 and have been seeing this issue for a while. Both the Jenkins build and local runs fail intermittently with the failure below. Any suggestion to resolve this issue is appreciated:


2021-05-06 16:32:36.041  
2021-05-06 16:32:36.041  Thread 0x00007fee56e3a100 (most recent call first):
2021-05-06 16:32:36.041    File "/opt/app-root/lib64/python3.8/site-packages/jpype/_core.py", line 340 in _JTerminate
2021-05-06 16:32:36.041  #
2021-05-06 16:32:36.041  # A fatal error has been detected by the Java Runtime Environment:
2021-05-06 16:32:36.042  #
2021-05-06 16:32:36.042  #  SIGSEGV (0xb) at pc=0x00007fee55d6a9bf (sent by kill), pid=270, tid=427
2021-05-06 16:32:36.042  #
2021-05-06 16:32:36.042  # JRE version: OpenJDK Runtime Environment 18.9 (11.0.9.1+1) (build 11.0.9.1+1-LTS)
2021-05-06 16:32:36.042  # Java VM: OpenJDK 64-Bit Server VM 18.9 (11.0.9.1+1-LTS, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
2021-05-06 16:32:36.042  # Problematic frame:
2021-05-06 16:32:36.042  # C  [libpthread.so.0+0x129bf]  raise+0x10f
2021-05-06 16:32:36.042  #
2021-05-06 16:32:36.042  # Core dump will be written. Default location: /home/jenkins/workspace/ne-learning_concord-mono_IS-2243/e2e/tests/step_defs/core.270
2021-05-06 16:32:36.042  #
2021-05-06 16:32:36.042  # An error report file with more information is saved as:
2021-05-06 16:32:36.042  # /home/jenkins/workspace/ne-learning_concord-mono_IS-2243/e2e/tests/step_defs/hs_err_pid270.log
2021-05-06 16:32:36.042  #
2021-05-06 16:32:36.042  # If you would like to submit a bug report, please visit:
2021-05-06 16:32:36.042  #   https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%208&component=java-11-openjdk
2021-05-06 16:32:36.042  #
2021-05-06 16:32:36.042  Fatal Python error: Aborted
2021-05-06 16:32:36.042  
2021-05-06 16:32:36.042  Thread 0x00007fee56e3a100 (most recent call first):
2021-05-06 16:32:36.042    File "/opt/app-root/lib64/python3.8/site-packages/jpype/_core.py", line 340 in _JTerminate
2021-05-06 16:32:36.042  /home/jenkins/workspace/ne-learning_concord-mono_IS-2243@tmp/durable-0824d76d/script.sh: line 54:   270 Aborted

noamaviv commented 3 years ago

Could it be related to what you wrote here in your documentation?

https://jpype.readthedocs.io/en/latest/userguide.html#errors-reported-by-python-fault-handler

noamaviv commented 3 years ago

It seems like using the -p no:faulthandler switch on pytest might help avoid these errors.
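For runs that do not go through pytest, or where the flag cannot be passed, a similar effect might be obtained from Python itself by switching off the standard-library fault handler before the JVM is used. The sketch below assumes that the reports in question come from Python's fault handler reacting to signals the JVM handles itself, which this thread has not confirmed; `disable_fault_handler` is a hypothetical helper name.

```python
import faulthandler


def disable_fault_handler():
    """Turn off Python's fault handler; return True if it had been enabled."""
    was_enabled = faulthandler.is_enabled()
    if was_enabled:
        faulthandler.disable()
    return was_enabled
```

This only suppresses the fault handler's traceback output. It would not remove a genuine crash inside the JVM, so an hs_err_pid log that still appears afterwards points to a different problem.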