imagej / napari-imagej

Use ImageJ functionality from napari
https://napari-imagej.readthedocs.io
BSD 2-Clause "Simplified" License

Resolve segfaults in magicgui widget creation with JVM running. #7

Open gselzer opened 2 years ago

gselzer commented 2 years ago

A peculiar set of circumstances causes segmentation faults when running the plugin.

  1. The JVM must be running
  2. A Callable with a faulty signature must be passed to magicgui.

This is not specific to napari-imagej; see https://github.com/gselzer/napari-foobar/commit/993f3019298bb744e0c139afde8c5612dae9ad81 for a reproducible example. It will, however, affect napari-imagej: if anyone has napari-imagej running and tries to create a widget that is improperly written (likely due to incorrect param_options), napari will silently crash.
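As an illustration of what "incorrect param_options" means here, the sketch below checks option keys against a function's parameters before the function would be handed to magicgui. It is not magicgui's own validation; `unmatched_option_keys` and `example_widget` are hypothetical names, and the sketch assumes that param_options are keyed by parameter name.

```python
import inspect
from typing import Any, Callable, Dict, Optional, Set


def unmatched_option_keys(
    func: Callable, param_options: Dict[str, Dict[str, Any]]
) -> Set[str]:
    """Return the option keys that do not name any parameter of ``func``."""
    params = inspect.signature(func).parameters
    return set(param_options) - set(params)


def example_widget(img_layer: Optional[str] = None) -> None:
    """Stand-in for a plugin function that would be passed to magicgui."""


# 'max' is not a parameter of example_widget, so it is reported.
bad = unmatched_option_keys(example_widget, {"max": {"value": 10}})
# 'img_layer' matches a parameter, so nothing is reported.
ok = unmatched_option_keys(example_widget, {"img_layer": {"label": "Image"}})
print(bad, ok)
```

A plugin could run a check like this and raise its own error before widget creation, so a mistake is reported without ever reaching napari's error display.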

Under the hood, napari is calling VerboseTB to display the error arising from the call to magicgui. VerboseTB tries to access the locals at each frame in the error traceback, causing a segfault when looking at the _function_gui frame. The VerboseTB docs suggest it prints all variables in the stack; if this is actually what happens, then it is probably trying to access data managed by JPype. JPype seems to still be running by the time we reach the segfault.
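To make "access the locals at each frame" concrete, here is a minimal standard-library sketch of walking a traceback and reading each frame's local variable names. It is not VerboseTB's implementation; `collect_frame_locals` and `capture` are hypothetical helpers. A verbose formatter would typically go further and call repr() on each value, which is where objects owned by another runtime could come into play.

```python
import traceback
from typing import List, Tuple


def collect_frame_locals(exc: BaseException) -> List[Tuple[str, List[str]]]:
    """Return (function name, sorted local names) for each traceback frame."""
    result = []
    for frame, _lineno in traceback.walk_tb(exc.__traceback__):
        # f_locals exposes every local variable that is live in the frame.
        result.append((frame.f_code.co_name, sorted(frame.f_locals)))
    return result


def foo(s):
    raise ValueError("foo")


def capture():
    # Get an exception with a stack trace, then inspect its frames.
    try:
        foo(object())
    except ValueError as exc:
        return collect_frame_locals(exc)


frames = capture()
print(frames)
```

Printing only the names, as above, never touches the values; swapping in repr(value) for each local would be the step that exercises whatever those objects are.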

See https://napari.zulipchat.com/#narrow/stream/309872-plugins/topic/Passing.20parameter.20options.20to.20magicgui/near/273885825

gselzer commented 2 years ago

@hinerm suggested I make an MCVE. I determined that the issue lies with the interaction between JPype and magicgui, as jpype.startJVM() is enough to cause the error. I had less luck creating a standalone script MCVE that showed the error. I tried writing the following:

import jpype

import IPython.core.ultratb

jpype.startJVM()

System = jpype.JClass('java.lang.System')

vbtb = IPython.core.ultratb.VerboseTB()

def foo(s):
    raise ValueError('foo')

# Get an exception with a stack trace
try:
    foo(System)
except ValueError as exc:
    e = exc

print(vbtb.text(e.__class__, e, e.__traceback__))

gselzer commented 2 years ago

I have created a set of steps for reproducible failure:

git clone git@github.com:gselzer/napari-foobar.git
cd napari-foobar
git checkout npe-replication
conda env create -f environment.yml
conda activate napari-foobar
<path to conda installation>/envs/napari-foobar/bin/pip install -e .
napari

Once napari is running, selecting Plugins > napari-foobar: Example Magic Widget will cause the segfault. Expected behavior would be napari handling the error (which has nothing to do with jpype) and continuing to function.

P.S. the environment.yml file is pretty complicated, but you should only need napari, jpype1, and this module installed to reproduce.

@Thrameos do you have any ideas as to how I might go about debugging this?

gselzer commented 2 years ago

UPDATE: I took the JPype recommendation (thanks for the heads up, @elevans), and got the following:

script.py:

import napari
import numpy as np

viewer = napari.view_image(np.ones((20, 20)))

Running in gdb:

(napari-foobar) : gselzer@gselzer-OptiPlex-5090 ~/code/imagej/napari-imagej (retain-module-data*) [napari-foobar] »
gdb -ex 'handle SIGSEGV nostop noprint pass' python --args python -i script.py  #  0 {2022-03-08 13:16:02}
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
Signal        Stop  Print   Pass to program Description
SIGSEGV       No    No  Yes     Segmentation fault
(gdb) run
Starting program: /home/gselzer/miniconda3/envs/napari-foobar/bin/python -i script.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff4bb2700 (LWP 41403)]
[New Thread 0x7ffff43b1700 (LWP 41404)]
[New Thread 0x7fffe3bb0700 (LWP 41405)]
[New Thread 0x7fffdb3af700 (LWP 41406)]
[New Thread 0x7fffd2bae700 (LWP 41407)]
[New Thread 0x7fffca3ad700 (LWP 41408)]
[New Thread 0x7fffc1bac700 (LWP 41409)]
[New Thread 0x7fffb93ab700 (LWP 41410)]
[New Thread 0x7fffb8baa700 (LWP 41411)]
[New Thread 0x7fffa83a9700 (LWP 41412)]
[New Thread 0x7fff9fba8700 (LWP 41413)]
[New Thread 0x7fff973a7700 (LWP 41414)]
[New Thread 0x7fff8eba6700 (LWP 41415)]
[New Thread 0x7fff863a5700 (LWP 41416)]
[New Thread 0x7fff7dba4700 (LWP 41417)]
[Thread 0x7fff7dba4700 (LWP 41417) exited]
[Thread 0x7fff863a5700 (LWP 41416) exited]
[Thread 0x7fff8eba6700 (LWP 41415) exited]
[Thread 0x7fff973a7700 (LWP 41414) exited]
[Thread 0x7fff9fba8700 (LWP 41413) exited]
[Thread 0x7fffa83a9700 (LWP 41412) exited]
[Thread 0x7fffb8baa700 (LWP 41411) exited]
[Thread 0x7fffb93ab700 (LWP 41410) exited]
[Thread 0x7fffc1bac700 (LWP 41409) exited]
[Thread 0x7fffca3ad700 (LWP 41408) exited]
[Thread 0x7fffd2bae700 (LWP 41407) exited]
[Thread 0x7fffdb3af700 (LWP 41406) exited]
[Thread 0x7fffe3bb0700 (LWP 41405) exited]
[Thread 0x7ffff43b1700 (LWP 41404) exited]
[Thread 0x7ffff4bb2700 (LWP 41403) exited]
[Detaching after fork from child process 41418]
[Detaching after fork from child process 41419]
[New Thread 0x7fff7dba4700 (LWP 41420)]
[New Thread 0x7fff863a5700 (LWP 41421)]
[Detaching after fork from child process 41423]
[New Thread 0x7fff8eba6700 (LWP 41424)]
[New Thread 0x7fff973a7700 (LWP 41425)]
[New Thread 0x7fff6c800700 (LWP 41426)]
[New Thread 0x7fff66474700 (LWP 41427)]
[New Thread 0x7fff65c73700 (LWP 41428)]
[New Thread 0x7fff65472700 (LWP 41429)]
[New Thread 0x7fff64c71700 (LWP 41430)]
[New Thread 0x7fff5a849700 (LWP 41431)]
[New Thread 0x7fff5a048700 (LWP 41432)]
[New Thread 0x7fff59847700 (LWP 41433)]
[New Thread 0x7fff59046700 (LWP 41434)]
[New Thread 0x7fff58845700 (LWP 41435)]
[New Thread 0x7fff53fff700 (LWP 41436)]
[New Thread 0x7fff537fe700 (LWP 41437)]
[New Thread 0x7fff52ffd700 (LWP 41438)]
[New Thread 0x7fff527fc700 (LWP 41439)]
[New Thread 0x7fff51ffb700 (LWP 41440)]
[New Thread 0x7fff43f05700 (LWP 41441)]
[New Thread 0x7fff436b4700 (LWP 41442)]
>>> [New Thread 0x7fff3bfff700 (LWP 41443)]
[New Thread 0x7fff42b59700 (LWP 41444)]
[New Thread 0x7fff428d7700 (LWP 41445)]
[New Thread 0x7fff427d5700 (LWP 41446)]
[New Thread 0x7fff426d3700 (LWP 41447)]
[New Thread 0x7fff425d1700 (LWP 41448)]
[New Thread 0x7fff424cf700 (LWP 41449)]
[New Thread 0x7fff423cd700 (LWP 41450)]
[New Thread 0x7fff381fe700 (LWP 41451)]
[New Thread 0x7fff14e5a700 (LWP 41452)]
[New Thread 0x7fff14d59700 (LWP 41453)]
[New Thread 0x7fff14c58700 (LWP 41454)]
[New Thread 0x7fff14b57700 (LWP 41455)]
[New Thread 0x7fff14a56700 (LWP 41456)]
[New Thread 0x7fff14955700 (LWP 41457)]
[New Thread 0x7fff14854700 (LWP 41458)]
[New Thread 0x7fff14752700 (LWP 41459)]
[New Thread 0x7fff14651700 (LWP 41460)]
[New Thread 0x7fff14550700 (LWP 41461)]
[New Thread 0x7fff1444e700 (LWP 41462)]
[New Thread 0x7fff1434c700 (LWP 41463)]
[New Thread 0x7fff1424a700 (LWP 41464)]
[New Thread 0x7fff14148700 (LWP 41465)]
[New Thread 0x7ffeecd13700 (LWP 41466)]
[New Thread 0x7ffeecc11700 (LWP 41467)]
[New Thread 0x7ffeecb0f700 (LWP 41468)]
[New Thread 0x7ffeeca0d700 (LWP 41469)]
[New Thread 0x7ffeec90b700 (LWP 41470)]
[New Thread 0x7ffeec809700 (LWP 41471)]
[New Thread 0x7ffeec707700 (LWP 41472)]
[New Thread 0x7ffeec605700 (LWP 41473)]
[New Thread 0x7ffeec504700 (LWP 41474)]
WARNING: Traceback (most recent call last):
  File "/home/gselzer/miniconda3/envs/napari-foobar/lib/python3.8/site-packages/napari/_qt/menus/plugins_menu.py", line 97, in _add_toggle_widget
    dock_widget, _w = self._win.add_plugin_dock_widget(*key)
  File "/home/gselzer/miniconda3/envs/napari-foobar/lib/python3.8/site-packages/napari/_qt/qt_main_window.py", line 685, in add_plugin_dock_widget
    wdg = _instantiate_dock_widget(Widget, self._qt_viewer.viewer)
  File "/home/gselzer/miniconda3/envs/napari-foobar/lib/python3.8/site-packages/napari/_qt/qt_main_window.py", line 1248, in _instantiate_dock_widget
    return wdg_cls(**kwargs)
  File "/home/gselzer/miniconda3/envs/napari-foobar/lib/python3.8/site-packages/magicgui/_magicgui.py", line 204, in __call__
    widget = self.func(param_options=prm_options, **{**factory_kwargs, **kwargs})
  File "/home/gselzer/miniconda3/envs/napari-foobar/lib/python3.8/site-packages/magicgui/widgets/_function_gui.py", line 148, in __init__
    sig = magic_signature(function, gui_options=param_options)
  File "/home/gselzer/miniconda3/envs/napari-foobar/lib/python3.8/site-packages/magicgui/signature.py", line 291, in magic_signature
    raise ValueError(
ValueError: Received parameter option key(s) {'max'} that do not match parameters in the provided function: (img_layer: Union[str, NoneType] = None)
--Type <RET> for more, q to quit, c to continue without paging--c

Thread 1 "python" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) [Thread 0x7ffeec504700 (LWP 41474) exited]
[Thread 0x7ffeec605700 (LWP 41473) exited]
[Thread 0x7ffeec707700 (LWP 41472) exited]
[Thread 0x7ffeec809700 (LWP 41471) exited]
[Thread 0x7ffeec90b700 (LWP 41470) exited]
[Thread 0x7ffeeca0d700 (LWP 41469) exited]
[Thread 0x7ffeecb0f700 (LWP 41468) exited]
[Thread 0x7ffeecc11700 (LWP 41467) exited]
[Thread 0x7ffeecd13700 (LWP 41466) exited]
[Thread 0x7fff14148700 (LWP 41465) exited]
[Thread 0x7fff1424a700 (LWP 41464) exited]
[Thread 0x7fff1434c700 (LWP 41463) exited]
[Thread 0x7fff1444e700 (LWP 41462) exited]
[Thread 0x7fff14550700 (LWP 41461) exited]
[Thread 0x7fff14651700 (LWP 41460) exited]
[Thread 0x7fff14752700 (LWP 41459) exited]
[Thread 0x7fff14854700 (LWP 41458) exited]
[Thread 0x7fff14955700 (LWP 41457) exited]
[Thread 0x7fff14a56700 (LWP 41456) exited]
[Thread 0x7fff14b57700 (LWP 41455) exited]
[Thread 0x7fff14c58700 (LWP 41454) exited]
[Thread 0x7fff14d59700 (LWP 41453) exited]
[Thread 0x7fff14e5a700 (LWP 41452) exited]
[Thread 0x7fff381fe700 (LWP 41451) exited]
[Thread 0x7fff423cd700 (LWP 41450) exited]
[Thread 0x7fff424cf700 (LWP 41449) exited]
[Thread 0x7fff425d1700 (LWP 41448) exited]
[Thread 0x7fff426d3700 (LWP 41447) exited]
[Thread 0x7fff427d5700 (LWP 41446) exited]
[Thread 0x7fff428d7700 (LWP 41445) exited]
[Thread 0x7fff42b59700 (LWP 41444) exited]
[Thread 0x7fff3bfff700 (LWP 41443) exited]
[Thread 0x7fff436b4700 (LWP 41442) exited]
[Thread 0x7fff43f05700 (LWP 41441) exited]
[Thread 0x7fff51ffb700 (LWP 41440) exited]
[Thread 0x7fff527fc700 (LWP 41439) exited]
[Thread 0x7fff52ffd700 (LWP 41438) exited]
[Thread 0x7fff537fe700 (LWP 41437) exited]
[Thread 0x7fff53fff700 (LWP 41436) exited]
[Thread 0x7fff58845700 (LWP 41435) exited]
[Thread 0x7fff59046700 (LWP 41434) exited]
[Thread 0x7fff59847700 (LWP 41433) exited]
[Thread 0x7fff5a048700 (LWP 41432) exited]
[Thread 0x7fff5a849700 (LWP 41431) exited]
[Thread 0x7fff64c71700 (LWP 41430) exited]
[Thread 0x7fff65472700 (LWP 41429) exited]
[Thread 0x7fff65c73700 (LWP 41428) exited]
[Thread 0x7fff66474700 (LWP 41427) exited]
[Thread 0x7fff6c800700 (LWP 41426) exited]
[Thread 0x7fff973a7700 (LWP 41425) exited]
[Thread 0x7fff8eba6700 (LWP 41424) exited]
[Thread 0x7fff863a5700 (LWP 41421) exited]
[Thread 0x7fff7dba4700 (LWP 41420) exited]

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.

kephale commented 1 year ago

This environment is Linux-specific and doesn't work on Mac without a bunch of tweaking that leads to version breakage. Testing on a Linux box next...

gselzer commented 1 year ago

@Czaki and @kephale suggested on the napari zulip that we could set the environment variable NAPARI_CATCH_ERRORS=0 to prevent napari from trying to access the JPype stats. This is cool, and we could hardcode it into the conda environment if we wanted, but it would be nice to be able to toggle it, especially via the napari-imagej configuration menu.

But @Czaki says that napari reads this variable on startup, so we'd have to evaluate any configuration before napari starts up. This could pose a problem; more investigation needed.
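One possible shape for "evaluate any configuration before napari starts up" is a small launcher that exports the variable and only then imports napari. This is a sketch under assumptions: the `disable_napari_error_catching` setting and both function names are invented for illustration, and it only helps when napari is started through such a launcher rather than directly.

```python
import os
from typing import Mapping, MutableMapping


def apply_startup_env(
    settings: Mapping[str, bool], environ: MutableMapping[str, str]
) -> None:
    """Export NAPARI_CATCH_ERRORS=0 when the (hypothetical) setting asks for it."""
    # "disable_napari_error_catching" is an invented key, not a real option.
    if settings.get("disable_napari_error_catching", False):
        environ["NAPARI_CATCH_ERRORS"] = "0"


def main() -> None:
    """Launcher entry point: prepare the environment, then start napari."""
    settings = {"disable_napari_error_catching": True}  # would be read from saved config
    apply_startup_env(settings, os.environ)
    import napari  # imported only after the environment has been prepared

    napari.Viewer()
    napari.run()


# Demonstration against a plain dict, so nothing is launched here.
demo_env: dict = {}
apply_startup_env({"disable_napari_error_catching": True}, demo_env)
print(demo_env)
```

Because the variable would be fixed at launch, a toggle in a configuration menu would only take effect on the next start.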