neuronsimulator / nrn

NEURON Simulator
http://nrn.readthedocs.io

master built on mac segfaults on "from neuron import h" in anaconda python but not in nrniv -python #2358

Closed ramcdougal closed 1 month ago

ramcdougal commented 1 year ago

Context

Overview of the issue

I removed prior installs of NEURON, then:

git clone git@github.com:neuronsimulator/nrn
cd nrn
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/Users/ramcdougal
make -j
make install -j

The ~/lib/python folder is on my PYTHONPATH and ~/bin is on my PATH.

Attempting to run from Python fails with a segfault:

(base) ramcdougal@Roberts-MacBook-Pro ~ % cd empty_folder 
(base) ramcdougal@Roberts-MacBook-Pro empty_folder % python
Python 3.10.9 (main, Mar  1 2023, 12:20:14) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from neuron import h
zsh: segmentation fault  python

but running nrniv -python seems to work:

(base) ramcdougal@Roberts-MacBook-Pro empty_folder % nrniv -python
NEURON -- VERSION 9.0.dev-1350-g78c3f26c1 master (78c3f26c1) 2023-05-10
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

>>> from neuron import h
>>> import neuron
>>> neuron.__file__
'/Users/ramcdougal/lib/python/neuron/__init__.py'

Note: the crash occurs in the line PyLockGIL lock; at the beginning of nrnpy_hoc() in nrnpy_hoc.cpp. See the lldb trace below.

Expected result/behavior

Importing neuron from Python should work. (To be clear, the segfault occurs even on an import neuron.)

NEURON setup

Minimal working example - MWE

lldb session

>>> import neuron
Process 46028 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
    frame #0: 0x0000000103dddb98 libpython3.10.dylib`take_gil + 84
libpython3.10.dylib`take_gil:
->  0x103dddb98 <+84>: ldr    x25, [x20, #0x10]
    0x103dddb9c <+88>: add    x21, x25, #0x1b0
    0x103dddba0 <+92>: mov    x0, x21
    0x103dddba4 <+96>: bl     0x103ee6934               ; symbol stub for: pthread_mutex_lock
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  * frame #0: 0x0000000103dddb98 libpython3.10.dylib`take_gil + 84
    frame #1: 0x0000000103e56b80 libpython3.10.dylib`PyGILState_Ensure + 132
    frame #2: 0x0000000103813774 libnrniv.dylib`nrnpy_hoc() [inlined] PyLockGIL::PyLockGIL(this=<unavailable>) at nrnpy_utils.h:108:18 [opt]
    frame #3: 0x0000000103813770 libnrniv.dylib`nrnpy_hoc() [inlined] PyLockGIL::PyLockGIL(this=<unavailable>) at nrnpy_utils.h:108:39 [opt]
    frame #4: 0x0000000103813770 libnrniv.dylib`nrnpy_hoc() at nrnpy_hoc.cpp:3116:15 [opt]
    frame #5: 0x0000000101eab808 hoc.cpython-310-darwin.so`PyInit_hoc + 2124
    frame #6: 0x00000001001d0620 python`_imp_create_dynamic + 772
    frame #7: 0x00000001000c644c python`cfunction_vectorcall_FASTCALL + 88
    frame #8: 0x000000010018067c python`_PyEval_EvalFrameDefault + 47632
    frame #9: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #10: 0x000000010017e96c python`_PyEval_EvalFrameDefault + 40192
    frame #11: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #12: 0x000000010017d1c4 python`_PyEval_EvalFrameDefault + 34136
    frame #13: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #14: 0x000000010019da48 python`call_function + 148
    frame #15: 0x000000010017695c python`_PyEval_EvalFrameDefault + 7408
    frame #16: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #17: 0x000000010019da48 python`call_function + 148
    frame #18: 0x000000010017695c python`_PyEval_EvalFrameDefault + 7408
    frame #19: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #20: 0x000000010019da48 python`call_function + 148
    frame #21: 0x000000010017695c python`_PyEval_EvalFrameDefault + 7408
    frame #22: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #23: 0x0000000100065fa8 python`_PyObject_VectorcallTstate.728 + 92
    frame #24: 0x000000010006aa48 python`object_vacall + 272
    frame #25: 0x000000010006abac python`_PyObject_CallMethodIdObjArgs + 128
    frame #26: 0x00000001001ccbac python`PyImport_ImportModuleLevelObject + 3492
    frame #27: 0x000000010016c418 python`builtin___import__ + 124
    frame #28: 0x00000001000c5648 python`cfunction_call + 60
    frame #29: 0x00000001001811b0 python`_PyEval_EvalFrameDefault + 50500
    frame #30: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #31: 0x000000010019da48 python`call_function + 148
    frame #32: 0x000000010017695c python`_PyEval_EvalFrameDefault + 7408
    frame #33: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #34: 0x0000000100065fa8 python`_PyObject_VectorcallTstate.728 + 92
    frame #35: 0x000000010006aa48 python`object_vacall + 272
    frame #36: 0x000000010006abac python`_PyObject_CallMethodIdObjArgs + 128
    frame #37: 0x00000001001cc224 python`PyImport_ImportModuleLevelObject + 1052
    frame #38: 0x000000010017b0e4 python`_PyEval_EvalFrameDefault + 25720
    frame #39: 0x0000000100172f60 python`_PyEval_Vector + 532
    frame #40: 0x000000010016da14 python`builtin_exec + 308
    frame #41: 0x00000001000c644c python`cfunction_vectorcall_FASTCALL + 88
    frame #42: 0x000000010018067c python`_PyEval_EvalFrameDefault + 47632
    frame #43: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #44: 0x000000010017e96c python`_PyEval_EvalFrameDefault + 40192
    frame #45: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #46: 0x000000010017d1c4 python`_PyEval_EvalFrameDefault + 34136
    frame #47: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #48: 0x000000010019da48 python`call_function + 148
    frame #49: 0x000000010017695c python`_PyEval_EvalFrameDefault + 7408
    frame #50: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #51: 0x000000010019da48 python`call_function + 148
    frame #52: 0x000000010017695c python`_PyEval_EvalFrameDefault + 7408
    frame #53: 0x0000000100067cf4 python`_PyFunction_Vectorcall + 548
    frame #54: 0x0000000100065fa8 python`_PyObject_VectorcallTstate.728 + 92
    frame #55: 0x000000010006aa48 python`object_vacall + 272
    frame #56: 0x000000010006abac python`_PyObject_CallMethodIdObjArgs + 128
    frame #57: 0x00000001001ccbac python`PyImport_ImportModuleLevelObject + 3492
    frame #58: 0x000000010017b0e4 python`_PyEval_EvalFrameDefault + 25720
    frame #59: 0x0000000100172f60 python`_PyEval_Vector + 532
    frame #60: 0x00000001001ec27c python`run_mod + 220
    frame #61: 0x00000001001ec63c python`PyRun_InteractiveOneObjectEx + 568
    frame #62: 0x00000001001eb7f0 python`_PyRun_InteractiveLoopObject + 132
    frame #63: 0x00000001001eb344 python`_PyRun_AnyFileObject + 76
    frame #64: 0x00000001001eda44 python`PyRun_AnyFileExFlags + 68
    frame #65: 0x000000010020f44c python`pymain_run_stdin + 156
    frame #66: 0x000000010020eaa8 python`pymain_run_python + 580
    frame #67: 0x000000010020e80c python`Py_RunMain + 40
    frame #68: 0x0000000100007b58 python`main + 56
    frame #69: 0x00000001a0937e50 dyld`start + 2544

cmake session

For completeness, here's the cmake session:

-- The C compiler identification is AppleClang 14.0.3.14030022
-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'RelWithDebInfo' as none was specified.
-- 3rd party project: using Random123 from "external/Random123"
-- No python executable specified. Looking for `python3` in the PATH...
-- Checking if /Users/ramcdougal/anaconda3/bin/python3 is a working python
-- Found BISON: /usr/bin/bison (found version "2.3") 
-- Found FLEX: /usr/bin/flex (found suitable version "2.6.4", minimum required is "2.6") 
-- Found Readline: /Library/Developer/CommandLineTools/SDKs/MacOSX13.3.sdk/usr/include  
-- Found Cython: /Users/ramcdougal/anaconda3/bin/cython (found version "0.29.33") 
-- Found MPI_C: /opt/homebrew/Cellar/open-mpi/4.1.5/lib/libmpi.dylib (found version "3.1") 
-- Found MPI_CXX: /opt/homebrew/Cellar/open-mpi/4.1.5/lib/libmpi.dylib (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- Detected OpenMPI 4.1.5
-- Sub-project : using iv from from /Users/ramcdougal/nrn/external/iv
-- Found X11: /usr/X11R6/include   
-- Checking for include files
-- Checking for functions
-- Checking for include directories
-- Checking for types
-- 
-- Configured INTERVIEWS 0.1
-- 
-- Some things you can do now:
-- --------------+--------------------------------------------------------------
-- Command       |   Description
-- --------------+--------------------------------------------------------------
-- make install  | Will install INTERVIEWS to: /Users/ramcdougal
--               | Change the install location of NEURON using:
--               |     cmake <src_path> -DCMAKE_INSTALL_PREFIX=<install_path>
-- --------------+--------------------------------------------------------------
-- Build option  | Status
-- --------------+--------------------------------------------------------------
-- BUILD_TYPE    | RelWithDebInfo (allowed: Custom;Debug;Release;RelWithDebInfo)
-- SHARED        | OFF
-- X11_DYNAMIC   | OFF
-- --------------+--------------------------------------------------------------
--  See more : https://github.com/neuronsimulator/iv
-- --------------+--------------------------------------------------------------
-- 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
Extracting link flags from target 'Threads::Threads', beware that this can be fragile. Got: 
Generating link flags from path /Users/ramcdougal/anaconda3/lib/libpython3.10.dylib Got: /Users/ramcdougal/anaconda3/lib/libpython3.10.dylib -Wl,-rpath,/Users/ramcdougal/anaconda3/lib
Generating link flags from path /opt/homebrew/Cellar/open-mpi/4.1.5/lib/libmpi.dylib Got: /opt/homebrew/Cellar/open-mpi/4.1.5/lib/libmpi.dylib -Wl,-rpath,/opt/homebrew/Cellar/open-mpi/4.1.5/lib
Generating link flags from path /usr/X11R6/lib/libSM.dylib Got: /usr/X11R6/lib/libSM.dylib -Wl,-rpath,/usr/X11R6/lib
Generating link flags from path /usr/X11R6/lib/libICE.dylib Got: /usr/X11R6/lib/libICE.dylib -Wl,-rpath,/usr/X11R6/lib
Generating link flags from path /usr/X11R6/lib/libX11.dylib Got: /usr/X11R6/lib/libX11.dylib -Wl,-rpath,/usr/X11R6/lib
Generating link flags from path /usr/X11R6/lib/libXext.dylib Got: /usr/X11R6/lib/libXext.dylib -Wl,-rpath,/usr/X11R6/lib
-- 
-- Configured NEURON 9.0.0
-- 
-- You can now build NEURON using:
--   cmake --build . --parallel 8 [--target TARGET]
-- You might want to adjust the number of parallel build jobs for your system.
-- Some non-default targets you might want to build:
-- --------------+--------------------------------------------------------------
--  Target       |   Description
-- --------------+--------------------------------------------------------------
-- install       | Will install NEURON to: /Users/ramcdougal
--               | Change the install location of NEURON using:
--               |   cmake <src_path> -DCMAKE_INSTALL_PREFIX=<install_path>
-- docs          | Build full docs. Calls targets: doxygen, notebooks, sphinx, notebooks-clean
-- uninstall     | Removes files installed by make install (todo)
-- --------------+--------------------------------------------------------------
--  Build option | Status
-- --------------+--------------------------------------------------------------
-- C COMPILER    | /Library/Developer/CommandLineTools/usr/bin/cc
-- CXX COMPILER  | /Library/Developer/CommandLineTools/usr/bin/c++
-- BUILD_TYPE    | RelWithDebInfo (allowed: Custom;Debug;Release;RelWithDebInfo;Fast)
-- COMPILE FLAGS | -g  -O2 
-- Shared        | ON
-- Default units | modern units (2019 nist constants)
-- MPI           | ON
--   DYNAMIC     | OFF
--   INC         | /opt/homebrew/Cellar/open-mpi/4.1.5/include
--   LIB         | /opt/homebrew/Cellar/open-mpi/4.1.5/lib/libmpi.dylib
-- Python        | ON
--   DYNAMIC     | OFF
--   MODULE      | ON
--  python3.10 (default)
--   EXE         | /Users/ramcdougal/anaconda3/bin/python3
--   INC         | /Users/ramcdougal/anaconda3/include/python3.10
--   LIB         | /Users/ramcdougal/anaconda3/lib/libpython3.10.dylib
-- Readline      | /Library/Developer/CommandLineTools/SDKs/MacOSX13.3.sdk/usr/lib/libreadline.tbd
-- Curses        | /Library/Developer/CommandLineTools/SDKs/MacOSX13.3.sdk/usr/lib/libcurses.tbd;/Library/Developer/CommandLineTools/SDKs/MacOSX13.3.sdk/usr/lib/libform.tbd
-- RX3D          | ON
--   OptLevel    | 0
-- Interviews    | ON
--   PATH        | /Users/ramcdougal/nrn/external/iv
--   INC         | /Users/ramcdougal/nrn/external/iv/src/include
--   X11 (INC)   | /usr/X11R6/include
--       (LIBDIR)| /usr/X11R6/lib
-- CoreNEURON    | OFF
-- Tests         | OFF
-- --------------+--------------------------------------------------------------
--  See documentation : https://www.neuron.yale.edu/neuron/
-- --------------+--------------------------------------------------------------
-- 
-- Configuring done (10.3s)
-- Generating done (0.1s)
-- Build files have been written to: /Users/ramcdougal/nrn/build

nrnhines commented 1 year ago

I assume at this late date that everything is arm64, or both arm64 and x86_64. To verify:

lipo -archs `which python3`
lipo -archs /Users/ramcdougal/anaconda3/lib/libpython3.10.dylib
lipo -archs `which nrniv`
lipo -archs /Users/ramcdougal/lib/libnrniv.dylib

ramcdougal commented 1 year ago

I don't have this machine with me at the moment (and will verify later), but in general the Mac will complain if you try to use an x86_64 library with an arm Python.

In any case, the first few lines of NEURON successfully run; it's just when you try to grab the GIL that it segfaults.

nrnhines commented 1 year ago

You're right about the arm64 vs x86_64 issue being a red herring. My only other idea is similar. It's clear that

--  python3.10 (default)
--   EXE         | /Users/ramcdougal/anaconda3/bin/python3
--   INC         | /Users/ramcdougal/anaconda3/include/python3.10
--   LIB         | /Users/ramcdougal/anaconda3/lib/libpython3.10.dylib

But just for fun, can you rebuild with an explicit -DPYTHON_EXECUTABLE=`which python3`?

ramcdougal commented 1 year ago

For the record, the lipo commands all reported arm64 and nothing changed with the explicit specification of which Python.
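The architecture comparison done by hand with lipo above can also be scripted. The following is a minimal sketch, assuming macOS with `lipo` on the PATH; the helper names `parse_archs`, `shared_archs`, and `report` are hypothetical and not part of NEURON.

```python
import shutil
import subprocess


def parse_archs(lipo_output: str) -> set[str]:
    """Turn the output of `lipo -archs FILE` (e.g. 'x86_64 arm64') into a set."""
    return set(lipo_output.split())


def shared_archs(archs_by_file: dict[str, set[str]]) -> set[str]:
    """Architectures present in every file; an empty result means a mismatch."""
    sets = list(archs_by_file.values())
    return set.intersection(*sets) if sets else set()


def report(paths: list[str]) -> dict[str, set[str]]:
    """Run `lipo -archs` on each path (only meaningful on macOS)."""
    if shutil.which("lipo") is None:
        raise RuntimeError("lipo not found; this check only applies on macOS")
    result = {}
    for path in paths:
        out = subprocess.run(
            ["lipo", "-archs", path], check=True, capture_output=True, text=True
        ).stdout
        result[path] = parse_archs(out)
    return result
```

Calling `shared_archs(report([...]))` with the four paths from the lipo commands would return `{"arm64"}` for the result reported here. A matching architecture only rules out one cause; it says nothing about the GIL crash itself.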

nrnhines commented 1 year ago

I also get the segfault on my M1 after installing anaconda3. I configured with

 build % cmake .. -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_INSTALL_PREFIX=install -DNRN_ENABLE_TESTS=ON 

and the python is

--  python3.10 (default)
--   EXE         | /Users/hines/anaconda3/bin/python3
--   INC         | /Users/hines/anaconda3/include/python3.10
--   LIB         | /Users/hines/anaconda3/lib/libpython3.10.dylib

My first build attempt resulted in

FAILED: src/nrnpython/CMakeFiles/hoc_module.util 
...
INFO:root:setup.py called with:setup.py build --cmake-build-dir /Users/hines/neuron/anacon/build --rx3d-opt-level 0 --without-nrnpython --build-lib=/Users/hines/neuron/anacon/build/lib/python build_ext --define=USE_PYTHON,NRN_ENABLE_THREADS
ERROR:root:ERROR: RX3D wheel requires Cython and numpy. Please install beforehand

Though import numpy works and

% which cython
/Library/Frameworks/Python.framework/Versions/3.11/bin/cython

I chose for the moment to use -DNRN_ENABLE_RX3D=OFF, and the build succeeded. Then

 % python3
Python 3.10.9 (main, Mar  1 2023, 12:20:14) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from neuron import h
zsh: segmentation fault  python3

Rebuilding with -DNRN_ENABLE_PYTHON_DYNAMIC=ON seems to work around the issue

% python3
Python 3.10.9 (main, Mar  1 2023, 12:20:14) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from neuron import h
>>> h.nrnversion(6)
"cmake option default differences: 'NRN_ENABLE_RX3D=OFF' 'NRN_ENABLE_TESTS=ON' 'NRN_ENABLE_PYTHON_DYNAMIC=ON' 'NRN_LINK_AGAINST_PYTHON=OFF' 'CMAKE_INSTALL_PREFIX=/Users/hines/neuron/anacon/build/install' 'CMAKE_C_COMPILER=/usr/bin/clang' 'CMAKE_CXX_COMPILER=/usr/bin/clang++' 'PYTHON_EXECUTABLE=/Users/hines/anaconda3/bin/python3'"
>>> 
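The string returned by h.nrnversion(6) in the session above lists the CMake options that differ from their defaults as quoted KEY=VALUE pairs. A small sketch for turning such a string into a dictionary follows. It assumes the format shown in this session, and the helper name `parse_cmake_diffs` is hypothetical.

```python
import re


def parse_cmake_diffs(version_string: str) -> dict[str, str]:
    """Extract 'KEY=VALUE' pairs from a string like the h.nrnversion(6) output."""
    # Each option is wrapped in single quotes; the key ends at the first '='.
    pairs = re.findall(r"'([^'=]+)=([^']*)'", version_string)
    return dict(pairs)


sample = (
    "cmake option default differences: 'NRN_ENABLE_RX3D=OFF' "
    "'NRN_ENABLE_PYTHON_DYNAMIC=ON' 'NRN_LINK_AGAINST_PYTHON=OFF'"
)
options = parse_cmake_diffs(sample)
print(options["NRN_ENABLE_PYTHON_DYNAMIC"])  # ON
```

This makes it easy to confirm from Python which linkage mode an installed NEURON was built with.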

At the moment, I have no idea why the build time linkage to anaconda3 python3.10 exhibits the segfault. I happen to have a python.org installation of python3.10 on this machine. Building and linking against that one (which allows RX3D ON) does work.

ramcdougal commented 1 year ago

I think the anaconda build attempt with rx3d failed because it was finding the system framework cython. That could probably be fixed by conda install cython in the activated anaconda environment.

... Doesn't help with the segfault issue though.
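The cython mismatch described above (an interpreter from one installation, a `cython` script from another) can be detected with a path check. This is a heuristic sketch only: it assumes that a tool belongs to an interpreter when it lives under that interpreter's `sys.prefix`, and the function names are hypothetical.

```python
import os
import shutil
import sys
from typing import Optional


def is_under(path: str, prefix: str) -> bool:
    """True if path is located inside the directory tree rooted at prefix."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def cython_matches_interpreter() -> Optional[bool]:
    """Check whether the `cython` found on the PATH lives under sys.prefix.

    Returns None when no cython executable is on the PATH.
    """
    cython = shutil.which("cython")
    if cython is None:
        return None
    return is_under(cython, sys.prefix)
```

With the paths from this thread, `is_under` would report that /Library/Frameworks/Python.framework/Versions/3.11/bin/cython is not under /Users/hines/anaconda3.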

nrnhines commented 1 year ago

With respect to build time linkage to python3.10, the only (relevant?) difference I see between the anaconda build and the python.org build is

python.org

build2 % otool -L lib/libnrniv.dylib
...
/Library/Frameworks/Python.framework/Versions/3.10/Python (compatibility version 3.10.0, current version 3.10.0)

anaconda3

build % otool -L lib/libnrniv.dylib
...
@rpath/libpython3.10.dylib (compatibility version 3.10.0, current version 3.10.0)

I suppose I could try installing the python.org version of python3.10.9 but I don't see how that would help me understand the reason for the segfault.
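The otool -L comparison above can be automated by parsing the command's output. The sketch below assumes the usual layout of that output (a first line naming the file, then indented dependency lines); the function names are hypothetical.

```python
def linked_libraries(otool_output: str) -> list[str]:
    """Return the install names listed by `otool -L FILE`."""
    libs = []
    for line in otool_output.splitlines():
        # Dependency lines are indented; the header line with the file name is not.
        if not line[:1].isspace():
            continue
        name = line.strip().split(" (compatibility", 1)[0]
        if name:
            libs.append(name)
    return libs


def links_libpython(otool_output: str) -> bool:
    """True if any dependency looks like a Python runtime library."""
    for lib in linked_libraries(otool_output):
        base = lib.rsplit("/", 1)[-1]
        if base.startswith("libpython") or "Python.framework" in lib:
            return True
    return False
```

Both the anaconda3 build (@rpath/libpython3.10.dylib) and the python.org build (the Python.framework path) count as linking a Python runtime under this check, which matches the two listings above.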

nrnhines commented 1 year ago

This is highly speculative, but notice that anaconda3 python3.10 does not link to libpython

 build % otool -L `which python3`
/Users/hines/anaconda3/bin/python3:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.60.1)

Does the dynamic loader know that everything it is looking for is already in the python3.10.9 executable, and that it shouldn't load @rpath/libpython3.10.dylib?

nrnhines commented 1 year ago

I'm wondering if the -DNRN_ENABLE_PYTHON_DYNAMIC=ON workaround is sufficient to close this issue?

ramcdougal commented 1 year ago

It addresses my immediate problem (thanks), but given that build time linkage is (1) the default and (2) supposed to work, I'd argue the issue should stay open until resolved.

(EDIT: I tested the -DNRN_ENABLE_PYTHON_DYNAMIC=ON fix; that worked on my machine too. Thanks.)

With respect to (1), should dynamic be the default? Switching to 9.0 would be the time to make a change like that.

nrnhines commented 1 year ago

There may be something to my speculation. I copied the link line for libnrniv.dylib from ninja -j 1 -v >& temp and modified temp into a 372 line bash script with the single command

#!/bin/sh
set -ex
/usr/bin/clang++ -g -O2 -arch arm64 -isysroot \
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk \
-dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup \
-o lib/libnrniv.dylib -install_name @rpath/libnrniv.dylib \
src/nrniv/CMakeFiles/nrniv_lib.dir/__/ivoc/apwindow.cpp.o \
...
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk/usr/lib/libform.tbd \
 \
/opt/homebrew/Cellar/open-mpi/4.1.5/lib/libmpi.dylib lib/libinterviews.a \
/opt/homebrew/lib/libX11.dylib /opt/homebrew/lib/libXext.dylib \

#/Users/hines/anaconda3/lib/libpython3.10.dylib \

Then otool -L lib/libnrniv.dylib does not mention libpython3.10.dylib, and I copy the library to its install location. That eliminates the segfault. Note that install/lib/python/neuron/hoc.cpython-310-darwin.so remains unchanged, but it never mentioned libpython3.10.dylib in the first place:

build % otool -L install/lib/python/neuron/hoc.cpython-310-darwin.so
install/lib/python/neuron/hoc.cpython-310-darwin.so:
    @rpath/libnrniv.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)

ramcdougal commented 1 year ago

@nrnhines This isn't an M1/M2 thing. We just ran into the same issue with Anaconda Python (3.8 and 3.10) on Intel macs.

pramodk commented 1 month ago

I looked into this a week ago but didn't get time to write the summary.

Michael already mentioned that dynamic Python works, but I wasn't sure of the root cause. I tried various things and spent time on false leads (like this). I would say this five-year-old post for VTK summarizes the issue quite well:

... Recently conda linked python3 statically, so all python symbols are included in the executable instead of being brought in by libpython. This created a problem with VTK used from python, because VTK links with libpython (it uses matplotlib for math text). So, you had python code brought in by the python executable and by libpython which resulted in a segfault for tests that used python and math text.

I tested this on the Anaconda linux distribution, and the issue doesn't appear.

I will create a PR with a small change so that CMake can check if we are using Anaconda Python on MacOS and then disable linking libpython. Given that the issue appears only on Mac and with Anaconda, I think this is sufficient.
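The check is planned for CMake; the same condition can be sketched in Python for illustration. This is not the code from the PR. It rests on two assumptions that do not come from this thread: that a conda-managed interpreter has a conda-meta directory under sys.prefix, and that a statically linked interpreter reports a falsy Py_ENABLE_SHARED through sysconfig. The extra static-linkage condition and the function names are hypothetical.

```python
import os
import sys
import sysconfig


def should_skip_libpython(platform: str, is_conda: bool, enable_shared: bool) -> bool:
    """Skip linking libpython only for a statically linked conda Python on macOS."""
    return platform == "darwin" and is_conda and not enable_shared


def current_interpreter_flags() -> tuple[str, bool, bool]:
    """Collect the three inputs for the running interpreter (heuristics only)."""
    # Assumption: conda environments contain a conda-meta directory.
    is_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    # Assumption: a falsy Py_ENABLE_SHARED means the interpreter is statically linked.
    enable_shared = bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))
    return sys.platform, is_conda, enable_shared


if __name__ == "__main__":
    print(should_skip_libpython(*current_interpreter_flags()))
```

Keeping the decision in a pure function separates the policy (macOS plus Anaconda) from the detection heuristics, which are the parts most likely to need adjusting.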