neuronsimulator / nrn

NEURON Simulator
http://nrn.readthedocs.io

Master fails to run some tests with NVHPC and -O2 optimisation flag #3129

Closed pramodk closed 5 days ago

pramodk commented 5 days ago

Context

NEURON fails to run simple tests when built with -O2 and the NVHPC compiler. This appears to have been the case ever since SoA was introduced.

Expected result/behavior

All tests should also pass with -O2.

NEURON setup

Minimal working example - MWE

It's the existing test_shape.py:


from neuron import h, gui

h.load_file(h.neuronhome() + "/demo/pyramid.nrn")
for sec in h.allsec():
    sec.insert("pas")
h.soma.uninsert("pas")

psh = h.PlotShape()
psh.variable("g_pas")
psh.exec_menu("Shape Plot")

# cover a line in ShapeSection::set_range_variable
h.delete_section(sec=h.soma)
psh.variable("g_pas")

Output when run under gdb:

[New Thread 0x7ffff06ff640 (LWP 9234)]
python: /home/kumbhar/workarena/repos/bbp/nrn/src/nrnpython/nrnpy_nrn.cpp:929: NPySecObj *newpysechelp(Section *): Assertion `pysec->sec_ == sec' failed.

Thread 1 "python" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350234112) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350234112) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737350234112) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737350234112, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c8a476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c707f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7c7071b in __assert_fail_base (fmt=0x7ffff7e25130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff6f26fbc <.S42420> "pysec->sec_ == sec",
    file=0x7ffff6f27097 <.S42091> "/home/kumbhar/workarena/repos/bbp/nrn/src/nrnpython/nrnpy_nrn.cpp", line=929, function=<optimized out>) at ./assert/assert.c:92
#6  0x00007ffff7c81e96 in __GI___assert_fail (assertion=0x7ffff6f26fbc <.S42420> "pysec->sec_ == sec",
    file=0x7ffff6f27097 <.S42091> "/home/kumbhar/workarena/repos/bbp/nrn/src/nrnpython/nrnpy_nrn.cpp", line=929,
    function=0x7ffff754dcf0 <_T21_0> "NPySecObj *newpysechelp(Section *)") at ./assert/assert.c:101
#7  0x00007ffff6e4dcd8 in newpysechelp (sec=0x555556730c40) at /home/kumbhar/workarena/repos/bbp/nrn/src/nrnpython/nrnpy_nrn.cpp:929
#8  0x00007ffff6e30ce1 in iternext_sl (po=0x7ffff7581ef0, ql=0x555555c302e0) at /home/kumbhar/workarena/repos/bbp/nrn/src/nrnpython/nrnpy_hoc.cpp:1775
#9  0x00007ffff6e30ebe in hocobj_iternext (self=0x7ffff7581ef0) at /home/kumbhar/workarena/repos/bbp/nrn/src/nrnpython/nrnpy_hoc.cpp:1819
#10 0x0000555555699caf in _PyEval_EvalFrameDefault ()
#11 0x0000555555696016 in ?? ()
#12 0x000055555578b8b6 in PyEval_EvalCode ()
#13 0x00005555557b6918 in ?? ()
#14 0x00005555557b01db in ?? ()
#15 0x00005555557b6665 in ?? ()
#16 0x00005555557b5b48 in _PyRun_SimpleFileObject ()
#17 0x00005555557b5793 in _PyRun_AnyFileObject ()
#18 0x00005555557a82ce in Py_RunMain ()
#19 0x000055555577e70d in Py_BytesMain ()
#20 0x00007ffff7c71d90 in __libc_start_call_main (main=main@entry=0x55555577e6d0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd668) at ../sysdeps/nptl/libc_start_call_main.h:58
#21 0x00007ffff7c71e40 in __libc_start_main_impl (main=0x55555577e6d0, argc=2, argv=0x7fffffffd668, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fffffffd658) at ../csu/libc-start.c:392
#22 0x000055555577e605 in _start ()
(gdb)
pramodk commented 5 days ago

As mentioned during the dev meeting, I was able to reproduce the issue by compiling the whole project with -O1 and only netcvode.cpp with -O2.

Just for the record, I was using the following CMake line:

cmake .. -DNRN_ENABLE_INTERVIEWS=OFF -DNRN_ENABLE_RX3D=OFF -DCMAKE_INSTALL_PREFIX=`pwd`/install -DNRN_ENABLE_TESTS=OFF -DCMAKE_CXX_FLAGS="-g -O" -DCMAKE_BUILD_TYPE=Custom

and the following diff:

+set_source_files_properties(path-to-/src/nrncvode/netcvode.cpp PROPERTIES COMPILE_OPTIONS "-O2;")
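
The thread does not say how netcvode.cpp was singled out. One possible way to narrow a miscompilation down to a single file is to bisect over the source list, raising the optimisation level only for one half at a time. The sketch below is an illustration only: `cmake_override` and `find_culprit` are hypothetical helper names, the `fails` callback (rebuild plus test run) is left to the caller, and the bisection assumes that exactly one file triggers the failure.

```python
from typing import Callable, List, Sequence


def cmake_override(source: str, flag: str = "-O2") -> str:
    """Return a CMake line that sets a per-file compile option (hypothetical helper)."""
    return f'set_source_files_properties({source} PROPERTIES COMPILE_OPTIONS "{flag};")'


def find_culprit(sources: Sequence[str], fails: Callable[[List[str]], bool]) -> str:
    """Bisect a non-empty `sources` list, assuming exactly one file triggers the failure.

    `fails(subset)` is supplied by the caller: it should rebuild with only
    `subset` at the higher optimisation level and return True if the test
    still aborts.
    """
    candidates = list(sources)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still contains the failing file.
        candidates = half if fails(half) else candidates[len(half):]
    return candidates[0]
```

With a stand-in predicate such as `lambda subset: "netcvode.cpp" in subset`, `find_culprit` returns `"netcvode.cpp"` after a logarithmic number of rebuilds, and `cmake_override` produces a line of the same shape as the diff above.
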

Anyway, after various debugging attempts and back-and-forths, I realized there is a new NVHPC release. Since this looks like a compiler bug, it seemed better to try the latest version. And voilà: the issue disappears with nvhpc/24.9!

I will close this as I don't see any need to investigate the old, buggy version.
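
For anyone re-checking this against another compiler version, a small wrapper can tell the assertion abort apart from an ordinary test failure. This is a minimal sketch, not part of the NEURON test suite: `classify_run` is a hypothetical name, and it assumes a POSIX system, where a process killed by SIGABRT reports a negative return code.

```python
import signal
import subprocess
import sys
from typing import Sequence


def classify_run(cmd: Sequence[str]) -> str:
    """Run `cmd` and report 'ok', 'aborted' (killed by SIGABRT) or 'failed'."""
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if proc.returncode == 0:
        return "ok"
    # On POSIX, death by signal N is reported as return code -N.
    if proc.returncode == -signal.SIGABRT:
        return "aborted"
    return "failed"
```

Assuming the reproducer is saved as test_shape.py, `classify_run([sys.executable, "test_shape.py"])` would be expected to return "aborted" for the affected build and "ok" once the compiler is updated.
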