mgeplf opened this issue 2 months ago
My current theory is:
1) Since a single Python interpreter runs all the tests, the tests can interact with each other.
2) Global state is stored in src/nrniv/partrans.cpp, specifically sgid2srcindex_ and visources_.
3) test/pytest_coreneuron/test_partrans.py causes these globals to be modified: the Nodes it allocates and stores are later freed, but the references held in the globals are not cleared.
4) Non-deterministically, a later test overwrites some of that heap memory.
5) Later, in mk_ttd, the dangling Node reference is used: https://github.com/neuronsimulator/nrn/blob/master/src/nrniv/partrans.cpp#L499
6) Depending on what was overwritten in 4), there is a segfault.
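To make steps 2) through 6) concrete, here is a minimal, self-contained C++ model of the suspected pattern. It is a sketch, not NEURON code: Node, g_sources, register_source(), free_node_without_cleanup() and registered_sources() are hypothetical stand-ins, and the real types of sgid2srcindex_ and visources_ in partrans.cpp may differ. The only point is that a global table holding non-owning Node pointers keeps its entries after the nodes themselves are deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for NEURON's Node; only its identity matters here.
struct Node {
    double v = 0.0;
};

// Hypothetical model of a file-scope global like visources_ in
// src/nrniv/partrans.cpp: it maps a source gid to a Node it does not own.
static std::unordered_map<int, Node*> g_sources;

// Step 3: a test allocates a node and registers it in the global table.
Node* register_source(int sgid) {
    Node* nd = new Node{};
    g_sources[sgid] = nd;
    return nd;
}

// Still step 3: the section is deleted and its nodes are freed, but nothing
// touches g_sources, so the entry for this node now dangles.
void free_node_without_cleanup(Node* nd) {
    delete nd;
}

// Steps 4-6 need a later caller (mk_ttd() in the real code) to dereference
// a stale entry. We only count entries, since dereferencing would be
// undefined behaviour.
std::size_t registered_sources() {
    return g_sources.size();
}
```

Calling register_source(7) and then free_node_without_cleanup() on the returned node leaves registered_sources() at 1: the entry survives the delete. The model deliberately stops short of reading through the stale pointer, because that is exactly what valgrind reports as an invalid read in mk_ttd().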
When I have more time, I'll try to confirm this theory.
I got a valgrind trace that supports the above (the invalid read happens while test_swc.py is running):
test/pytest_coreneuron/basic_tests_py3.11/test/pytest_coreneuron/test_swc.py
==21626== Invalid read of size 8
==21626== at 0x8EDD150: mk_ttd() (partrans.cpp:503)
==21626== by 0x8F49D26: nrn_thread_memblist_setup() (multicore.cpp:647)
==21626== by 0x8F406DC: v_setup_vectors() (treeset.cpp:1697)
==21626== by 0x8F37912: nrnhoc_topology() (solve.cpp:296)
==21626== by 0x8F530E6: hoc_call() (code.cpp:1418)
==21626== by 0x8FDFF19: fcall(void*, void*) (nrnpy_hoc.cpp:728)
==21626== by 0x8ED9ACD: OcJump::fpycall(void* (*)(void*, void*), void*, void*) (ocjump.cpp:138)
==21626== by 0x8FDCF09: hocobj_call(PyHocObject*, _object*, _object*) (nrnpy_hoc.cpp:796)
==21626== by 0x49AC9F6: _PyObject_MakeTpCall (call.c:214)
==21626== by 0x494B8F2: _PyEval_EvalFrameDefault (ceval.c:4772)
==21626== by 0x4AB8A98: _PyEval_EvalFrame (pycore_ceval.h:73)
==21626== by 0x4AB8A98: _PyEval_Vector (ceval.c:6428)
==21626== by 0x49AC788: _PyVectorcall_Call (call.c:245)
==21626== by 0x49AC788: _PyObject_Call (call.c:328)
==21626== Address 0x8676a98 is 104 bytes inside a block of size 120 free'd
==21626== at 0x483D1CF: operator delete(void*, unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==21626== by 0x8F37B85: node_destruct(Node**, int) (solve.cpp:585)
==21626== by 0x8F39597: node_free (solve.cpp:474)
==21626== by 0x8F39597: sec_free(hoc_Item*) (solve.cpp:502)
==21626== by 0x8F05D47: delete_section() (cabcode.cpp:358)
==21626== by 0x8F530E6: hoc_call() (code.cpp:1418)
==21626== by 0x8FDFF19: fcall(void*, void*) (nrnpy_hoc.cpp:728)
==21626== by 0x8ED9ACD: OcJump::fpycall(void* (*)(void*, void*), void*, void*) (ocjump.cpp:138)
==21626== by 0x8FDCF09: hocobj_call(PyHocObject*, _object*, _object*) (nrnpy_hoc.cpp:796)
==21626== by 0x49AC9F6: _PyObject_MakeTpCall (call.c:214)
==21626== by 0x494B8F2: _PyEval_EvalFrameDefault (ceval.c:4772)
==21626== by 0x4AB8A98: _PyEval_EvalFrame (pycore_ceval.h:73)
==21626== by 0x4AB8A98: _PyEval_Vector (ceval.c:6428)
==21626== by 0x49AC788: _PyVectorcall_Call (call.c:245)
==21626== by 0x49AC788: _PyObject_Call (call.c:328)
==21626== Block was alloc'd at
==21626== at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==21626== by 0x8F37D48: node_alloc(Section*, short) (solve.cpp:744)
==21626== by 0x8F02E30: nrn_change_nseg(Section*, int) (cabcode.cpp:1508)
==21626== by 0x8F03977: new_section(Object*, Symbol*, int) (cabcode.cpp:298)
==21626== by 0x8F0467F: nrnpy_newsection(NPySecObj*) (cabcode.cpp:327)
==21626== by 0x8FE7A99: NPySecObj_init(NPySecObj*, _object*, _object*) (nrnpy_nrn.cpp:376)
==21626== by 0x8FE7B8C: NPySecObj_new(_typeobject*, _object*, _object*) (nrnpy_nrn.cpp:401)
==21626== by 0x4A013E2: cfunction_call (methodobject.c:542)
==21626== by 0x49AC9F6: _PyObject_MakeTpCall (call.c:214)
==21626== by 0x494B8F2: _PyEval_EvalFrameDefault (ceval.c:4772)
==21626== by 0x4AB8A98: _PyEval_EvalFrame (pycore_ceval.h:73)
==21626== by 0x4AB8A98: _PyEval_Vector (ceval.c:6428)
==21626== by 0x49ACB7A: _PyObject_FastCallDictTstate (call.c:141)
The reduced set of tests needed to trigger it fairly often (roughly 1 run in 10?) is:
python3 -m pytest \
test/pytest_coreneuron/basic_tests_py3.11/test/pytest_coreneuron/test_basic.py \
test/pytest_coreneuron/basic_tests_py3.11/test/pytest_coreneuron/test_bbss.py \
test/pytest_coreneuron/basic_tests_py3.11/test/pytest_coreneuron/test_coreneuron_configuration.py \
test/pytest_coreneuron/basic_tests_py3.11/test/pytest_coreneuron/test_hoc_po.py \
test/pytest_coreneuron/basic_tests_py3.11/test/pytest_coreneuron/test_nrntest_fast.py \
test/pytest_coreneuron/basic_tests_py3.11/test/pytest_coreneuron/test_partrans.py \
test/pytest_coreneuron/basic_tests_py3.11/test/pytest_coreneuron/test_swc.py
@mgeplf: I haven't looked at the code in detail, but I'm not surprised by a global-state issue. From the code you've skimmed, do you already have a potential fix in mind?
My guess is that one of node_destruct/node_free/sec_free/delete_section has to be made aware of partrans.cpp::visources_.
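One way to act on this guess, sketched as a standalone model rather than a patch: Node, g_sources, forget_node() and destroy_node() are hypothetical, the real visources_ may have a different type, and which of node_destruct, node_free, sec_free or delete_section should carry the hook is still the open question. The idea is that whatever frees a node first removes every global entry that still points at it.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for NEURON's Node (not the real struct).
struct Node {
    double v = 0.0;
};

// Hypothetical model of a partrans.cpp global that stores non-owning Node
// pointers keyed by source gid; the real visources_ may be a different type.
static std::unordered_map<int, Node*> g_sources;

// Hypothetical hook: erase every entry that still refers to nd.
void forget_node(const Node* nd) {
    for (auto it = g_sources.begin(); it != g_sources.end();) {
        if (it->second == nd) {
            it = g_sources.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
}

// Model of the freeing path (e.g. node_destruct): notify the partrans tables
// before releasing the memory, so no dangling pointer survives the delete.
void destroy_node(Node* nd) {
    forget_node(nd);
    delete nd;
}
```

The linear scan in forget_node() costs O(n) per freed node. A cheaper alternative, also only an assumption here, would be to clear the partrans tables wholesale (or mark them stale) whenever a section is deleted and rebuild them at the next setup.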
Context
Overview of the issue
I saw that the pytest version is pinned, and wondered whether I could reproduce the problem with the older version of pytest.
Expected result/behavior
No segfault.
NEURON setup
Commit 8a9bada6; Python 3.11 from pyenv on Ubuntu 20.04.
If I run the tests a few times, I get a SEGFAULT in addition to the test failures.
In gdb, I get the following traceback: