Open alexsavulescu opened 2 years ago
Test script
>cat err_nrnpython.hoc
if (!nrnpython("from neuron import coreneuron")) {
execerror("Python not available, can not import coreneuron module\n")
}
objref py_obj
py_obj = new PythonObject()
py_obj.coreneuron.enable = 1
py_obj.coreneuron.cell_permute = 10
And execution in serial:
>nrniv err_nrnpython.hoc
Traceback (most recent call last):
File "/gpfs/bbp.cscs.ch/home/weji/workdir/nrn/build/install/lib/python/neuron/coreneuron.py", line 111, in cell_permute
assert value in self._valid_cell_permute()
AssertionError
nrniv: Assignment to PythonObject failed
in err_nrnpython.hoc near line 7
py_obj.coreneuron.cell_permute = 10
^
>echo $?
1
In MPI:
>mpirun -n 2 nrniv err_nrnpython.hoc
Traceback (most recent call last):
File "/gpfs/bbp.cscs.ch/home/weji/workdir/nrn/build/install/lib/python/neuron/coreneuron.py", line 111, in cell_permute
assert value in self._valid_cell_permute()
AssertionError
/gpfs/bbp.cscs.ch/home/weji/workdir/nrn/build/install/bin/nrniv: Assignment to PythonObject failed
in err_nrnpython.hoc near line 7
py_obj.coreneuron.cell_permute = 10
^
Traceback (most recent call last):
File "/gpfs/bbp.cscs.ch/home/weji/workdir/nrn/build/install/lib/python/neuron/coreneuron.py", line 111, in cell_permute
assert value in self._valid_cell_permute()
AssertionError
/gpfs/bbp.cscs.ch/home/weji/workdir/nrn/build/install/bin/nrniv: Assignment to PythonObject failed
in err_nrnpython.hoc near line 7
py_obj.coreneuron.cell_permute = 10
^
srun: error: r1i4n21: tasks 0-1: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=561639.1
weji@r1i4n21:~/workdir/nrn/build>echo $?
1
>nrniv -nogui -nobanner -c "1/0"
nrniv: division by zero
near line 1
1/0
^
nrniv: arg not valid statement: 1/0
near line 0
^
>echo $?
0
Which is expected to return a non-zero status code.
PR #1871 is meant to solve this issue, so that nrniv -c returns a non-zero exit status on error.
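One way to check the exit status from a script, rather than by hand with echo $?, is to run the command in a subprocess and inspect its return code. This is a minimal sketch, not part of NEURON: the helper name `exit_status` is hypothetical, and the usage part assumes `nrniv` is on the PATH.

```python
import shutil
import subprocess
import sys


def exit_status(cmd):
    """Run cmd (a list of arguments) and return its exit status."""
    # Capture output so error messages do not clutter the terminal, and
    # detach stdin so the child cannot wait for interactive input.
    completed = subprocess.run(
        cmd, stdin=subprocess.DEVNULL, capture_output=True, text=True
    )
    return completed.returncode


if __name__ == "__main__":
    # Sanity check of the helper with a command whose status is known.
    assert exit_status([sys.executable, "-c", "import sys; sys.exit(3)"]) == 3

    # Assumption: nrniv is installed and on the PATH; skip otherwise.
    if shutil.which("nrniv"):
        status = exit_status(["nrniv", "-nogui", "-nobanner", "-c", "1/0"])
        print("nrniv -c '1/0' exit status:", status)
    else:
        print("nrniv not found, skipping")
```

The same helper can wrap the mpirun invocation by passing the full argument list, for example `["mpirun", "-n", "1", "nrniv", "-nogui", "-nobanner", "-c", "1/0"]`.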
With the fix, in serial
>nrniv -nogui -nobanner -c "1/0"
nrniv: division by zero
near line 1
1/0
^
nrniv: arg not valid statement: 1/0
near line 0
^
>echo $?
1
In MPI:
>mpirun -n 1 nrniv -nogui -nobanner -c "1/0"
/gpfs/bbp.cscs.ch/home/weji/workdir/nrn/build/install/bin/nrniv: division by zero
near line 1
1/0
^
/gpfs/bbp.cscs.ch/home/weji/workdir/nrn/build/install/bin/nrniv: arg not valid statement: 1/0
near line 0
^
srun: error: r1i5n4: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=566087.0
>echo $?
1
Case 3: cross-check the Python session after PR #1871
>python
Python 3.9.7 (default, Jan 10 2022, 21:17:49)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from neuron import h
Warning: no DISPLAY environment variable.
--No graphics will be displayed.
>>> h(1/0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> h.sqrt(-1)
NEURON: sqrt argument out of domain
near line 0
objref hoc_obj_[2]
^
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: hocobj_call error
>>>
The HOC errors don't quit the Python session, which is correct.
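Note that in the transcript, `h(1/0)` is evaluated by Python before `h` is ever called, so that ZeroDivisionError is a plain Python error; the `h.sqrt(-1)` call is the one that exercises HOC error propagation. A script can rely on that behaviour by catching the RuntimeError. The sketch below assumes only what the transcript shows (a failed HOC call raises RuntimeError); the helper name `try_hoc_call` is hypothetical.

```python
def try_hoc_call(func, *args):
    """Call func(*args); return (True, result) or (False, error message)."""
    try:
        return True, func(*args)
    except RuntimeError as err:
        # In the transcript, a failed HOC call surfaces as RuntimeError.
        return False, str(err)


if __name__ == "__main__":
    try:
        from neuron import h  # assumption: NEURON's Python module is installed
    except ImportError:
        print("neuron not available, skipping")
    else:
        ok, info = try_hoc_call(h.sqrt, -1)
        print("call succeeded" if ok else f"HOC error caught: {info}")
```

Only RuntimeError is caught on purpose, so unrelated exceptions still propagate to the caller.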
@WeinaJi I just stumbled onto another case, but maybe your PR already fixes it:
>>> h.List("A").count()
NEURON: A is not a template name
near line 0
objref hoc_obj_[2]
^
List("A")
Segmentation fault
And:
>>> h.List(h.NetStim())
bad stack access: expecting (double); really (Object *)
NEURON: interpreter stack type error
near line 0
objref hoc_obj_[2]
^
List(...)
Segmentation fault
@WeinaJi the disappearing MPI errors that I mentioned during the closing remarks:
Simulated 4521.0/8000.0ms. 15605.28s elapsed. Simulated tick in 3.35. Avg tick 3.4517s
Simulated 4522.0/8000.0ms. 15608.63s elapsed. Simulated tick in 3.35. Avg tick 3.4517s
Rank 0 [Fri Jun 24 01:22:23 2022] [c6-0c2s5n0] application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
srun: error: nid01300: task 0: Aborted (core dumped)
Could NEURON always try to report WHY it is aborting?
I can't find any paths to MPI_Abort in NEURON that don't have error messages. I wonder if the MPI_Abort is being called from CoreNEURON. Or if the message is getting lost somehow because the MPI_Abort shuts things down too quickly. Can you add a print statement to nrn/external/coreneuron/coreneuron/mpi/lib/nrnmpi.cpp and src/nrnmpi/nrnmpi.cpp above the MPI_Abort call? That might narrow the search.
Aborted (core dumped)
Did you get a "coredump" or stack trace?
As the MPI_Abort issue might be slightly more complicated, I've continued the report on #1881.
@WeinaJi I'd still be interested to see if the Segmentation faults after HOC errors could be considered; not segfaulting is an "error handling improvement" haha 😉
Hello @Helveg ,
Thanks for reporting the error cases. The seg fault should be fixed by PR https://github.com/neuronsimulator/nrn/pull/1917 . Please let me know if that works for you.
Check which combinations of C++/Python/HOC errors in serial/MPI modes are correctly handled and propagated.
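The combinations can be enumerated mechanically so that none is forgotten. The following is a rough sketch under stated assumptions: the commands are the ones quoted in this thread, the labelling of the err_nrnpython.hoc case as a "Python" error is my own, no C++ error case appears in the thread so those entries stay empty, and `check_matrix` is a hypothetical helper.

```python
from itertools import product

ERROR_SOURCES = ("C++", "Python", "HOC")
MODES = ("serial", "MPI")

# Commands taken from this thread; combinations without one stay None.
CASES = {combo: None for combo in product(ERROR_SOURCES, MODES)}
CASES[("HOC", "serial")] = ["nrniv", "-nogui", "-nobanner", "-c", "1/0"]
CASES[("HOC", "MPI")] = [
    "mpirun", "-n", "1", "nrniv", "-nogui", "-nobanner", "-c", "1/0",
]
CASES[("Python", "serial")] = ["nrniv", "err_nrnpython.hoc"]
CASES[("Python", "MPI")] = ["mpirun", "-n", "2", "nrniv", "err_nrnpython.hoc"]


def check_matrix(cases, run):
    """Return {combo: verdict}; run(cmd) must return the exit status."""
    report = {}
    for combo, cmd in cases.items():
        if cmd is None:
            report[combo] = "no test case yet"
        elif run(cmd) != 0:
            report[combo] = "ok (non-zero exit)"
        else:
            report[combo] = "BAD (exit 0 despite error)"
    return report
```

Passing `lambda cmd: subprocess.run(cmd).returncode` as `run` would execute the real commands; a stub can be passed instead to test the reporting logic on its own.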
Some improvements have been made recently:
Another improvement has been proposed and is pending:
There is some way of executing Python inline in HOC; does that work? Do errors raised there get propagated correctly?
For example, gpu routes via a Python setter (https://github.com/neuronsimulator/nrn/blob/f242cc8d4ddefb74f71f7f613a4b7b4043c2b641/share/lib/python/neuron/coreneuron.py#L61-L75) that could throw.
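The failure mode in question, a Python property setter that raises on assignment, can be reproduced in isolation. The class below is a hypothetical stand-in and not NEURON's coreneuron module; the set of accepted values is an illustrative assumption. It uses an explicit raise rather than the assert seen in the traceback, since assert statements are removed when Python runs with -O.

```python
class PermuteSettings:
    """Hypothetical stand-in for a settings object; not NEURON's class."""

    # Assumption: illustrative set of accepted values only.
    _VALID_CELL_PERMUTE = (0, 1, 2)

    def __init__(self):
        self._cell_permute = 0

    @property
    def cell_permute(self):
        return self._cell_permute

    @cell_permute.setter
    def cell_permute(self, value):
        # An explicit raise survives `python -O` and carries a message
        # saying what went wrong.
        if value not in self._VALID_CELL_PERMUTE:
            raise ValueError(
                f"cell_permute must be one of {self._VALID_CELL_PERMUTE}, "
                f"got {value!r}"
            )
        self._cell_permute = value


settings = PermuteSettings()
settings.cell_permute = 1  # accepted
try:
    settings.cell_permute = 10  # rejected, like the value in the test script
except ValueError as err:
    print("setter raised:", err)
```

Whatever calls such a setter, whether HOC or Python, has to turn the exception into a non-zero exit status for the error to be visible to the caller.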