kliem closed this issue 3 years ago
New commits:
4ae6966: Merge tag '9.4.beta0' into t/31245/first_parallel_version_of_face_iterator_reb2
Segfaults reliably on OSX
sage -t --long --random-seed=0 src/sage/geometry/polyhedron/combinatorial_polyhedron/base.pyx # Killed due to segmentation fault
sage -t --long --random-seed=0 src/sage/geometry/polyhedron/base.py # Killed due to segmentation fault
Debugger:
$ sage -sh
(sage-sh) sudo lldb ./local/bin/python3
(lldb) target create "./local/bin/python3"
Current executable set to '/Users/buildbot-sage/slave/sage_git/build/local/bin/python3' (x86_64).
(lldb) rfrom
error: 'rfrom' is not a valid command.
(lldb) r
Process 81901 launched: '/Users/buildbot-sage/slave/sage_git/build/local/bin/python3' (x86_64)
Python 3.9.5 (default, Jun 8 2021, 19:13:24)
[Clang 12.0.5 (clang-1205.0.22.9)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sage.all import *
>>> line = Polyhedron(lines=[[0,1]])
>>> line.vertex_graph() ## line 7185 ##
base.cpython-39-darwin.so was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 81901 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x00000001142ae934 base.cpython-39-darwin.so`__pyx_gb_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_24_compute_edges_or_ridges_5generator29(__pyx_generator=0x0000000114562160, __pyx_tstate=0x0000000100508140, __pyx_sent_value=0x00000001003a8580) at base.c:28817:74 [opt]
28814 __pyx_t_2 = __pyx_t_1;
28815 for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
28816 __pyx_cur_scope->__pyx_v_i = __pyx_t_3;
-> 28817 __pyx_t_4 = ((PyObject *)__pyx_f_4sage_5rings_7integer_smallInteger((__pyx_cur_scope->__pyx_outer_scope->__pyx_v_f_vector[__pyx_cur_scope->__pyx_v_i]))); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 3071, __pyx_L1_error)
28818 __Pyx_GOTREF(__pyx_t_4);
28819 __pyx_r = __pyx_t_4;
28820 __pyx_t_4 = 0;
Target 0: (python3) stopped.
(lldb)
error: No auto repeat.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x00000001142ae934 base.cpython-39-darwin.so`__pyx_gb_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_24_compute_edges_or_ridges_5generator29(__pyx_generator=0x0000000114562160, __pyx_tstate=0x0000000100508140, __pyx_sent_value=0x00000001003a8580) at base.c:28817:74 [opt]
frame #1: 0x00000001033bc4fa cachefunc.cpython-39-darwin.so`__Pyx_Coroutine_SendEx(self=0x0000000114562160, value=0x00000001003a8580, closing=<unavailable>) at cachefunc.c:28545:14 [opt]
frame #2: 0x00000001001573c0 libpython3.9.dylib`PySequence_Tuple + 144
frame #3: 0x00000001142abde3 base.cpython-39-darwin.so`__pyx_f_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron__compute_edges_or_ridges(__pyx_v_self=<unavailable>, __pyx_v_dual=<unavailable>, __pyx_v_do_edges=<unavailable>) at base.c:29447:19 [opt]
frame #4: 0x00000001142b7aa7 base.cpython-39-darwin.so`__pyx_pw_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_23edges [inlined] __pyx_f_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron__compute_edges(__pyx_v_self=<unavailable>, __pyx_v_dual=<unavailable>) at base.c:43309:15 [opt]
frame #5: 0x00000001142b7a9b base.cpython-39-darwin.so`__pyx_pw_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_23edges [inlined] __pyx_pf_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_22edges(__pyx_v_self=0x0000000114235ca0, __pyx_v_names=0x0000000000000000) at base.c:14187 [opt]
frame #6: 0x00000001142b742a base.cpython-39-darwin.so`__pyx_pw_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_23edges(__pyx_v_self=0x0000000114235ca0, __pyx_args=<unavailable>, __pyx_kwds=<unavailable>) at base.c:13668 [opt]
frame #7: 0x00000001001ab7c5 libpython3.9.dylib`cfunction_call + 69
frame #8: 0x00000001142ca4d7 base.cpython-39-darwin.so`__pyx_gb_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_12vertex_graph_2generator13 [inlined] __Pyx_PyObject_Call(func=0x0000000114569860, arg=<unavailable>, kw=0x0000000100722e80) at base.c:52465:14 [opt]
frame #9: 0x00000001142ca49a base.cpython-39-darwin.so`__pyx_gb_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_12vertex_graph_2generator13(__pyx_generator=0x0000000112ae5b80, __pyx_tstate=<unavailable>, __pyx_sent_value=<unavailable>) at base.c:14600 [opt]
frame #10: 0x00000001033bc4fa cachefunc.cpython-39-darwin.so`__Pyx_Coroutine_SendEx(self=0x0000000112ae5b80, value=0x00000001003a8580, closing=<unavailable>) at cachefunc.c:28545:14 [opt]
frame #11: 0x00000001001573c0 libpython3.9.dylib`PySequence_Tuple + 144
frame #12: 0x00000001142b86c8 base.cpython-39-darwin.so`__pyx_pw_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_25vertex_graph [inlined] __pyx_pf_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_24vertex_graph(__pyx_v_self=<unavailable>, __pyx_v_names=0x0000000000000000) at base.c:14808:15 [opt]
frame #13: 0x00000001142b85bb base.cpython-39-darwin.so`__pyx_pw_4sage_8geometry_10polyhedron_24combinatorial_polyhedron_4base_23CombinatorialPolyhedron_25vertex_graph(__pyx_v_self=<unavailable>, __pyx_args=<unavailable>, __pyx_kwds=<unavailable>) at base.c:14515 [opt]
frame #14: 0x00000001001759c3 libpython3.9.dylib`method_vectorcall_VARARGS_KEYWORDS + 275
frame #15: 0x000000010024bc1b libpython3.9.dylib`call_function + 411
frame #16: 0x0000000100248c0b libpython3.9.dylib`_PyEval_EvalFrameDefault + 27147
frame #17: 0x000000010016d5b5 libpython3.9.dylib`function_code_fastcall + 229
frame #18: 0x000000010016f89c libpython3.9.dylib`method_vectorcall + 204
frame #19: 0x000000010024bc1b libpython3.9.dylib`call_function + 411
frame #20: 0x0000000100248c2e libpython3.9.dylib`_PyEval_EvalFrameDefault + 27182
frame #21: 0x000000010024c9d4 libpython3.9.dylib`_PyEval_EvalCode + 2580
frame #22: 0x0000000100242107 libpython3.9.dylib`PyEval_EvalCode + 87
frame #23: 0x000000010028c45f libpython3.9.dylib`PyRun_InteractiveOneObjectEx + 847
frame #24: 0x000000010028ba59 libpython3.9.dylib`PyRun_InteractiveLoopFlags + 169
frame #25: 0x000000010028b97c libpython3.9.dylib`PyRun_AnyFileExFlags + 60
frame #26: 0x00000001002a84ea libpython3.9.dylib`Py_RunMain + 2362
frame #27: 0x00000001002a87ec libpython3.9.dylib`pymain_main + 348
frame #28: 0x00000001002a883b libpython3.9.dylib`Py_BytesMain + 43
frame #29: 0x00007fff20394621 libdyld.dylib`start + 1
frame #30: 0x00007fff20394621 libdyld.dylib`start + 1
Thanks, Volker, for the precise log.
Unfortunately, I neither understand the problem nor can I reproduce it. I could just disable OpenMP with clang by default and hope this solves the problem.
I think I tracked it down. I assumed that `cdef bint do_f_vector` is initialized to zero, which apparently isn't always the case. Then Volker's report also makes sense.
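For illustration, here is a minimal C sketch of the bug pattern (the variable name mirrors the Cython declaration, but the case analysis is a hypothetical stand-in, not the actual Sage code): the flag is assigned in the two ordinary branches, while a later-added corner case falls through without ever writing it.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the case analysis around do_f_vector. */
static bool compute(int mode) {
    bool do_f_vector;           /* BUG: automatic variable, indeterminate */
    if (mode == 0)
        do_f_vector = true;     /* first ordinary case */
    else if (mode == 1)
        do_f_vector = false;    /* second ordinary case */
    /* the later-added corner case (mode == 2) falls through without
     * ever writing do_f_vector, so it holds stack garbage here */
    return do_f_vector;
}

int main(void) {
    printf("%d\n", (int)compute(2));  /* undefined behavior */
    return 0;
}
```

The fix in 4c0a4ae is the equivalent of initializing the variable at its declaration, i.e. `bool do_f_vector = false;` in this sketch.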
New commits:
bfb4efb: Merge branch 'u/mkoeppe/first_parallel_version_of_face_iterator_reb2' of git://trac.sagemath.org/sage into u/mkoeppe/first_parallel_version_of_face_iterator_reb3
4c0a4ae: initialize do_f_vector
Changed dependencies from #31499 to none
I really don't understand why this pops up only now and not earlier.
Uninitialized variables contain whatever happens to be in RAM when they enter scope. However, any RAM region that you get from the OS is zeroed out (so you can't read content from older processes). So uninitialized variables tend to be zero initially, but the longer the program runs, the more likely it becomes that the variable occupies a previously-used memory location.
More precisely, only global and static C variables are guaranteed to be initialized to zero; local variables are not.
Valgrind can detect these things for you (i.e., when you read uninitialized memory).
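A self-contained C example of that distinction; compiling it and running it under `valgrind --track-origins=yes` reports the conditional jump on uninitialized memory:

```c
#include <stdio.h>

int global_flag;                  /* static storage: guaranteed zero */

int main(void) {
    static int static_flag;       /* also static storage: guaranteed zero */
    int local_flag;               /* automatic storage: indeterminate */

    printf("global=%d static=%d\n", global_flag, static_flag);  /* 0 0 */
    if (local_flag)               /* undefined read; Valgrind reports it */
        puts("the stack garbage happened to be nonzero");
    return 0;
}
```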
Replying to @vbraun:
> Uninitialized variables contain whatever happens to be in RAM when they enter scope. However, any RAM region that you get from the OS is zeroed out (so you can't read content from older processes). So uninitialized variables tend to be zero initially, but the longer the program runs, the more likely it becomes that the variable occupies a previously-used memory location.
> More precisely, only global and static C variables are guaranteed to be initialized to zero; local variables are not.
> Valgrind can detect these things for you (i.e., when you read uninitialized memory).
Thank you for the explanation.
Once I found out where the problem lies, I wasn't that confused anymore. I originally thought it had something to do with the current ticket, but it doesn't; it just surfaces now, probably because the file now reliably gets compiled in a slightly different way.
In this case it is actually a mistake I made (which I in theory knew about when I made it). I never meant to assume anything about an uninitialized `bint`: there were two cases, and depending on the case I would set it to `True` or `False`. Then I added special handling for corner cases and forgot to initialize the `bint`.
I didn't know that RAM handed out by the OS is zeroed before the process uses it, but now that you mention it, of course that is how it ought to be.
Hopefully this will be the last iteration of this. Sorry for having to redo this a bunch of times, Volker; I just haven't been able to reproduce any of the issues locally.
Changed branch from u/gh-kliem/first_parallel_version_of_face_iterator_reb3 to 4c0a4ae
This ticket parallelizes the computation of the f-vector for polytopes.
Each thread has its own private structure with which it works through partial jobs. Depending on the parallelization depth, there is one job per face of fixed codimension (usually 1, 2 or 3). After everything has finished, the partial f-vectors are added up.
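For illustration, a minimal OpenMP sketch of that accumulation pattern (the dimensions, job granularity, and face-counting work are stand-ins, not the actual Sage code):

```c
#include <stdio.h>

#define DIM    4      /* illustrative: number of f-vector entries */
#define N_JOBS 1000   /* one job per face of fixed codimension */

/* hypothetical stand-in for visiting all faces below one job's face */
static void count_faces(long *partial, int job) {
    partial[job % DIM] += 1;            /* dummy counting work */
}

int main(void) {
    long f_vector[DIM] = {0};

    #pragma omp parallel
    {
        long partial[DIM] = {0};        /* thread-private structure */

        #pragma omp for schedule(dynamic)
        for (int job = 0; job < N_JOBS; job++)
            count_faces(partial, job);

        #pragma omp critical            /* add the partial f-vectors */
        for (int d = 0; d < DIM; d++)
            f_vector[d] += partial[d];
    }

    for (int d = 0; d < DIM; d++)
        printf("f_%d = %ld\n", d, f_vector[d]);
    return 0;
}
```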
Actually, every face is visited, so the code could be modified in the future to explore other properties of faces than just the dimension. The parallelization appears to scale well to at least 40 threads (for computations that take long enough for this to pay off; see https://arxiv.org/pdf/1905.01945.pdf).
The algorithm also works in other situations (simplicial complexes, the lattice of flats of a matroid), and this parallel structure could be used there as well.
On the downside, `sig_on()/sig_off()` doesn't work with multiple threads and has to be replaced by a simple `sig_check()`. Also, raising errors in parallel code results in a terrible slowdown, hence the errors are replaced by returning the exception value. In case of a bug there won't be any traceback anymore, but at least there also won't be a segmentation fault (a sketch of this pattern follows the timings below).
Before:
After (machine has 4 cores):
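A hedged C sketch of the error-handling idea described above (this is not the cysignals API; `do_job` and the error codes are illustrative): the worker never raises inside the parallel loop, it only records an error code, which is inspected once after the threads have joined.

```c
#include <stdio.h>

/* hypothetical worker: returns 0 on success, a nonzero code on failure */
static int do_job(int job) {
    return (job == 42) ? 1 : 0;   /* pretend job 42 hits a bug */
}

int main(void) {
    int err = 0;

    #pragma omp parallel for reduction(max:err)
    for (int job = 0; job < 100; job++) {
        int e = do_job(job);      /* no raising in the hot loop */
        if (e > err)
            err = e;              /* remember the worst error code */
    }

    if (err)                      /* convert to a real error afterwards */
        fprintf(stderr, "a job failed with code %d\n", err);
    return err;
}
```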
CC: @jplab @LaisRast @stumpc5 @tscrim
Component: geometry
Keywords: parallel f-vector
Author: Jonathan Kliem
Branch/Commit: 4c0a4ae
Reviewer: Travis Scrimshaw
Issue created by migration from https://trac.sagemath.org/ticket/31245