richardalligier closed this issue 2 years ago.
I'm also running into a segmentation fault with code that calls to_bigarray over multiple iterations...
This is what Valgrind says:
==1681093== Invalid free() / delete / delete[] / realloc()
==1681093== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1681093== by 0x28D764BD: caml_empty_minor_heap (minor_gc.c:413)
==1681093== by 0x28D7692F: caml_gc_dispatch (minor_gc.c:492)
==1681093== by 0x28D76A79: caml_alloc_small_dispatch (minor_gc.c:539)
==1681093== by 0x28D77FA8: caml_alloc_small (alloc.c:68)
==1681093== by 0x28D8B3AB: alloc_custom_gen (custom.c:50)
==1681093== by 0x28D8B5DF: caml_alloc_custom_mem (custom.c:106)
==1681093== by 0x28D8D991: caml_ba_alloc (bigarray.c:116)
==1681093== by 0x28D531B5: bigarray_of_pyarray_wrapper (in /home/kolkhovskiy/algotrading/_build/default/python/ocaml.so)
==1681093== by 0x28C080D4: camlNumpy__to_bigarray_420 (in /home/kolkhovskiy/algotrading/_build/default/python/ocaml.so)
==1681093== by 0x282D9FD5: camlDune__exe__Ocaml__of_float_numpy_1959 (ocaml.re:82)
==1681093== by 0x282D9AE0: camlDune__exe__Ocaml__request_of_self_1878 (ocaml.re:101)
==1681093== Address 0x1aa861f0 is 0 bytes inside a block of size 72 free'd
==1681093== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1681093== by 0x28D764BD: caml_empty_minor_heap (minor_gc.c:413)
==1681093== by 0x28D7692F: caml_gc_dispatch (minor_gc.c:492)
==1681093== by 0x28D76A79: caml_alloc_small_dispatch (minor_gc.c:539)
==1681093== by 0x28D77FA8: caml_alloc_small (alloc.c:68)
==1681093== by 0x28D8B3AB: alloc_custom_gen (custom.c:50)
==1681093== by 0x28D8B5DF: caml_alloc_custom_mem (custom.c:106)
==1681093== by 0x28D8D991: caml_ba_alloc (bigarray.c:116)
==1681093== by 0x28D531B5: bigarray_of_pyarray_wrapper (in /home/kolkhovskiy/algotrading/_build/default/python/ocaml.so)
==1681093== by 0x28C080D4: camlNumpy__to_bigarray_420 (in /home/kolkhovskiy/algotrading/_build/default/python/ocaml.so)
==1681093== by 0x282D9FD5: camlDune__exe__Ocaml__of_float_numpy_1959 (ocaml.re:82)
==1681093== by 0x282D9AE0: camlDune__exe__Ocaml__request_of_self_1878 (ocaml.re:101)
==1681093== Block was alloc'd at
==1681093== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1681093== by 0x28D531E0: bigarray_of_pyarray_wrapper (in /home/kolkhovskiy/algotrading/_build/default/python/ocaml.so)
==1681093== by 0x28C080D4: camlNumpy__to_bigarray_420 (in /home/kolkhovskiy/algotrading/_build/default/python/ocaml.so)
==1681093== by 0x282D9FD5: camlDune__exe__Ocaml__of_float_numpy_1959 (ocaml.re:82)
==1681093== by 0x282D9AAD: camlDune__exe__Ocaml__request_of_self_1878 (ocaml.re:97)
==1681093== by 0x282DA068: camlDune__exe__Ocaml__init_2371 (ocaml.re:156)
==1681093== by 0x282DA19E: camlDune__exe__Ocaml__fun_3653 (ocaml.re:184)
==1681093== by 0x28C172F2: camlPy__handle_errors_3584 (in /home/kolkhovskiy/algotrading/_build/default/python/ocaml.so)
==1681093== by 0x28D91BFB: caml_start_program (in /home/kolkhovskiy/algotrading/_build/default/python/ocaml.so)
==1681093== by 0x28D879F3: caml_callback_exn (callback.c:111)
==1681093== by 0x28D87C6C: caml_callback (callback.c:165)
==1681093== by 0x28D56421: pycall_callback (in /home/kolkhovskiy/algotrading/_build/default/python/ocaml.so)
I'm using Python 3.8.5, OCaml 4.12.0 and pyml 20210226.
Interestingly enough, it still segfaults even after I run opam install pyml=20200518.
...
I can reproduce this bug with ocaml-variants.4.11.1+flambda and pyml 20210226:
==22208== Memcheck, a memory error detector
==22208== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22208== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==22208== Command: _build/default/segfault.exe
==22208==
==22208== Invalid write of size 8
==22208== at 0x553B021: ??? (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x57CA4D: caml_empty_minor_heap (minor_gc.c:409)
==22208== by 0x57CE7B: caml_gc_dispatch (minor_gc.c:475)
==22208== by 0x57CFF4: caml_alloc_small_dispatch (minor_gc.c:531)
==22208== by 0x57E5A0: caml_alloc_small (alloc.c:68)
==22208== by 0x590E26: alloc_custom_gen (custom.c:49)
==22208== by 0x59325B: caml_ba_alloc (bigarray.c:116)
==22208== by 0x55D798: bigarray_of_pyarray_wrapper (in /home/patrik/devel/consensus-protocol-research/segfault/_build/default/segfault.exe)
==22208== by 0x4E2C26: camlNumpy__to_bigarray_250 (in /home/patrik/devel/consensus-protocol-research/segfault/_build/default/segfault.exe)
==22208== by 0x4E275F: camlDune__exe__Segfault__entry (segfault.ml:8)
==22208== by 0x4E04B8: caml_program (in /home/patrik/devel/consensus-protocol-research/segfault/_build/default/segfault.exe)
==22208== by 0x596ABF: caml_start_program (in /home/patrik/devel/consensus-protocol-research/segfault/_build/default/segfault.exe)
==22208== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22208==
==22208== Invalid free() / delete / delete[] / realloc()
==22208== at 0x48430E4: free (vg_replace_malloc.c:755)
==22208== by 0x5490F21: ??? (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x553B0E0: ??? (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x57CA4D: caml_empty_minor_heap (minor_gc.c:409)
==22208== by 0x57CE7B: caml_gc_dispatch (minor_gc.c:475)
==22208== by 0x57CFF4: caml_alloc_small_dispatch (minor_gc.c:531)
==22208== by 0x57E5A0: caml_alloc_small (alloc.c:68)
==22208== by 0x590E26: alloc_custom_gen (custom.c:49)
==22208== by 0x59325B: caml_ba_alloc (bigarray.c:116)
==22208== by 0x55D798: bigarray_of_pyarray_wrapper (in /home/patrik/devel/consensus-protocol-research/segfault/_build/default/segfault.exe)
==22208== by 0x4E2C26: camlNumpy__to_bigarray_250 (in /home/patrik/devel/consensus-protocol-research/segfault/_build/default/segfault.exe)
==22208== by 0x4E275F: camlDune__exe__Segfault__entry (segfault.ml:8)
==22208== Address 0x12d10180 is 48 bytes inside a block of size 4,345 alloc'd
==22208== at 0x484086F: malloc (vg_replace_malloc.c:380)
==22208== by 0x548D91E: PyObject_Malloc (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x548EE25: PyUnicode_New (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x54BE55D: PyUnicode_Substring (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x54B3D62: ??? (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x549EB50: _PyEval_EvalFrameDefault (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x549D524: ??? (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x54AB28D: _PyFunction_Vectorcall (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x549E8EA: _PyEval_EvalFrameDefault (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x549D524: ??? (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x5519ED4: _PyEval_EvalCodeWithName (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x5519E6C: PyEval_EvalCodeEx (in /usr/lib64/libpython3.9.so.1.0)
==22208==
==22208== Invalid free() / delete / delete[] / realloc()
==22208== at 0x48430E4: free (vg_replace_malloc.c:755)
==22208== by 0x5490F21: ??? (in /usr/lib64/libpython3.9.so.1.0)
==22208== by 0x57CA4D: caml_empty_minor_heap (minor_gc.c:409)
==22208== by 0x57CE7B: caml_gc_dispatch (minor_gc.c:475)
==22208== by 0x57CFF4: caml_alloc_small_dispatch (minor_gc.c:531)
==22208== by 0x57E5A0: caml_alloc_small (alloc.c:68)
==22208== by 0x590E26: alloc_custom_gen (custom.c:49)
==22208== by 0x59325B: caml_ba_alloc (bigarray.c:116)
==22208== by 0x55D798: bigarray_of_pyarray_wrapper (in /home/patrik/devel/consensus-protocol-research/segfault/_build/default/segfault.exe)
==22208== by 0x4E2C26: camlNumpy__to_bigarray_250 (in /home/patrik/devel/consensus-protocol-research/segfault/_build/default/segfault.exe)
==22208== by 0x4E275F: camlDune__exe__Segfault__entry (segfault.ml:8)
==22208== by 0x4E04B8: caml_program (in /home/patrik/devel/consensus-protocol-research/segfault/_build/default/segfault.exe)
==22208== Address 0x14877a10 is in the Data segment of /usr/lib64/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
==22208==
==22208==
==22208== HEAP SUMMARY:
==22208== in use at exit: 18,692,068 bytes in 100,819 blocks
==22208== total heap usage: 426,333 allocs, 325,516 frees, 70,141,136 bytes allocated
==22208==
==22208== LEAK SUMMARY:
==22208== definitely lost: 2,150 bytes in 22 blocks
==22208== indirectly lost: 520 bytes in 8 blocks
==22208== possibly lost: 4,484,877 bytes in 31,107 blocks
==22208== still reachable: 14,204,521 bytes in 69,682 blocks
==22208== of which reachable via heuristic:
==22208== newarray : 432 bytes in 27 blocks
==22208== suppressed: 0 bytes in 0 blocks
==22208== Rerun with --leak-check=full to see details of leaked memory
==22208==
==22208== For lists of detected and suppressed errors, rerun with: -s
==22208== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
After running opam install pyml=20200518 and recompiling, the program no longer segfaults.
Not sure whether it's useful, but in my application I get a different report: an invalid read instead of an invalid write/free.
==12390== Memcheck, a memory error detector
==12390== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12390== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==12390== Command: /home/patrik/devel/consensus-protocol-research/_venv/bin/pytest gym/tests/test_specs.py
==12390==
=========================================================== test session starts ============================================================
platform linux -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/patrik/devel/consensus-protocol-research/python/gym
plugins: forked-1.3.0
collected 4 items
gym/tests/test_specs.py ..==12390== Invalid read of size 8
==12390== at 0x4A1F019: ??? (in /usr/lib64/libpython3.9.so.1.0)
==12390== by 0x23942F9D: caml_empty_minor_heap (minor_gc.c:409)
==12390== by 0x23943384: caml_gc_dispatch (minor_gc.c:475)
==12390== by 0x23943501: caml_alloc_small_dispatch (minor_gc.c:531)
==12390== by 0x2395BEB4: caml_call_gc (in /home/patrik/devel/consensus-protocol-research/python/gym/cpr_gym/bridge.so)
==12390== by 0x238C611D: camlStdlib__list__find_1125 (list.ml:58)
==12390== by 0x23844DC1: camlCpr_lib__Dag__anon_fn$5bdag$2eml$3a174$2c20$2d$2d192$5d_989 (dag.ml:176)
==12390== by 0x238C3D77: camlStdlib__option__map_104 (option.ml:24)
==12390== by 0x238C3C15: camlStdlib__seq__unfold_256 (seq.ml:83)
==12390== by 0x238C3841: camlStdlib__seq__filter_map_104 (seq.ml:39)
==12390== by 0x236278F8: camlDune__exe__Definitions__iter_374 (definitions.ml:145)
==12390== by 0x23627438: camlDune__exe__Definitions__step_248 (definitions.ml:158)
==12390== Address 0x8 is not stack'd, malloc'd or (recently) free'd
==12390==
==12390==
==12390== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==12390== Access not within mapped region at address 0x8
==12390== at 0x4A1F019: ??? (in /usr/lib64/libpython3.9.so.1.0)
==12390== by 0x23942F9D: caml_empty_minor_heap (minor_gc.c:409)
==12390== by 0x23943384: caml_gc_dispatch (minor_gc.c:475)
==12390== by 0x23943501: caml_alloc_small_dispatch (minor_gc.c:531)
==12390== by 0x2395BEB4: caml_call_gc (in /home/patrik/devel/consensus-protocol-research/python/gym/cpr_gym/bridge.so)
==12390== by 0x238C611D: camlStdlib__list__find_1125 (list.ml:58)
==12390== by 0x23844DC1: camlCpr_lib__Dag__anon_fn$5bdag$2eml$3a174$2c20$2d$2d192$5d_989 (dag.ml:176)
==12390== by 0x238C3D77: camlStdlib__option__map_104 (option.ml:24)
==12390== by 0x238C3C15: camlStdlib__seq__unfold_256 (seq.ml:83)
==12390== by 0x238C3841: camlStdlib__seq__filter_map_104 (seq.ml:39)
==12390== by 0x236278F8: camlDune__exe__Definitions__iter_374 (definitions.ml:145)
==12390== by 0x23627438: camlDune__exe__Definitions__step_248 (definitions.ml:158)
==12390== If you believe this happened as a result of a stack
==12390== overflow in your program's main thread (unlikely but
==12390== possible), you can try to increase the size of the
==12390== main thread stack using the --main-stacksize= flag.
==12390== The main thread stack size used in this run was 8388608.
==12390==
==12390== Process terminating with default action of signal 11 (SIGSEGV)
==12390== General Protection Fault
==12390== at 0x4DACC82: __pthread_once_slow (in /usr/lib64/libpthread-2.33.so)
==12390== by 0x4CFD03E: __rpc_thread_variables.part.0 (in /usr/lib64/libc-2.33.so)
==12390== by 0x4D3F61C: free_mem (in /usr/lib64/libc-2.33.so)
==12390== by 0x4D3F271: __libc_freeres (in /usr/lib64/libc-2.33.so)
==12390== by 0x48351E7: _vgnU_freeres (vg_preloaded.c:74)
==12390==
==12390== HEAP SUMMARY:
==12390== in use at exit: 28,276,210 bytes in 172,014 blocks
==12390== total heap usage: 859,857 allocs, 687,843 frees, 153,403,535 bytes allocated
==12390==
==12390== LEAK SUMMARY:
==12390== definitely lost: 2,296 bytes in 21 blocks
==12390== indirectly lost: 520 bytes in 8 blocks
==12390== possibly lost: 7,156,427 bytes in 58,974 blocks
==12390== still reachable: 21,116,967 bytes in 113,011 blocks
==12390== of which reachable via heuristic:
==12390== newarray : 512 bytes in 32 blocks
==12390== suppressed: 0 bytes in 0 blocks
==12390== Rerun with --leak-check=full to see details of leaked memory
==12390==
==12390== For lists of detected and suppressed errors, rerun with: -s
==12390== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
[1] 12390 segmentation fault (core dumped) valgrind pytest gym/tests/test_specs.py
Very sorry for the (very, very) late answer, but this should be fixed now! Thank you very much for your report; it helped a lot with bisecting. The issue was more general than just to_bigarray: the OCaml GC stole the reference to the NumPy array type object whenever that object was accessed, so the fix should also resolve other NumPy-related instabilities.
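The "stolen reference" explanation can be illustrated with a toy model in plain OCaml. This is a sketch under an assumption, not pyml's actual code: it assumes the bug amounts to an accessor handing back a borrowed reference to the array type object, which an OCaml-side wrapper then decremented as if it owned it. All names (pyobject, wrapper, release, wrap_borrowed_buggy, wrap_borrowed_fixed, simulate) are hypothetical. release stands in for the finaliser the OCaml GC runs when a wrapper dies, which is consistent with the frees coming from caml_empty_minor_heap in the Valgrind traces above.

```ocaml
(* Toy model of CPython reference counting. NOT pyml's code: it only shows
   what happens when a borrowed reference is treated as an owned one. *)

type pyobject = { mutable refcnt : int; mutable freed : bool }

let incref o = o.refcnt <- o.refcnt + 1

(* Dropping the last reference deallocates the object. Once freed, this
   model ignores further decrefs (in C they would touch freed memory). *)
let decref o =
  if not o.freed then begin
    o.refcnt <- o.refcnt - 1;
    if o.refcnt = 0 then o.freed <- true
  end

(* Hypothetical OCaml-side handle that is meant to own one reference. *)
type wrapper = { obj : pyobject }

(* Stands in for the finaliser the OCaml GC runs when the handle dies. *)
let release w = decref w.obj

(* Buggy: wraps a borrowed reference without taking ownership of it. *)
let wrap_borrowed_buggy o = { obj = o }

(* Fixed: takes its own reference first, so the later release balances. *)
let wrap_borrowed_fixed o =
  incref o;
  { obj = o }

(* Access a stand-in for the array type object [n] times; each handle is
   "collected" immediately. The initial count of 3 is arbitrary. *)
let simulate wrap n =
  let array_type = { refcnt = 3; freed = false } in
  for _i = 1 to n do
    release (wrap array_type)
  done;
  array_type

let () =
  let buggy = simulate wrap_borrowed_buggy 10 in
  let fixed = simulate wrap_borrowed_fixed 10 in
  (* prints: buggy: refcnt=0 freed=true *)
  Printf.printf "buggy: refcnt=%d freed=%b\n" buggy.refcnt buggy.freed;
  (* prints: fixed: refcnt=3 freed=false *)
  Printf.printf "fixed: refcnt=%d freed=%b\n" fixed.refcnt fixed.freed
```

In the buggy variant every access costs the type object one reference, so after enough iterations it is deallocated while NumPy still relies on it. That matches the delayed, iteration-dependent crashes reported above only qualitatively. The fixed variant leaves the count unchanged after each access.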
Hello,
On the Python side I have
pythonmodule.py:
and on the OCaml side I have:
I get a segfault around the 500th iteration with pyml 20210226, but no segfault at all after simply running opam install pyml=20200518.
Best, Richard