Open tysg opened 1 year ago
I think I'm seeing the same issue. I haven't come up with a minimal reproducer yet however.
The backtrace is different but the core is similar: `caml_gc_dispatch` -> `caml_empty_minor_heap` -> `dict_dealloc` -> `_PyInterpreterState_GET`, and crash.
I can't think of a work-around at the moment and I'm not sure there's a good way to fix the issue in pyml either. (heh, or wait until we have a python with no gil! ;p )
I tried to migrate to OCaml 5, hoping the collections would happen per-thread, but that didn't help. It seems that if I have a thread dedicated to python operations and call `Gc.compact ()` frequently from it, the crashes occur less often. They still occur far too often in practice, unfortunately.
I tried to change `pydecref()` in `pyml_stubs.c` to surround the actual `Py_DECREF()` with `Python_PyGILState_{Ensure,Release}()` and unsurprisingly ended up with a deadlock instead, again triggered by the GC. I would have hoped that the OCaml 5 GC would collect values from the same thread they were allocated from but I guess it does not (or maybe not for custom values?): that should at least make it possible to conduct all python operations from a single thread.
There's actually a work-around with OCaml 5, it seems: dedicate a domain to the python execution and do everything you can there. That will help for values allocated on the minor heap, since they'll be collected from the same domain (and therefore, probably the same OS thread). You'll therefore want to avoid values being promoted or allocated directly on the major heap, and you will have to trigger the GC yourself frequently enough (while making sure values can be collected), and maybe also tweak `Gc.{custom_minor_ratio,custom_minor_max_size,minor_heap_size}`.
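For anyone who wants to try this, here is a minimal sketch of the dedicated-domain work-around described above. It is not pyml code: `tune_gc` and `run_on_python_domain` are hypothetical helper names, the GC values are arbitrary illustrations, and the jobs stand in for whatever pyml calls you would make. I'm also assuming the jobs return plain OCaml values (ints, strings, ...) rather than Python wrappers, so that no wrapper outlives the domain that allocated it.

```ocaml
(* Sketch only; needs OCaml >= 5.0 for the Domain module. *)

(* Illustrative GC settings, not recommendations: a larger minor heap and
   generous custom-block limits give short-lived wrappers more room to die
   in the minor heap instead of being promoted to the major heap. *)
let tune_gc () =
  Gc.set
    { (Gc.get ()) with
      Gc.minor_heap_size = 1 lsl 19;    (* in words *)
      custom_minor_ratio = 100;         (* percent of minor heap size *)
      custom_minor_max_size = 1 lsl 16  (* bytes per custom value *) }

(* Hypothetical helper: run every job on one dedicated domain and force a
   minor collection after each job, so that garbage created by a job is
   collected on the domain that allocated it. In real use each job would
   wrap the Python calls and convert its result to a plain OCaml value
   before returning. *)
let run_on_python_domain (jobs : (unit -> 'a) list) : 'a list =
  let worker () =
    List.map
      (fun job ->
        let result = job () in
        Gc.minor ();
        result)
      jobs
  in
  Domain.join (Domain.spawn worker)
```

Calling `tune_gc ()` once at start-up and then routing all python work through `run_on_python_domain` is the idea. Whether the `custom_*` settings have any effect depends on how the stubs allocate their custom blocks, which I haven't checked, and anything that does get promoted to the major heap can still be finalized from another domain.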
Of course this is only working around the issue but it seems to work well enough in my case and hopefully there will be GIL-free python builds widely available in the coming months.
First of all, thank you for the amazing package! We had this error, detailed below:
Error output:
gdb backtrace:
From what I can see, the OCaml GC tried to reclaim memory held by Python without holding the GIL. Unfortunately I cannot provide a minimal reproduction, but I can continue to monitor this and report more findings if I have them.