thierry-martinez / pyml

OCaml bindings for Python
BSD 2-Clause "Simplified" License

Using an OCaml library from multithreaded Python code #76

Open jonathan-laurent opened 2 years ago

jonathan-laurent commented 2 years ago

Thanks for developing and maintaining this great package!

I am currently using pyml and pythonlib to call an OCaml library from Python. My Python application uses multiple threads (via concurrent.futures.ThreadPoolExecutor), although these threads never run in parallel (thanks to the GIL).
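To make the setup concrete, here is a minimal sketch in plain Python. `ocaml_add` is a hypothetical stand-in for a function exposed through pythonlib; it is ordinary Python here. The sketch only shows which threads the calls arrive on. Every task submitted to the pool runs on a worker thread that Python created, never on the main thread. From the OCaml runtime's point of view, those are threads created from C.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def ocaml_add(x, y):
    # Hypothetical stand-in for an OCaml function exposed to Python
    # (for example through pythonlib); here it is plain Python.
    return x + y


def call_and_report(x):
    # Return the result together with the OS-level ID of the calling thread.
    return ocaml_add(x, 1), threading.get_native_id()


main_thread_id = threading.get_native_id()

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order in its results.
    results = list(pool.map(call_and_report, range(8)))

values = [value for value, _ in results]
worker_ids = {tid for _, tid in results}
print(values)                        # [1, 2, 3, 4, 5, 6, 7, 8]
print(main_thread_id in worker_ids)  # False
```

How many distinct worker threads get used depends on scheduling. However, none of them is the main thread, and none was created by OCaml's `Thread.create`.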

So far, things have been working fine for me. However, I found this in the OCaml documentation: https://ocaml.org/manual/intfc.html#ss:c-thread-register

Callbacks from C to OCaml are possible only if the calling thread is known to the OCaml run-time system. Threads created from OCaml (through the Thread.create function of the system threads library) are automatically known to the run-time system. If the application creates additional threads from C and wishes to callback into OCaml code from these threads, it must first register them with the run-time system. The following functions are declared in the include file <caml/threads.h>: [caml_c_thread_register() and caml_c_thread_unregister()].

Does pyml need to do anything special to deal with these? My application seems to be doing fine right now but I am worried about subtle bugs or leaks in the future.

@LaurentMazare Do you have any experience calling OCaml libraries from multithreaded Python code at Jane Street?

thierry-martinez commented 2 years ago

Sorry for the delay, and thank you very much for the question. I wasn't even aware that caml_c_thread_register existed! This function is defined in systhreads, so I suppose we can ignore it if we don't use threads on the OCaml side (i.e., if we do not link threads.cmxa). I asked the question here: https://discuss.ocaml.org/t/caml-c-thread-register-in-programs-that-do-not-use-systhreads/9232

UnixJunkie commented 2 years ago

@jonathan-laurent Maybe you can close this issue if this answers your question.

LaurentMazare commented 2 years ago

Sorry for the delay, I only just noticed that I was mentioned here. Most of our pyml use cases rely on multithreaded OCaml (because we use async and its thread pool). We run into issues when this interacts with Python threads. I've tried various approaches involving caml_c_thread_register but wasn't able to get them to work properly: the program ended up segfaulting when a Python thread ran OCaml code via pyml. I would certainly be interested if this can be made to work at some point.

jonathan-laurent commented 2 years ago

@LaurentMazare This is interesting, thanks. To be clear though, do you think Async has something to do with the problems you observed or do you think they arise more generally?

LaurentMazare commented 2 years ago

I think it's likely to be a problem with any code that links the threads library (async and lwt come to mind). Based on what we saw, even if the wrapped functions do not use async themselves, linking it was sufficient to cause segfaults when the functions were called from different Python threads.