amal-khailtash opened this issue 2 weeks ago
Yes, that would be the GIL. If you start the whole machinery from cppyy, then that top-level method should release the GIL, by setting the __release_gil__ flag of the relevant method (see: https://cppyy.readthedocs.io/en/latest/misc.html). Contrary to most binders, cppyy defaults to NOT releasing the GIL, whereas most others do release it when entering C/C++.
With the GIL released, C++ threads can then call the Python callbacks (the generated wrapper that marshalls the function arguments and the return value will automatically re-acquire the GIL). There being only one GIL, all Python code will still serialize; that is, you can only run one Python callback at any time, starting any others will simply block them for the duration of the current one.
There are a couple of examples in the test suite: https://github.com/wlav/cppyy/blob/master/test/test_concurrent.py
As an aside, to run Python on different C++ threads in parallel, sub-interpreters are needed, but those are not supported yet; that is a work in progress.
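For concreteness, a minimal sketch of setting that flag; start_machinery is a made-up stand-in for whatever top-level C++ entry point is bound through cppyy:

import cppyy

# hypothetical long-running entry point, JITed here just for illustration
cppyy.cppdef(r"""
void start_machinery() {
    // ... would spawn C++ worker threads and run the event loop ...
}""")

# cppyy holds the GIL by default; release it for this entry point so that
# C++ threads can (re-)acquire it when they call back into Python
cppyy.gbl.start_machinery.__release_gil__ = True
cppyy.gbl.start_machinery()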
That is great information I was missing. Thank you!
I added __release_gil__ = True to the callback function, and added this inside the function:
if ( PyGILState_Check() ) { printf("GIL is held by the current thread.\n"); }
else { printf("GIL is not held by the current thread.\n"); }
And I see: GIL is held by the current thread.
So, it seems the GIL is still held inside this function?
Does this mean that in my use case, I still need to wait for sub-interpreters implementation? I do not mind if things are done serially at the moment, for I am more interested in functionality and not full performance yet.
I added this at the beginning of the C++ callback method:
void callback()
{
    if ( PyGILState_Check() ) { printf("A - GIL is held\n"); }
    else { printf("A - GIL is NOT held\n"); }

    Py_BEGIN_ALLOW_THREADS
    if ( PyGILState_Check() ) { printf("B - GIL is held\n"); }
    else { printf("B - GIL is NOT held\n"); }

    // Your C code that doesn't need the GIL
    std::cout << " Stored functions count: " << this->processes.size() << std::endl;
    Py_END_ALLOW_THREADS
}
And I see:
A - GIL is held
B - GIL is NOT held
It seems the GIL is still held at the start, but inside Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS it is released. The program still hangs when accessing this->processes (a std::map), with or without Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS.
The word "callback" is doing too much work (you have them internally to your C++ threaded calls, but now it seems to be a top-level one, or maybe not). Let's talk in code, makes it easier. I believe that the following is what we're talking about:
import cppyy

cppyy.cppdef(r"""\
#include "Python.h"

std::map<std::string, std::function<void()>> the_callbacks;

void does_it_have_the_GIL(const char* name) {
    if ( PyGILState_Check() ) { printf("%s - GIL is held\n", name); }
    else { printf("%s - GIL is NOT held\n", name); }
}

void run_callback(const std::string& name) {
    does_it_have_the_GIL("run_callback");
    auto c = the_callbacks.find(name);
    if (c != the_callbacks.end())
        return c->second();
    std::cerr << "function " << name << " not registered" << std::endl;
}""")

def callme():
    cppyy.gbl.does_it_have_the_GIL("callme")

cppyy.gbl.the_callbacks[callme.__name__] = callme

cppyy.gbl.run_callback.__release_gil__ = True
cppyy.gbl.run_callback(callme.__name__)
The above shows how the function that is going to run the callbacks (run_callback), which is the entry point from Python into C++, needs to release the GIL, and the printout shows that it does. Then running the actual Python callbacks from C++ (which in your case will be on some thread) will re-acquire the GIL, as they have to; the printout shows that it does. If said Python callback calls back into C++ again, then yes, you need to release the GIL again as appropriate, either by setting __release_gil__ on the functions in Python that the Python callback calls, or by using the C-API as you do in the example above once you enter the C++ code.
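For that nested case, a minimal sketch; do_cpp_work is a made-up C++ function that the Python callback re-enters with the GIL dropped via the flag:

import cppyy

cppyy.cppdef(r"""
#include "Python.h"

// made-up C++ helper, just to report whether the GIL is held while it runs
void do_cpp_work() {
    if ( PyGILState_Check() ) printf("do_cpp_work - GIL is held\n");
    else printf("do_cpp_work - GIL is NOT held\n");
}""")

# the Python callback releases the GIL again when it re-enters C++
cppyy.gbl.do_cpp_work.__release_gil__ = True

def callme():
    cppyy.gbl.do_cpp_work()    # prints "GIL is NOT held" for the duration of the call

callme()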
Let's continue editing this example code until convergence if there are further questions.
Now, if you want to run those Python callbacks in true parallel mode, then yes, they need to run in a Python sub-interpreter, launched from C++ I'd guess, or maybe I can figure out some higher-level support. That's a WIP (it's a big chunk of work, but I need it myself, as it's critical for a project I'm currently working on).
Yes, I have similar code to what you specified. The registration is done from Python as you have it, but the actual call is made from the C++ code and not from Python. So instead of cppyy.gbl.run_callback(callme.__name__), C++ multithreaded code calls this run_callback. I see:
run_callback - GIL is held
I thought that with cppyy.gbl.run_callback.__release_gil__ = True, the GIL should have been released? And it hangs when accessing the the_callbacks map in the_callbacks.find(name). I have a print after that, and it is not printed.
Not only should it be released, it is in fact; running the code above, I get:
run_callback - GIL is NOT held
callme - GIL is held
But you say "C++ multithreaded code calls this run_callback", so you're probably doing something different, which goes back to the point of talking in code as opposed to in prose, but are you calling that function directly from C++ or are you calling that function through cppyy
either from the interpreter or with (for example) the Python C-API? If the former, than obviously, there's no GIL release by only setting it on run_callback
: it's whatever is the entry point is from Python into C++ that needs to release the GIL.
I will try to create a small example; it is difficult to show the full code for this library. Python creates the functions to be called by the C++ library and registers them in the map. The C++ library calls run_callback directly, looks up the Python function to call, and calls it.
The C++ library calls the run_callback directly
Right, so if the call does not originate from Python, it's C++ that needs to release the GIL. When calling C++ methods from C++ methods, Python-side features such as __release_gil__ are not in play.
Should releasing the GIL happen before the C++ library/thread calls the callback function? My experience is that I cannot release the GIL inside the callback function, and that seems to be the cause of the hang/deadlock.
Even before that. The GIL is held by the main thread (I'm still not sure from your description whether that's a Python or a C++ one). It needs to be released before another thread calls the callback. The thread calling the callback shouldn't itself release the GIL unless it actually holds it.
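To make that concrete, a sketch (names made up, not your library's API) of doing the release with the C-API on the C++ side, for the case where __release_gil__ cannot be used on the entry point:

import cppyy

cppyy.cppdef(r"""
#include "Python.h"
#include <functional>
#include <map>
#include <string>
#include <thread>

std::map<std::string, std::function<void()>> my_callbacks;

void run_from_cpp_thread(const std::string& name) {
    // the calling (main) thread holds the GIL here; release it around the
    // thread launch, so the worker can invoke the Python callback (the
    // cppyy-generated wrapper inside the std::function re-acquires the GIL)
    Py_BEGIN_ALLOW_THREADS
    std::thread worker([name]{ my_callbacks[name](); });
    worker.join();
    Py_END_ALLOW_THREADS
}""")

def callme():
    print("callme ran on a C++ thread")

cppyy.gbl.my_callbacks["callme"] = callme
cppyy.gbl.run_from_cpp_thread("callme")   # note: no __release_gil__ set here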
I think that is where my problem lies. I have no control over the C++ library I am binding to and I do not want to go in and modify it if possible. Not sure what options I have here.
Well, I still don't understand what you are doing. Maybe one final attempt: first, is this a C++ executable with an embedded Python interpreter, or is the application run from the Python interpreter?
This is an external C++ library that is started from Python. The C++ library is multithreaded and defines different classes. I intend to expose the classes and most of the functionality to Python. Basically Python runs/initiates the C++ code.
Would sub-interpreters or the no-GIL version of Python 3.13 help?
So how is it started from Python, through cppyy? If yes, then whatever the top-level function is (clearly not the equivalent of run_callback) is what needs to have __release_gil__ set. Then the main thread holds no GIL.
As for sub-interpreters, those are about running multiple pieces of Python code in parallel from e.g. C++. They still have a GIL, just one each. You are still going to need to release the main thread's GIL first.
Great suggestion. I added __release_gil__ to the top-level C++ function that is the entry point for the C++ library, and in the callback function I now see: [call_process] - GIL is NOT held. Still hangs! One note is that the callbacks for the C++ threads are registered from Python, but the threads that need to call call_process are spawned from C++. I am still missing something and don't quite understand where else I need to release or acquire the GIL.
Okay, next I need to understand how these callbacks are created. Are these Python functions or JITed C++ functions? And then, how are they passed from Python into C++?
If there is no Python involved in the callbacks, they do not need to hold the GIL. If there is, then they do. But e.g. JITed C++ functions that are passed through Python back into C++ do not require the GIL. To illustrate, below is an extension of the example above, executing Python functions and C++ ones, respectively, from threads in C++: the former hold the GIL, the latter don't (this is all automatic). Try the use of C++ JITing to generate callbacks for use by your library. If those still hang, then the issue is not the GIL. If they don't hang, then it most likely still is, somehow, and to say something sensible in that scenario, I'd need to know how the callbacks are passed from Python into C++ and how they are stored in C++.
import cppyy
import textwrap

cppyy.cppdef(r"""\
#include "Python.h"

std::map<std::string, std::function<void()>> the_callbacks;

void does_it_have_the_GIL(const char* name) {
    if ( PyGILState_Check() ) { printf("%s - GIL is held\n", name); }
    else { printf("%s - GIL is NOT held\n", name); }
}

void run_callbacks() {
    does_it_have_the_GIL("run_callback");
    std::vector<std::thread> workers;
    for (auto& c: the_callbacks)
        workers.push_back(std::thread(c.second));
    for (auto& t: workers)
        t.join();
}""")

cppyy.gbl.run_callbacks.__release_gil__ = True

# call Python functions
cppyy.gbl.the_callbacks.clear()

def py_fgen(i):
    name = f"py_callme{i}"
    exec(textwrap.dedent(f"""\
        def {name}():
            cppyy.gbl.does_it_have_the_GIL("{name}")
    """), globals())
    return eval(name)

for i in range(10):
    f = py_fgen(i)
    cppyy.gbl.the_callbacks[f.__name__] = f

cppyy.gbl.run_callbacks()

# call C++ functions
cppyy.gbl.the_callbacks.clear()

def cpp_fgen(i):
    name = f"cpp_callme{i}"
    cppyy.cppdef(textwrap.dedent(f"""\
        void {name}() {{
            does_it_have_the_GIL("{name}");
        }}
    """))
    return getattr(cppyy.gbl, name)

for i in range(10):
    f = cpp_fgen(i)
    cppyy.gbl.the_callbacks[f.__name__] = f

cppyy.gbl.run_callbacks()
I pushed my code to GitHub. You can clone it and do:
$ git clone https://github.com/amal-khailtash/pysystemc
$ make setup-systemc
$ make uv
$ make venv
$ source .venv/bin/activate
$ make uv-sync
This will clone the repo, get the SystemC submodule, check out v2.3.4, and build the SystemC library. SystemC is a hardware description language similar to Verilog/VHDL, with a multithreaded simulation kernel.
Normally the kernel expects an external sc_main() function that is linked in when building a SystemC simulation. I overloaded this sc_main here: src/amal/eda/systemc/sc_core/sc_main.cpp. This acquires the GIL, calls an sc_main counterpart in Python instead, and releases the GIL after that.
The sc_main is responsible for instantiating a design sc_core.sc_module, which I have overloaded as well in src/amal/eda/systemc/sc_core/sc_module_new.cpp. This is to provide the callback capability.
Once sc_main starts, it instantiates modules, which define the design using ports, signals, and processes (methods and (c)threads in SystemC land). There are a few standard fixed callbacks, before_end_of_elaboration, end_of_elaboration, start_of_simulation, and end_of_simulation, that are called when the simulation starts.
sc_main can use sc_start to start simulations. During the definition of sc_modules, new processes can be registered with the kernel using what I named sc_method, sc_thread, and sc_cthread in src/amal/eda/systemc/sc_core/sc_module_new.cpp.
This is where I put the callback function name and function pointer into a std::map. I could also pass self to this class for easy access to the Python counterpart's class members, but that same lock/hang gave me grief.
Once the simulation starts with sc_start, first the standard fixed callbacks are called during the simulation loop. Then any registered method/thread needs to be called based on the defined triggers (sensitivity).
There are some examples of pure SystemC designs that you can run using the provided Python script, which just cppdefs the files and runs the C++ sc_main. If you type make alone, you will see help and examples:
Examples:
=========
example-hello-cpp - Run the hello world example in C++
example-hello-py - Run the hello world example in Python
example-counter-cpp - Run the counter example in C++
example-counter-py - Run the counter example in Python
example-simple_fifo-py - Run the simple_fifo example in Python
example-pipe-cpp - Run the pipe example in C++
example-pipe-py - Run the pipe example in Python
example-pkt_switch-cpp - Run the pkt_switch example in C++
You can use hello-cpp and hello-py to see the problem. Right now, I have commented out the callback lookup and the actual call in src/amal/eda/systemc/sc_core/sc_module_new.cpp, so both the C++ and Python examples should run correctly. If you uncomment line 136 of src/amal/eda/systemc/sc_core/sc_module_new.cpp, where I try to access the processes map size, it locks.
Your help and insight are much appreciated, and I find cppyy awesome for the fact that it seamlessly supports C++ and templates.
I can look in more detail later, but just from looking over sc_main: that's not the way to embed Python. Here's an example of how to set up Python to run embedded from C++: https://github.com/wlav/CPyCppyy/blob/master/src/API.cxx#L39 . At least, I can't find anywhere in your code or in the systemc code where Python is being initialized. I recommend calling Py_IsInitialized() in sc_main to see whether it is. If not, then that might be the problem right there.
I didn't believe I needed to call Py_Initialize! I am not embedding Python in C++, but the other way around, unless I am missing something. Python starts a Python application that accesses the exposed C++ library.
Ah, so the main executable is the python interpreter itself? That starts systemc, which runs sc_main? Yes, that should definitely be fine that way.
Yes, here is how things start:
1. Python starts systemc, which loads libsystemc.so.
2. The standard fixed callbacks before_end_of_elaboration, end_of_elaboration, start_of_simulation, and end_of_simulation are defined in Python, to be called by the C++ kernel.
3. ScModule inherits from the original C++ sc_core.sc_module, but it will replace it on the Python side.
4. sc_method and sc_thread are implemented in the ScModule class.
5. From sc_main, sc_start is called, which starts the SystemC simulation kernel.
I hope that is clear. I need to draw some diagram for all these processes.
I am working with a multithreaded C++ library that I am trying to bind to. I have written a new C++ class that extends one of the original C++ library classes. A Python class extends this class and defines a few callbacks that should be called by the C++ library.
The callback registration is initiated from Python, which passes a Python method name to be called. This method is passed to the extended C++ class, which keeps a std::map of Python method names and function pointers. It also registers a dispatcher method in the extended C++ class that will eventually call the Python method by looking up its name in the map.
In this dispatcher function, as soon as I try to access the std::map or any variables outside of this function, the process hangs. I am familiar with the GIL a bit, but I think this is somehow related to the concurrent, multithreaded aspect of this library. I tried GIL acquire/release, a mutex, and also interpreter swapping (sub-interpreters). But I do not quite understand the problem, what causes it, or how to debug it.
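For reference, a minimal self-contained sketch of the registration/dispatch pattern described above, with the GIL handling discussed in this thread; class and function names are illustrative, not the actual pysystemc code:

import cppyy

# Illustrative stand-in for the extended C++ class described above; it only
# mirrors the name -> callable map idea, not the real SystemC machinery.
cppyy.cppdef(r"""
#include "Python.h"
#include <functional>
#include <map>
#include <string>

class Dispatcher {
public:
    void register_process(const std::string& name, std::function<void()> f) {
        processes[name] = f;
    }

    void dispatch(const std::string& name) {
        // called from the (C++) simulation side; the cppyy-generated wrapper
        // around a registered Python callable re-acquires the GIL when invoked
        auto it = processes.find(name);
        if (it != processes.end()) it->second();
    }

private:
    std::map<std::string, std::function<void()>> processes;
};

void start_simulation(Dispatcher& d) {
    // stand-in for the library entry point (e.g. sc_start); with
    // __release_gil__ set on it from Python, the GIL is free here and any
    // C++ thread can safely run d.dispatch(...)
    d.dispatch("my_process");
}""")

d = cppyy.gbl.Dispatcher()

def my_process():
    print("my_process called from C++")

d.register_process("my_process", my_process)

# release the GIL on the top-level entry point, as discussed in this thread
cppyy.gbl.start_simulation.__release_gil__ = True
cppyy.gbl.start_simulation(d)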