Closed — etijskens closed this issue 4 years ago
> Can you import `numpy.core.multiarray` from the interactive shell? It looks like the answer is "no", at which point, this isn't a pybind issue.
Apparently, the Python code does not crash when `numpy.core.multiarray` (or even just `numpy`) is imported before it tries to expose the internal C++ array. This is weird, as the small test project's test code does not import numpy. The result of `print(sys.path)` is the same whether or not numpy was imported in the Python code.
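As a quick sanity check, the "imports don't change the search path" claim can be verified directly (a minimal sketch; it uses a stdlib module as a stand-in for numpy, since the point is that a plain import does not alter `sys.path`):

```python
import sys

# Snapshot the module search path before and after an import.
before = list(sys.path)
import json  # stand-in for numpy; a normal import does not modify sys.path
after = list(sys.path)

assert before == after
```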
```python
import pyfoam.npytst as npytst

if __name__ == "__main__":
    # rank and np are assumed to be defined by the surrounding (MPI) context
    c = npytst.C(rank, np)
    c.print()
    a = c.get_array()
    print(f"[{rank}]a =", a)
    a[0] = 100.1
    a[1] = 100.1 + a[1]
    a[2] += 100.1
    print(f"[{rank}]a =", a)
    c.print()
    print("-*# done #*-")
```
Thanks for considering my question.
> Can you import `numpy.core.multiarray` from the interactive shell? It looks like the answer is "no", at which point, this isn't a pybind issue.
That was of course the first thing I tried when I found out where it went wrong. The answer is "yes".
I just found out that if I execute the import in the Python code before it attempts to construct the numpy array, the crash does not happen. This is weird, because the small test project doesn't need to do so.
Kindest regards,
Dr [Engel]bert Tijskens - HPC Analyst/Consultant Flemish Supercomputer Center University of Antwerp / Computational mathematics ++32 3 265 3879, ++32 494 664408
Just importing numpy beforehand seems to be OK too. If I do not import it, the code crashes when it tries to create the numpy array in the pybind11 module.
That is weird. Try wrapping your code in `try { ... } catch (py::error_already_set& e) { ... }` and see what the actual exception contains. There has to be a reason why `py::module::import("numpy.core.multiarray")` fails. It's basically doing the same thing as `import numpy.core.multiarray`, i.e. `PyImport_ImportModule("numpy.core.multiarray");`.
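That equivalence can be illustrated from the Python side (a minimal sketch; it uses a stdlib submodule in place of `numpy.core.multiarray`, since the point is that the import statement, `importlib.import_module`, and `PyImport_ImportModule` all go through the same machinery and yield the same cached module object):

```python
import importlib
import json.decoder  # import via the statement form

# importlib.import_module is the Python-level counterpart of
# PyImport_ImportModule in the C API.
m = importlib.import_module("json.decoder")

# Both routes return the same object, cached in sys.modules.
assert m is json.decoder
```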
Are we sure it's `py::module::import("numpy.core.multiarray")` and not `py::module::import("numpy").attr("core").attr("multiarray")`?
Yes, it fails on

```cpp
static npy_api lookup() {
    module m = module::import("numpy.core.multiarray");
```
I tried this:

```cpp
py::array_t<double>
getLocalArray()
{
    py::str dummydataowner;
    try {
        return py::array_t<double>{this->localArray_.size(), this->localArray_.data(), dummydataowner};
    } catch (py::error_already_set& e) {
        Pout << e.what() << endl;
    }
    // added: avoid falling off the end (undefined behaviour) if the exception is caught
    return py::array_t<double>{};
}
```

But the exception isn't caught.
I have been digging a little deeper: in pybind11/pybind11.h

```cpp
class module {
    ...
    static module import(const char *name) {
        PyObject *obj = PyImport_ImportModule(name);
        if (!obj)
            throw error_already_set();
        return reinterpret_steal<module>(obj);
    }
};
```

The import function does not get beyond the `PyImport_ImportModule(name)` call. It segfaults. The exception is not thrown.
> The import function does not get beyond the `PyImport_ImportModule(name)` call. It segfaults.
That indicates that you're not holding the GIL when accessing the CPython API.
How should I remedy that?
This is now way beyond the minimal example that you've shown us in the beginning. Without a lot more context I can only guess, but you've definitely released GIL at some point, either through raw C API or through pybind11 utilities.
https://pybind11.readthedocs.io/en/stable/advanced/misc.html#global-interpreter-lock-gil
If I import numpy first in the Python code, the code no longer segfaults and works fine. Does that fit with the idea that I released the GIL somewhere? I certainly did not do so intentionally. Is there a way I could have done so by accident?
Well, threads are not the easiest thing in the world. A spawned thread doesn't "inherit" the GIL. Again, I don't have your code to look into.
> If I import numpy first in the Python code, the code no longer segfaults and works fine. Does that fit with the idea that I released the GIL somewhere?
Possibly. Again, threads don't inherit GIL. GIL is just a glorified mutex.
Where is the `PyImport_ImportModule` code to be found, so I can dig even deeper?
I really appreciate your help. Is there a way to verify that the GIL is indeed the problem? E.g. can I verify that the GIL is released before constructing the `py::array_t`?
That test returns 1, so the GIL is definitely held. There must be something else wrong.
Check this as well. https://docs.python.org/3/c-api/init.html#c.Py_IsInitialized
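For reference, both checks can be probed from regular Python code via ctypes (a minimal sketch; `ctypes.pythonapi` is a `PyDLL`, so the GIL stays held during the call, which is exactly what `PyGILState_Check` then reports):

```python
import ctypes

# PyGILState_Check() returns 1 when the calling thread holds the GIL;
# Py_IsInitialized() returns nonzero once the interpreter is up.
gil_held = ctypes.pythonapi.PyGILState_Check()
initialized = ctypes.pythonapi.Py_IsInitialized()

# From an ordinary Python thread both are true.
print(gil_held, initialized)
```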
@etijskens We're failing to reproduce this, so it's almost impossible to debug. Is there something else we can try to reproduce? Is there a way you can produce an example that fails (unlike the example code in your original post)?
Py_IsInitialized() returns true as well
I understand that, but for the moment I do not have a minimal failing example - I will let you know when I have one. I will try to check some things first.
Hopefully helpful note (you might already know this, though): it's possible to run Python in gdb, run your script, and get a stack trace when things come crashing down inside your extension module.
Things get even weirder… I printed out the path as returned by `Py_GetPath`. Both the working minimal version and the failing OpenFOAM wrapper return the same path:

```
Py_GetPath=/home/bert/.pyenv/versions/3.8.2/lib/python38.zip:/home/bert/.pyenv/versions/3.8.2/lib/python3.8:/home/bert/.pyenv/versions/3.8.2/lib/python3.8/lib-dynload
```

But this is the Python determined by pyenv, not the one from my virtual environment. The virtual environment has numpy installed; the pyenv version does not. The virtual environment uses the pyenv Python, but via a soft link.
Maybe this rings a bell...
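A quick way to see which interpreter and environment are actually active (a minimal sketch; in a virtual environment `sys.prefix` differs from `sys.base_prefix`, while outside one they are equal):

```python
import sys

# sys.executable: the interpreter binary actually running.
# sys.prefix vs sys.base_prefix: they differ iff a virtualenv is active.
in_venv = sys.prefix != sys.base_prefix

print("executable:   ", sys.executable)
print("in virtualenv:", in_venv)
print("search path:  ", sys.path)
```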
@etijskens And this does not match the `sys.path` from your interactive terminal? I'd start hunting for the moment at which this changes, in that case.
I also printed sys.path:

```
sys.path = ['/home/bert/workspace/pyFoam',
            '/home/bert/.pyenv/versions/3.8.2/lib/python38.zip',
            '/home/bert/.pyenv/versions/3.8.2/lib/python3.8',
            '/home/bert/.pyenv/versions/3.8.2/lib/python3.8/lib-dynload',
            '/home/bert/workspace/pyFoam/.venv/lib/python3.8/site-packages',
            '/home/bert/workspace/pyFoam']
```

This is more plausible, as it contains the site-packages of the virtual environment .venv.
However, it remains puzzling that there is no difference between both applications, yet one crashes and the other doesn't. Obviously, there must be a difference; it is just that I haven't unearthed it yet. Again, if I import numpy first, the non-working application no longer crashes, which is also puzzling.
> However, it remains puzzling that there is no difference between both applications, yet one crashes and the other doesn't.
Well, in one case, something is removing stuff from sys.path
or at least changing it. So there has to be a difference somewhere. Do you have a non-minimal example we can use to reproduce, then?
I uninstalled numpy from the virtual environment and ran the crashing application again. It fails, but with an exception rather than a core dump:

```
Traceback (most recent call last):
  File "cases/pitzDaily-small-pysimplefoam-np2/solve.py", line 27, in
```

This indicates that something else goes wrong, rather than a path issue. If there were a path issue, it would fail in the same way as above.
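That traceback is indeed the failure mode for a genuinely missing package: a clean `ModuleNotFoundError` rather than a crash (a minimal sketch using a deliberately nonexistent module name):

```python
import importlib

# A missing module raises ModuleNotFoundError (a subclass of ImportError);
# it never takes the process down, unlike a GIL or interpreter-state
# problem inside C extension code, which typically segfaults instead.
try:
    importlib.import_module("definitely_not_installed_anywhere")
    failed_cleanly = False
except ImportError:
    failed_cleanly = True

assert failed_cleanly
```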
Interesting, but now I'm certain that this is environmental and not a pybind11 bug. I don't see how we can help without a way to repro...
You are probably right, I just found out that passing the command line arguments to the OpenFOAM part is enough to make the application crash. Thanks for your support, without your help I wouldn’t have gotten that far…
> I just found out that passing the command line arguments to the OpenFOAM part is enough to make the application crash.
Ugh, sounds annoying. Good luck figuring things out, further, then!
Can we close this issue? You can always reopen (or just add a comment) if new issues (or the same old issues) turn up again.
That is fine - if I find a solution, I will certainly post it. Many thanks!
> if I find a solution, I will certainly post it
Yes, please do! Thanks :-)
I finally solved the problem. The trick is to construct OpenFOAM's argList object as `args(argc, argv, true, true, /*initialise=*/false)`. It is great that OpenFOAM has this feature, although it is largely undocumented.
Huh, OK. But the good news then is that pybind11 is not to blame :-)
That is good news indeed. I like it a lot. I already liked boost::python despite its dependency on boost and a specific python version. What I like even more is the quick support when a problem arises. Many thanks again.
I have a large C++ class (wrapping parts of OpenFOAM). The class must expose some internal C++ array as a numpy array. I managed to compile a small Test class (code below) just to experiment, taking advice from several posts I have read. This works fine. However, when I copy the code without change into the large class, the code crashes when it tries to construct the `py::array_t<T>` object. After quite some debugging I found out that it crashes in pybind11/numpy.h, in `static npy_api lookup()`, at the statement `module m = module::import("numpy.core.multiarray");`.
As far as I know, the small test module and the large OpenFOAM wrapper are built in exactly the same way, but I am probably overlooking something. The Python test code is in both cases executed in the same virtual Python environment, where numpy is available. Any clues?
Here is the small test code