pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/

[QUESTION] Crash when importing NumPy manually #2485

Closed etijskens closed 4 years ago

etijskens commented 4 years ago

I have a large C++ class (wrapping parts of OpenFOAM). The class must expose an internal C++ array as a numpy array. I managed to compile a small Test class (code below) just to experiment, taking advice from several posts I have read. This works fine. However, when I copy the code without change into the large class, the code crashes when it tries to construct the py::array_t<T> object. After quite some debugging I found out that it crashes in pybind11/numpy.h, in static npy_api lookup(), at the statement module m = module::import("numpy.core.multiarray"); As far as I know the small test module and the large OpenFOAM wrapper are built in exactly the same way, but I am probably overlooking something. The python test code for both is executed in the same virtual python environment, where numpy is available.

Any clues?

Here is the small test code

```cpp
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

#include <vector>
#include <iostream>

namespace py = pybind11;

typedef double T;

size_t N = 10;

class C
{
    std::vector<T> array_;
    int rank_;
    int np_;

  public:
    C()
      : rank_(0), np_(1)
    {
        std::cout << "constructing C()" << std::endl;
        this->array_.resize(N);
        for( size_t i = 0; i < N; ++i ) {
            array_[i] = i;
        }
        std::cout << "constructed\n" << std::endl;
    }
    C(int rank, int np)
      : rank_(rank)
      , np_(np)
    {
        std::cout << "constructing C(" << rank_ << ',' << np_ << ')' << std::endl;
        this->array_.resize(N);
        for( size_t i = 0; i < N; ++i ) {
            array_[i] = i + 1000*rank;
        }
        std::cout << "constructed\n" << std::endl;
    }
    void
    print() {
        for( size_t i = 0; i < N; ++i ) {
            std::cout << "[" << this->rank_ << '/' << this->np_ << "] array[" << i << "] = " << array_[i] << std::endl;
        }
    }
    py::array_t<T> get_array() {
        // A dummy owner object prevents py::array_t from copying the data.
        py::str dummydataowner;
        return py::array_t<T>{N, this->array_.data(), dummydataowner};
    }
};

PYBIND11_MODULE(npytst, m)
{
    m.doc() = "pybind11 npytst plugin"; // optional module docstring
    py::class_<C>(m, "C")
        .def(py::init<>())
        .def(py::init<int,int>())
        .def("print", &C::print)
        .def("get_array", &C::get_array)
        ;
}
```

EDIT (@YannickJadoul): Using ``` for multiline code.

bstaletic commented 4 years ago

Can you import numpy.core.multiarray from the interactive shell? It looks like the answer is "no", at which point, this isn't a pybind issue.

etijskens commented 4 years ago

Apparently, the python code does not crash when numpy.core.multiarray (or even just numpy) is imported before it tries to expose the internal C++ array. This is weird, as the small test project's test code does not import numpy:

The result of print(sys.path) is the same whether or not numpy was imported in the python code.

```python
import pyfoam.npytst as npytst

if __name__ == "__main__":
    c = npytst.C(rank, np)  # rank and np come from elsewhere in the test setup
    c.print()
    a = c.get_array()
    print(f"[{rank}]a =", a)
    a[0] = 100.1
    a[1] = 100.1 + a[1]
    a[2] += 100.1
    print(f"[{rank}]a =", a)
    c.print()
    print("-*# done #*-")
```
etijskens commented 4 years ago

Thanks for considering my question.

Can you import numpy.core.multiarray from the interactive shell? It looks like the answer is "no", at which point, this isn't a pybind issue.

That was of course the first thing I tried when I found out where it went wrong. The answer is ‘yes’.

I just found out that if I execute the import in the python code before it attempts to execute the construction of the numpy array the crash does not happen. This is weird because the small test project doesn’t need to do so.

Kindest regards,

Dr [Engel]bert Tijskens - HPC Analyst/Consultant Flemish Supercomputer Center University of Antwerp / Computational mathematics ++32 3 265 3879, ++32 494 664408

etijskens commented 4 years ago

Just importing numpy before seems to be OK too. If I do not import it, the code crashes when it tries to create the numpy array in the pybind11 module.

bstaletic commented 4 years ago

That is weird. Try wrapping your code in try{}catch(py::error_already_set& e) {} and see what the actual exception contains. There has to be a reason why py::module::import("numpy.core.multiarray") fails. It's basically doing the same thing as import numpy.core.multiarray - PyImport_ImportModule("numpy.core.multiarray");.

YannickJadoul commented 4 years ago

Are we sure it's py::module::import("numpy.core.multiarray") and not py::module::import("numpy").attr("core").attr("multiarray")?

etijskens commented 4 years ago

Yes, it fails on

```cpp
static npy_api lookup() {
    module m = module::import("numpy.core.multiarray");
```
etijskens commented 4 years ago

I tried this:

```cpp
py::array_t<double>
getLocalArray()
{
    py::str dummydataowner;
    try {
        return py::array_t<double>{this->localArray_.size(), this->localArray_.data(), dummydataowner};
    } catch (py::error_already_set& e) {
        Pout << e.what() << endl;
        throw;  // rethrow, so the function does not fall off the end without returning
    }
}
```

But the exception isn’t caught.

etijskens commented 4 years ago

I have been digging a little deeper: in pybind11/pybind11.h

```cpp
class module {
    ...
    static module import(const char *name) {
        PyObject *obj = PyImport_ImportModule(name);
        if (!obj) { throw error_already_set(); }
        return reinterpret_steal<module>(obj);
    }
    ...
};
```

The import function does not get beyond the PyImport_ImportModule(name) call. It segfaults. The exception is not thrown.

bstaletic commented 4 years ago

The import function does not get beyond the PyImport_ImportModule(name) call. It segfaults.

That indicates that you're not holding the GIL when accessing the CPython API.

etijskens commented 4 years ago

How should I remedy that?

bstaletic commented 4 years ago

This is now way beyond the minimal example that you've shown us in the beginning. Without a lot more context I can only guess, but you've definitely released the GIL at some point, either through the raw C API or through pybind11 utilities.

https://pybind11.readthedocs.io/en/stable/advanced/misc.html#global-interpreter-lock-gil
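A minimal sketch of what re-acquiring the GIL would look like with pybind11's utilities (the function name getLocalArray is taken from the snippet above; whether this actually applies depends on where the GIL was released):

```cpp
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <vector>

namespace py = pybind11;

py::array_t<double> getLocalArray(std::vector<double>& localArray)
{
    // If the calling code may have released the GIL (py::gil_scoped_release,
    // a spawned thread, or raw C API calls), re-acquire it before creating
    // any Python objects such as py::array_t.
    py::gil_scoped_acquire acquire;
    py::str dummydataowner;  // dummy owner: avoids copying the buffer
    return py::array_t<double>{localArray.size(), localArray.data(), dummydataowner};
}
```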

etijskens commented 4 years ago

If I import numpy first in the python code, the code no longer segfaults, and works fine. Does that fit with the idea that I released the GIL somewhere?

bstaletic commented 4 years ago

I certainly did not do so intentionally. Is there a way I could have done by accident?

Well, threads are not the easiest thing in the world. A spawned thread doesn't "inherit" the GIL. Again, I don't have your code to look into.

If I import numpy first in the python code, the code no longer segfaults, and works fine. Does that fit with the idea that I released the GIL somewhere?

Possibly. Again, threads don't inherit the GIL. The GIL is just a glorified mutex.

Where is the PyImport_ImportModule code to be found, so I can dig even deeper?

https://github.com/python/cpython/blob/45b34a04a577aa49fa4825421758c3e8eaa1625d/Python/import.c#L1474-L1485

etijskens commented 4 years ago

I really appreciate your help. Is there a way to verify that the GIL is indeed the problem? E.g. can I verify that the GIL is released before constructing the py::array_t?

bstaletic commented 4 years ago

https://docs.python.org/3/c-api/init.html#c.PyGILState_Check

etijskens commented 4 years ago

That test returns 1, so the GIL is definitely held. There must be something else wrong.

bstaletic commented 4 years ago

Check this as well. https://docs.python.org/3/c-api/init.html#c.Py_IsInitialized
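Taken together, the two sanity checks can be dropped into the failing code path just before the array construction, e.g. (a sketch using only the CPython C API; the message strings are illustrative):

```cpp
#include <Python.h>
#include <cstdio>

// Call this right before the code that constructs the py::array_t
// (i.e. before pybind11 ends up in PyImport_ImportModule).
static void check_python_state()
{
    if (!Py_IsInitialized()) {
        std::fprintf(stderr, "interpreter not initialized\n");
        return;
    }
    if (!PyGILState_Check()) {
        std::fprintf(stderr, "GIL not held by this thread\n");
        return;
    }
    std::fprintf(stderr, "interpreter initialized and GIL held\n");
}
```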

YannickJadoul commented 4 years ago

@etijskens We're failing to reproduce this, so it's almost impossible to debug. Is there something else we can try to reproduce? Is there a way you can produce an example that fails (unlike the example code in your original post)?

etijskens commented 4 years ago

Py_IsInitialized() returns true as well

etijskens commented 4 years ago

I understand that, but for the moment I do not have a minimal failing example - I will let you know when I have one. I will try to check some things first.

YannickJadoul commented 4 years ago

Hopefully helpful note (you might already know this, though): it's possible to run Python in gdb, run your script, and get a stack trace when things come crashing down inside your extension module.

etijskens commented 4 years ago

Things become still weirder… I printed out the path as returned by Py_GetPath. Both the working minimal version and the failing OpenFOAM wrapper return the same path:

```
Py_GetPath=/home/bert/.pyenv/versions/3.8.2/lib/python38.zip:/home/bert/.pyenv/versions/3.8.2/lib/python3.8:/home/bert/.pyenv/versions/3.8.2/lib/python3.8/lib-dynload
```

But this is the Python determined by pyenv, not the one from my virtual environment. The virtual environment has numpy installed; the pyenv installation does not. The virtual environment uses the pyenv python, but via a soft link.

Maybe this rings a bell...

YannickJadoul commented 4 years ago

@etijskens And this does not match the sys.path from your interactive terminal? I'd start hunting for the moment at which this changes, in that case.

etijskens commented 4 years ago

I also printed sys.path:

```
sys.path = ['/home/bert/workspace/pyFoam',
 '/home/bert/.pyenv/versions/3.8.2/lib/python38.zip',
 '/home/bert/.pyenv/versions/3.8.2/lib/python3.8',
 '/home/bert/.pyenv/versions/3.8.2/lib/python3.8/lib-dynload',
 '/home/bert/workspace/pyFoam/.venv/lib/python3.8/site-packages',
 '/home/bert/workspace/pyFoam']
```

This is more plausible, as it contains the site-packages of the virtual environment .venv.

However, it remains puzzling that there is no difference between both applications, yet one crashes and the other doesn’t. Obviously, there must be a difference; it is just that I haven’t unearthed it yet. Again, if I import numpy first, the failing application no longer crashes, which is also puzzling.

YannickJadoul commented 4 years ago

However it remains puzzling that there is no difference between both applications, yet one crashes and the other doesn’t.

Well, in one case, something is removing stuff from sys.path or at least changing it. So there has to be a difference somewhere. Do you have a non-minimal example we can use to reproduce, then?

etijskens commented 4 years ago

I uninstalled numpy from the virtual environment and ran the crashing application again. It fails, but with an exception, rather than a core dump:

```
Traceback (most recent call last):
  File "cases/pitzDaily-small-pysimplefoam-np2/solve.py", line 27, in <module>
    p = solver.getLocalArray()
ModuleNotFoundError: No module named 'numpy'
```

This indicates that something else goes wrong, rather than a path issue. If it were a path issue, it would have failed in the same way as above.

bstaletic commented 4 years ago

Interesting, but now I'm certain that this is environmental and not a pybind11 bug. I don't see how we can help without a way to repro...

etijskens commented 4 years ago

You are probably right, I just found out that passing the command line arguments to the OpenFOAM part is enough to make the application crash. Thanks for your support, without your help I wouldn’t have gotten that far…

YannickJadoul commented 4 years ago

I just found out that passing the command line arguments to the OpenFOAM part is enough to make the application crash.

Ugh, sounds annoying. Good luck figuring things out further, then!

Can we close this issue? You can always reopen (or just add a comment) if new issues (or the same old issues) turn up again.

etijskens commented 4 years ago

That is fine - if I find a solution, I certainly post it. Many thanks

YannickJadoul commented 4 years ago

if I find a solution, I certainly post it

Yes, please do! Thanks :-)

etijskens commented 4 years ago

I finally solved the problem. The trick is to construct OpenFOAM's argList object as args(argc,argv,true,true,/*initialise=*/false). It is great that OpenFOAM has this feature, although it is largely undocumented.
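For anyone landing here with the same problem, the construction looks roughly like this (a sketch; the positional parameter names are my reading of the call above and should be checked against the argList.H of your OpenFOAM version):

```cpp
#include "argList.H"  // OpenFOAM

int main(int argc, char *argv[])
{
    // Passing initialise = false keeps OpenFOAM's argList from doing its
    // full process-level initialisation, which otherwise interferes with
    // the embedding Python interpreter.
    Foam::argList args(argc, argv,
                       /*checkArgs  =*/ true,
                       /*checkOpts  =*/ true,
                       /*initialise =*/ false);
    // ... rest of the solver setup ...
    return 0;
}
```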

YannickJadoul commented 4 years ago

Huh, OK. But the good news then is that pybind11 is not to blame :-)

etijskens commented 4 years ago

That is good news indeed. I like pybind11 a lot. I already liked boost::python, despite its dependency on Boost and on a specific Python version. What I like even more is the quick support when a problem arises. Many thanks again.