pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/

[BUG]: Unable to convert function return value to a Python type #3751

Open tdegeus opened 2 years ago

tdegeus commented 2 years ago

Problem description

Consider a module mymodule. From it, I am returning a reference to a member as follows:

const GooseFEM::Vector& vector() const { return m_vector; }

with corresponding Python API

PYBIND11_MODULE(mymodule, m)
{
    py::class_<Myclass> cls(m, "Myclass");
    ...
    cls.def("vector", &Myclass::vector, "vector");
}

Now GooseFEM has its own (pybind11) Python API, so this works brilliantly.

Except... For the past week or so, the GooseFEM package shipped from conda-forge (and other libraries, I am just giving an example here) no longer plays nicely with my locally compiled library on Linux (macOS and Windows work fine).

I have no idea what is going on, but I have been able to get a minimal reproducer: https://github.com/tdegeus/test_pybind (which includes the failing CI on Linux, and passing CI on macOS and Windows).

Reproducible example code

See full working example, including CI: https://github.com/tdegeus/test_pybind

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <GooseFEM/GooseFEM.h>

namespace py = pybind11;

class Myclass
{
public:
    Myclass() = default;

    Myclass(size_t n) {
        m_mesh = GooseFEM::Mesh::Quad4::Regular(n, n);
        m_vector = GooseFEM::Vector(m_mesh.conn(), m_mesh.dofs());
    }

    const GooseFEM::Vector& vector() const
    {
        return m_vector;
    }

private:
    GooseFEM::Mesh::Quad4::Regular m_mesh;
    GooseFEM::Vector m_vector;
};

PYBIND11_MODULE(mymodule, m)
{
    m.doc() = "Foo";
    py::class_<Myclass> cls(m, "Myclass");
    cls.def(py::init<size_t>(), "Myclass", py::arg("n"));
    cls.def("vector", &Myclass::vector, "vector");
}

and Python code

import GooseFEM
import mymodule

a = mymodule.Myclass(3)
v = a.vector()
print(v)

tdegeus commented 2 years ago

The issue seems to be that conda-forge is building with gcc-10, and that I have to do that locally too: https://github.com/tdegeus/test_pybind/pull/4

virtuald commented 2 years ago

I've run into issues with compiler compatibility as well (specifically, building locally with gcc 11 against a different package that was built on Ubuntu 18/gcc 8). The issue I ran into was similar to yours, in that the symbol names changed because of ABI differences between the compilers, so class registration etc. would fail.

I feel like this used to work more reliably, but maybe the compiler ABIs didn't change as much in the past?

tdegeus commented 2 years ago

It would be nice if there were a way to become more robust against that. Or at the very least to have a way to check whether this is the problem.
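One rough way to check, as a minimal sketch: it assumes a pybind11 release from around the time of this thread, which stores its shared internals in Python's builtins under a key beginning with __pybind11_internals_ (newer releases may keep it elsewhere, in which case nothing is listed). The helper name find_internals_ids is made up for illustration. If two extension modules register different keys, they do not share a type registry, which would explain a failed return-value conversion.

```python
import builtins

# Assumption: pybind11 of this era stores its internals capsule in builtins
# under a key that starts with this prefix.
PREFIX = "__pybind11_internals_"


def find_internals_ids(namespace):
    """Return the sorted pybind11 internals keys found in a mapping."""
    return sorted(
        key for key in namespace
        if isinstance(key, str) and key.startswith(PREFIX)
    )


if __name__ == "__main__":
    # Import the extension modules under suspicion before this point,
    # e.g. GooseFEM and mymodule, so that each one registers its internals.
    for key in find_internals_ids(vars(builtins)):
        print(key)
```

More than one printed key suggests the modules were built with incompatible compiler/ABI settings.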

henryiii commented 2 years ago

My guess is that __GXX_ABI_VERSION changed and the ABI protection did shield you from it, causing the two versions to be incompatible. You can probably disable our ABI protection (like PyTorch does) and be just fine; it's unlikely that something we depend on changed. But it might have, in which case you'll get UB. So you'd be on your own for that.

henryiii commented 2 years ago

In the past we only relied on our ABI number, and it mostly worked, except for rare segfaults from people mixing compilers. So we now probably go a bit overboard in requiring matching compiler names, matching __GXX_ABI_VERSION, matching stdlib, and a few other things.
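pybind11 folds these checks into a single internals identifier string (PYBIND11_INTERNALS_ID in the headers). To see which component differs between two builds, the identifiers can be compared piece by piece. The sketch below assumes the identifier is an underscore-separated string; the function name differing_parts and the two example identifiers are illustrative and not taken from this thread.

```python
from itertools import zip_longest


def differing_parts(id_a, id_b):
    """Return the (a, b) pairs of underscore-separated parts that differ."""
    parts_a = id_a.strip("_").split("_")
    parts_b = id_b.strip("_").split("_")
    return [
        (a, b)
        for a, b in zip_longest(parts_a, parts_b)
        if a != b
    ]


if __name__ == "__main__":
    # Hypothetical identifiers for two builds that differ only in the
    # compiler ABI version component.
    local_build = "__pybind11_internals_v4_gcc_libstdcpp_cxxabi1013__"
    packaged_build = "__pybind11_internals_v4_gcc_libstdcpp_cxxabi1014__"
    print(differing_parts(local_build, packaged_build))
```

For these two example strings the only differing pair is the cxxabi component, which corresponds to the __GXX_ABI_VERSION check.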

tdegeus commented 2 years ago

Thanks for the tip @henryiii. Just to know: how do I switch off the ABI protection?

henryiii commented 2 years ago

https://github.com/pybind/pybind11/pull/2602

tdegeus commented 2 years ago

Thanks. Two questions: