pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/
Other
15.67k stars 2.1k forks source link

Class returning a list of owned pointers, with Python then owning the list #637

Closed tzh1043 closed 7 years ago

tzh1043 commented 7 years ago

I have a Data class which holds pointers to Element owned by only Data. I want to return these elemens in a container (list or tuple) [1].

When returning the internal vector<Element*> container of Data by reference everything works, but returning a container copy or a different container (tuple) does not work.

Returning (and leaking according to valgrind [2]) a pointer or reference to a heap-allocated vector works. How can prevent this leak? Also, the RuntimeError: return_value_policy = move is misleading because reference_internal is used everywhere.

Every python shell is started with python3 -i -c 'from datatest import *; d = get_data()'

>>> d.refs()[0].i()
115

vs

>>> d.copy()[0].i()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: return_value_policy = move, but the object is neither movable nor copyable!

but interestingly, first calling d.ptr() and then d.copy() works does not raise when the interactive shell calls __repr__ but is still broken when doing anything else including calling that method yourself.

>>> d.refs()
[<datatest.Element object at 0x7f3d2d8fd1b0>]
>>> d.copy()
[<datatest.Element object at 0x7f3d2d8fd1b0>]

1:

#include "pybind11/pybind11.h"
#include "pybind11/stl.h"
namespace py = pybind11;

struct Data;
Data* get_data();

struct Element {
    int i;

private:
    Element(int i_) : i(i_) {}

    Element(const Element&) = delete;
    Element(Element&&) = delete;

    friend Data* get_data();
};

struct Data {
    std::vector<Element*> elements;

    std::vector<Element*>& refs() { return elements; }
    std::vector<Element*>* ptr() { return &elements; }

    std::vector<Element*> copy() { return elements; }
    std::tuple<Element*> tuple() { return std::make_tuple(elements.at(0)); }

};
Data* get_data() {
    auto *data = new Data(); data->elements.push_back(new Element{115}); return data;
}

PYBIND11_PLUGIN(datatest) {
    py::module module("datatest");

    py::class_<Element>(module, "Element")
            .def("i", [](Element& e) { return e.i; });

    py::class_<Data>(module, "Data")
            .def("refs", &Data::refs, py::return_value_policy::reference_internal)
            .def("ptr", &Data::ptr, py::return_value_policy::reference_internal)

// RuntimeError: return_value_policy = move, but the object is neither movable nor copyable!
            .def("copy", &Data::copy, py::return_value_policy::reference_internal)
            .def("tuple", &Data::tuple, py::return_value_policy::reference_internal)
            .def("vec", [](Data &d) -> std::vector<Element*> {
                                     std::vector<Element*> ret{ d.elements.at(0) };
                                     return ret; },
                 py::return_value_policy::reference_internal)

            .def("leak", [](Data &d) -> std::vector<Element*>& {
                                   auto *ret = new std::vector<Element*>{ d.elements.at(0) };
                                   return *ret; },
                 py::return_value_policy::reference_internal)

            .def("leak2", [](Data &d) -> std::vector<Element*>* {
                                   auto *ret = new std::vector<Element*>{ d.elements.at(0) };
                                   return ret; },
                 py::return_value_policy::reference_internal)

    ; // end class

    module.def("get_data", &get_data, py::return_value_policy::take_ownership);
    return module.ptr();
}

2:

$ valgrind python3 -c 'from datatest import *; d = get_data(); d.ptr()[0].i();' 
     definitely lost: 0 bytes in 0 blocks
$valgrind python3 -c 'from datatest import *; d = get_data(); d.leak()[0].i();'
    definitely lost: 24 bytes in 1 blocks

Using Python 3.5.3 and g++ 6.3.

jagerman commented 7 years ago

It's not surprising that leak() and `leak2() leak: you're telling pybind11 that you are managing the reference, but you aren't.

vec and copy look the same to me (more on copy below).

ptr and refs are essentially the same: pybind11 isn't managing the reference itself, but it also is using that same policy for the elements of the vector.

The issue with copy (and tuple) is trickier: you're returning an rvalue; pybind sees this and forces return_value_policy::move on the vector, which is fine (you returned an rvalue to pybind11; it has to either move or copy it to store it, because all it has is a temporary).

The problem comes about because container casters (such as stl.h's list caster used for vector) converts its arguments with the return_value_policy used on the vector itself—but we just forced that to move, so it tries moving the pointed-at values inside the vector into new instances which pybind11 will wrap and handle. This, of course, is where we run into trouble: move was valid on the vector itself, but isn't valid on the vector's elements. (For tuple, unordered_map, etc. the story will be the same).

Cc @dean0x7d on this; I'm fairly sure that #510 caused this (but, of course, it also fixed another problem). We're basically (now) forcing a move/copy policy on containers which carries through to the container's elements, which is the problem here.

As a workaround in this case, the following should work:

using ListCasterBase = pybind11::detail::list_caster<std::vector<Element *>, Element *>;
namespace pybind11 { namespace detail {
template<> struct type_caster<std::vector<Element *>> : ListCasterBase {
    static handle cast(const std::vector<Element *> &src, return_value_policy, handle parent) {
        return ListCasterBase::cast(src, return_value_policy::reference, parent);
    }
    static handle cast(const std::vector<Element *> *src, return_value_policy pol, handle parent) {
        return cast(*src, pol, parent);
    }
};
}}

which is basically just wrapping stl.h's list caster with an override to force the reference policy. (You'll need to be careful with this though: there's no dependency established between the pointers and the original object, so be sure to keep it around).

tzh1043 commented 7 years ago

Your workaround works, thank you!

It's not surprising that leak() and leak2() leak: you're telling pybind11 that you are managing the reference, but you aren't.

Oh, absolutely, I just wanted to demonstrate that the best thing I could come up with (and which does not crash via a double-free) would leak.

There's no dependency established between the pointers and the original object, so be sure to keep it around.

Reading about the weak_ptr-using keep_alive feature, it seems the raw pointers of Element can not be tied to anything. Is the vector completely replaced by the python list object or could it somehow be tied to the lifetime of Data? This would not make much sense for a list (what after list.clear()) but for an immutable tuple it would at least be something.

dean0x7d commented 7 years ago

Cc @dean0x7d on this; I'm fairly sure that #510 caused this (but, of course, it also fixed another problem). We're basically (now) forcing a move/copy policy on containers which carries through to the container's elements, which is the problem here.

Carrying over the policy to the container's elements is usually sound. If std::vector<Element> is the return value, reference_internal is equally bad for both the container and the elements, which are all rvalues: the references would point to temp objects resulting in undefined behavior. So moving (copying) out all the elements is the only thing that can reasonably be done with them.

Unfortunately, this breaks down with std::vector<Element*> because the policies have a slightly different meaning for raw pointers than for values ("move your pointee" instead of "move yourself"). And this really is special behavior present only for raw pointers. Note that placing smart pointers in the vector would restore expected behavior (smart pointers know what "move" means).

A solution would be to relocate the policy selection logic. Right now, the move policy override happens when a function is wrapped (and propagates from the container to its elements). This can also be moved so that each individual type_caster makes the selection for itself. I think this is the most appropriate solution since it would both avoid undefined behavior for rvalues, and ownership issues for raw pointers. However, that does require that each type_caster has both a cast(const T &) and cast(T &&) overloads where each does the right thing. This could result in some duplication and boilerplate between casters, which is why I wanted to avoid it in the first place. But perhaps it wouldn't be too bad. I can code it up, make a proposal and we can take it from there. Thoughts?

dean0x7d commented 7 years ago

The type_caster suggested by @jagerman looks like the right answer here. Trying to pursue an automatic library solution looks like it would be very messy and only benefit lists of owned raw pointers (which I really hope is just a corner case in a C++11/14/17 world). So I'll close this and refer to https://github.com/pybind/pybind11/issues/637#issuecomment-277422232 for the type_caster solution for raw-pointer-inside-container ownership.

ezyang commented 6 years ago

I ran into this problem recently. It would be nice if the error message that occurs when this occurs gave better guidance about how to fix it and/or adding extra type checks in the template definition for def to detect common cases when this situation occurs statically.

ezyang commented 6 years ago

Also, another goofy workaround is to return std::vector<std::unique_ptr<Element*, py::nodelete>> instead of std::vector<Element*> in the function you're wrapping.

TomasPuverle commented 3 years ago

Sorry to resurrect - is this still the recommended approach for returning std:: containers of raw pointers that have internal_reference semantics?

YannickJadoul commented 3 years ago

@TomasPuverle, as far as I know, there's not been a big breakthrough on these complex issues, no.

TomasPuverle commented 3 years ago

Thank you for the follow-up.

beantowel commented 5 months ago

Also, another goofy workaround is to return std::vector<std::unique_ptr<Element*, py::nodelete>> instead of std::vector<Element*> in the function you're wrapping.

This is not working, python is still calling destructor of the Elements instead of nodelete::operator() ...