wlav / cppyy

Other
413 stars 42 forks source link

🦗Data corruption dependent on whether a cpp object is passed directly into a function or stored in local python variable first #267

Closed lukevalenty closed 1 month ago

lukevalenty commented 1 month ago

I found a strange data corruption bug that seems to depend on whether a newly constructed cpp object is passed directly into a cpp function or if it is stored locally in a python variable first. I have both a passing and failing test case below. I don't think this is specific to std::tuple because it affected our own tuple implementation in stdx.

cppyy.cppdef("""
    namespace test {
        template<typename A, typename B>
        bool equals(A a, B b) {
            static_assert(std::is_same_v<A, B>, "Types must be the same");

            std::cout << "__PRETTY_FUNCTION__: " << __PRETTY_FUNCTION__ << std::endl;
            std::cout << "std::get<0>(a): " << std::get<0>(a) << std::endl;
            std::cout << "std::get<0>(b): " << std::get<0>(b) << std::endl;

            bool elem_equal = (std::get<0>(a) == std::get<0>(b));
            std::cout << "std::get<0>(a) == std::get<0>(b): " << elem_equal << std::endl;

            bool retval = (a == b);
            std::cout << "retval: " << retval << std::endl;

            return retval;
        }
    } // namespace test
""")

std = cppyy.gbl.std
test = cppyy.gbl.test

def test_std_get_tuple_tuple_fail():
    inner_tuple = std.tuple[type(42)](42)

    # NOTE: inlining the outer_tuple variable into the std.get[0](...) causes the test to fail
    get_inner_tuple = std.get[0](std.tuple[type(inner_tuple)](inner_tuple))

    # ERROR: the value inside `get_inner_tuple` is corrupted!
    print(f"python std.get[0](get_inner_tuple): {std.get[0](get_inner_tuple)}")
    print(f"python std.get[0](inner_tuple): {std.get[0](inner_tuple)}")

    assert test.equals(get_inner_tuple, inner_tuple)

def test_std_get_tuple_tuple_pass():
    inner_tuple = std.tuple[type(42)](42)

    # NOTE: creating an outer_tuple variable and passing that into std.get[0](...) causes the test to pass
    outer_tuple = std.tuple[type(inner_tuple)](inner_tuple)
    get_inner_tuple = std.get[0](outer_tuple)

    # both of the results of std.get[0](...) have the correct values
    print(f"python std.get[0](get_inner_tuple): {std.get[0](get_inner_tuple)}")
    print(f"python std.get[0](inner_tuple): {std.get[0](inner_tuple)}")

    assert test.equals(get_inner_tuple, inner_tuple)

I've left in my own debugging print statements that show the values contained within the inner tuples in both python and cpp. As you can see in the test output, one of the values gets corrupted:

========================================================= test session starts ==========================================================
platform linux -- Python 3.10.12, pytest-8.3.3, pluggy-1.5.0 -- /home/lvalenty/cppyy_tvargs_dev/venv_tvargs_dev/bin/python
cachedir: .pytest_cache
hypothesis profile 'fast' -> max_examples=10, database=DirectoryBasedExampleDatabase(PosixPath('/home/lvalenty/cpp-std-extensions/test/pbt/.hypothesis/examples'))
rootdir: /home/lvalenty/cpp-std-extensions/test/pbt
plugins: forked-1.6.0, hypothesis-6.112.2, xdist-3.6.1
8 workers [2 items]     
scheduling tests via LoadScheduling

tuple.py::test_std_get_tuple_tuple_pass 
tuple.py::test_std_get_tuple_tuple_fail 
[gw1] [ 50%] PASSED tuple.py::test_std_get_tuple_tuple_pass 
[gw0] [100%] FAILED tuple.py::test_std_get_tuple_tuple_fail 

=============================================================== FAILURES ===============================================================
____________________________________________________ test_std_get_tuple_tuple_fail _____________________________________________________
[gw0] linux -- Python 3.10.12 /home/lvalenty/cppyy_tvargs_dev/venv_tvargs_dev/bin/python

    def test_std_get_tuple_tuple_fail():
        inner_tuple = std.tuple[type(42)](42)

        # NOTE: inlining the outer_tuple variable into the std.get[0](...) causes the test to fail
        get_inner_tuple = std.get[0](std.tuple[type(inner_tuple)](inner_tuple))

        # ERROR: the value inside `get_inner_tuple` is corrupted!
        print(f"python std.get[0](get_inner_tuple): {std.get[0](get_inner_tuple)}")
        print(f"python std.get[0](inner_tuple): {std.get[0](inner_tuple)}")

>       assert test.equals(get_inner_tuple, inner_tuple)
E       assert False
E        +  where False = <cppyy.TemplateProxy object at 0x7fbebcba5e90>(<cppyy.gbl.std.tuple<int> object at 0x555e671fd9a0>, <cppyy.gbl.std.tuple<int> object at 0x555e66f53d00>)
E        +    where <cppyy.TemplateProxy object at 0x7fbebcba5e90> = test.equals

tuple.py:136: AssertionError
--------------------------------------------------------- Captured stdout call ---------------------------------------------------------
python std.get[0](get_inner_tuple): 1836477548
python std.get[0](inner_tuple): 42
__PRETTY_FUNCTION__: bool test::equals(A, B) [A = std::tuple<int>, B = std::tuple<int>]
std::get<0>(a): 1836477548
std::get<0>(b): 42
std::get<0>(a) == std::get<0>(b): 0
retval: 0
======================================================= short test summary info ========================================================
FAILED tuple.py::test_std_get_tuple_tuple_fail - assert False
===================================================== 1 failed, 1 passed in 3.05s ======================================================
wlav commented 1 month ago

Yes, Python ref-counting rules work differently for deeply nested statements than C++ rules for temporaries. In the case of C++, temporaries are guaranteed to live for the duration of the statement, so you can write things like the above, but in Python this doesn't (always) work: the temporary reaches a ref-count of 0 before all C++ (!) references to its internals are gone.

Note that for methods, cppyy captures the common case where the address of the return value falls within the memory extend of the object it is called on, by putting a hardref from the result back onto the object. Here, however, get is a global function and so that case isn't caught. It would be simple to write a pythonization specifically for get to fix that, though, as a back-reference is always warranted for non-builtin types.

wlav commented 1 month ago

This is more detailed explanation of what's going on with the temporaries: https://pybind11.readthedocs.io/en/stable/advanced/functions.html#return-value-policies

lukevalenty commented 1 month ago

Bummer! What would the pythonization for std::get look like?

wlav commented 1 month ago

The following probably isn't very efficient (as is done in Python, with so many indirections), but then again, get<> isn't all that efficient to begin with. It would probably warrant its own specialized class at some point, with reusable calls based on memory offsets instead of wrapper functions (that would also solve the problems I have with it on Windows). In any case, this would be one way to handle the life times:

import cppyy
import cppyy.types

class SafeGet:
    def __init__(self, get):
        self._get = get

    def __call__(self, tup):
        res = self._get.__call__(tup)
        if isinstance(res, cppyy.types.Instance):
            res.__get_keepalive = tup
        return res

    def __getitem__(self, *args):
        res = self._get.__getitem__(*args)
        return SafeGet(res)

cppyy.gbl.std.get = SafeGet(cppyy.gbl.std.get)
lukevalenty commented 1 month ago

Thanks!