wlav / CPyCppyy

Other
23 stars 20 forks source link

Add __dict__ to LowLevelView #7

Open tomas16 opened 1 year ago

tomas16 commented 1 year ago

EDIT: originally this was a feature request to add weak reference support to the LowLevelView type. After discussion it seems like adding support for a __dict__ is a better solution.

I have a specific use case for this I will explain below. However I'm open to suggestions for alternative solutions. I looked into using special variables and couldn't come up with a solution, though I must admit I'm not very confident about my understanding of what these variables do.

Suppose I have C++ functions that return std::vector. The goal is to turn these into numpy arrays without copying any data. The challenge is I want to be able to keep the underlying C++ objects alive until no python object is referencing the LowLevelView anymore.

To illustrate the issue, consider this code:

import cppyy
import numpy as np

cppyy.cppdef("""
#include <numeric>

static std::vector<int> VectorFunc(size_t len)
{
    std::vector<int> ret(len);
    std::iota(ret.begin(), ret.end(), 0);
    return ret;
}
""")
VectorFunc = cppyy.gbl.VectorFunc

def vector_func(size):
    vec = VectorFunc(size)
    arr = np.frombuffer(vec.data(), dtype=np.int32)
    # at this point, arr is completely valid because we still have a reference to vec
    # arr.base is a LowLevelView object.
    print(arr)
    return arr

arr = vector_func(5)
print(arr)

Output:

[0 1 2 3 4]
[-1689064896       41504         499           0           4]

We know we need to keep the C++ vector alive for as long as anything is referencing the LowLevelView object pointing to vec.data(). One way to do this is to wrap the LowLevelView object with a small wrapper class that also stores a reference to vec and make sure arr.base points to an instance of this class. The issue is this wrapper class needs to "forward" the buffer protocol, i.e. it needs to present the buffer protocol from its LowLevelView member to its callers. I don't think this is currently possible. PEP688 may change this starting in 3.12.

Another way to achieve the same behavior would be to keep a reference to vec until the last reference to the LowLevelView goes out of scope. To that end, we could keep a dictionary where the keys are weak references to LowLevelViews and the values are the corresponding std::vectors. Continuing from the previous code:

import weakref

cpp_objects = {}

def vector_func2(size):
    def callback(x):
        del cpp_objects[x]

    vec = VectorFunc(size)
    llv = vec.data()
    wr = weakref.ref(llv, callback)
    cpp_objects[wr] = vec
    return np.frombuffer(llv, dtype=np.int32)

vector_func2(5)

I think this would do the trick, but it requires LowLevelView to support weak referencing.

  File "/Users/tomas/repo/oct100/python/cpycppyy_issue1.py", line 46, in <module>
    vector_func2(5)
  File "/Users/tomas/repo/oct100/python/cpycppyy_issue1.py", line 42, in vector_func2
    wr = weakref.ref(llv, callback)
TypeError: cannot create weak reference to 'cppyy.LowLevelView' object
tomas16 commented 1 year ago

weakref.patch

I tried compiling my own libcppyy.so and using it as a drop-in replacement, but couldn't get that to work. Depending on the commit I used I either got a segfault or a dlopen ImportError (symbol not found in flat namespace '__ZN5Cppyy9ConstructEmPv').

As such, I wasn't able to test this patch. So treat it as a starting point rather than a complete patch.

wlav commented 1 year ago

Cppyy::Construct(unsigned long, void*) is new in cppyy-backend as of 3.0.0 (cppyy-backend-1.14.11). Either update the latter, or branch from tag CPyCppyy-1.12.12 of the CpyCppyy sources.

Note that a big change in 3.0.0 is the move to Clang13 and the use of C++20.

Separately, std::vector<>::data is already pythonized (in order to add the size to the returned LowLevelView), which is a good place to put a lifeline on the vector. However, my preferences in such cases is always to put a hard ref on the owner instance (the vector in this case).

tomas16 commented 1 year ago

Separately, std::vector<>::data is already pythonized (in order to add the size to the returned LowLevelView), which is a good place to put a lifeline on the vector. However, my preferences in such cases is always to put a hard ref on the owner instance (the vector in this case).

So how would I put a lifeline on the vector in practice?

Also, if you have a hard ref from the vector to the LowLevelView object, then what I'm proposing won't work because the LowLevelView object's refcount never reaches 0 and so it'll never trigger the weakref callback, meaning all my vectors would leak because they'd be stored in that dictionary forever.

wlav commented 1 year ago

So how would I put a lifeline on the vector in practice?

currently can't, because the view has no place to set it. It would need a new data member.

if you have a hard ref from the vector to the LowLevelView object

The other way around, in pseudo-code: llv.__lifeline = vec. I.e., the vector stays alive at least as long as the view does (and doesn't need to be stored in cpp_objects).

tomas16 commented 1 year ago

I guess I tried that at some point and explored the weakref approach because I couldn't set any attributes on LowLevelView. But if we're making changes to the C++ implementation of LowLevelView, then I agree with you it's probably best to just make it support a __dict__ so clients can add any userdata they need.

tomas16 commented 1 year ago

dict.patch

If I understand the documentation right, attached patch should be all that's needed to add a __dict__ to LowLevelView.

I tried compiling the various cppyy packages from source in order to test this, but I'm having trouble compiling cling (see below for details).

@wlav would you consider adding a change like this?

Error compiling cling:

  [ 98%] Generating G__CoreLegacy.cxx, ../lib/libCoreLegacy.rootmap
  In file included from input_line_8:58:
  /Users/tomas/repo/cppyy-backend/cling/builddir/include/TClassEdit.h:29:10: fatal error: 'cxxabi.h' file not found
  #include <cxxabi.h>
           ^~~~~~~~~~
  Error: /Users/tomas/repo/cppyy-backend/cling/builddir/core/rootcling_stage1/src/rootcling_stage1: compilation failure (/Users/tomas/repo/cppyy-backend/cling/builddir/lib/libCoreLegacyb706865131_dictUmbrella.h)
  make[2]: *** [core/G__CoreLegacy.cxx] Error 1
  make[1]: *** [core/CMakeFiles/G__CoreLegacy.dir/all] Error 2