Open nhusung opened 3 weeks ago
It can be tricky to manage object lifetimes when one mixes C and Python like CFFI allows you to do.
The C function handle_pair_t children(handle_t) should just return the handles to the two “child nodes” as a pair
You must explicitly keep the return value alive, otherwise the GC will destroy it. Making it explicit may be clearer, and suggests a path to solving the problem. I think this would solve the problem, does it make sense?
def children(handle):
    ret = ffi.new("handle_t[1]")
    pair = lib.children(handle)
    # What keeps `pair` alive?
    # Copy the data, so when pair is collected the data will still be valid
    ret[0] = pair.first
    return ret
Thank you for your quick answer! I was assuming that pair.first would create a copy of the handle_t, but it is just an unowned reference to the handle_t, as you pointed out. Unfortunately, I cannot simply change the return type to handle_t[1] or handle_t *, because lib.children() expects a handle_t and (unless I missed something) I cannot pass a handle_t[1] or a handle_t * in place of a handle_t. Is there a simple way to copy the struct behind a handle_t & into a new handle_t? ffi.new() appears to support pointer and array types only. In principle, I could write a C function handle_t identity(handle_t h) { return h; }, but doing so for every type seems like a lot of boilerplate (and also does not sound terribly efficient). Alternatively, I could always copy the result of a function returning a struct into a new allocation (e.g., handle = ffi.new("struct handle_t *", lib.new_handle(1))), and then write lib.children(handle[0]). Allowing the handle parameter of the Python children function to be either handle_t * or handle_t (with a runtime check in the Python function) does not really appear to be an option. Which solution is “intended” by CFFI here?
Then, I have another question regarding the fix you proposed, just to be sure: Until when is pair guaranteed to live? Is it until the function returns (akin to C++ and Rust) or only until the last reference of pair? I.e., is there a possibility that pair is GC’d right after pair.first is evaluated but before the handle_t is copied into ret[0]?
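On the question of copying the struct behind a handle_t & into a new handle_t: the CFFI documentation describes one exception to its "no keepalive" default, namely that indexing the pointer returned by ffi.new("struct-or-union *") with [0] yields a struct object that co-owns the memory. Below is a minimal sketch of using that to get an owned copy. It assumes handle_t is a typedef with the hypothetical field names ptr and index, and the helper name owned_copy is made up for illustration. It runs in ABI mode, so no C compiler or real library is involved.

```python
import gc

from cffi import FFI

ffi = FFI()
# Assumed layout; the real field names in the bindings may differ.
ffi.cdef("""
    typedef struct { void *ptr; size_t index; } handle_t;
    typedef struct { handle_t first; handle_t second; } handle_pair_t;
""")


def owned_copy(handle):
    """Copy `handle` into fresh memory and return a handle_t that owns it."""
    # ffi.new() copies the struct; [0] returns a struct co-owning that memory.
    return ffi.new("handle_t *", handle)[0]


pair = ffi.new("handle_pair_t *")
pair.first.index = 21

borrowed = pair.first        # a 'handle_t &' reference into pair's memory
copy = owned_copy(borrowed)  # an independent, owning handle_t

# Dropping the parent no longer matters for the copy.
del borrowed, pair
gc.collect()
print(copy, copy.index)
```

This avoids writing an identity function in C for every type, at the cost of one small allocation per copy.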
@mattip I think you're going in the wrong direction. The C code doesn't call malloc() and the Python code doesn't call ffi.new() in this example, so there is no keepalive issue that should be visible to the user.
@mattip Sorry, now I understood why your proposed fix is correct. Another way to view it is by changing the prints:
def children(handle):
    print(f"The parameter I got: {handle}")
    c = lib.children(handle).first
    return c
You can see that the first time it is called, it is with a <cdata 'struct handle_t' owning 16 bytes>. The second time, it is with a <cdata 'struct handle_t &' 0x...>. The problem is, as I now realize you said, that the expression lib.children(handle).first is buggy because it gets a pointer to the first field inside that structure, which is a substructure, but then it frees the parent structure because the Python object returned by lib.children(handle) disappears.
@nhusung Python semantics should guarantee that pair won't be freed until the exit of the function, so the line ret[0] = pair.first should not have a free-before-read issue. At least, this should work in both CPython and PyPy.
But that's arguably an unclean workaround for a bug in cffi. Maybe cffi itself should keep the parent structure alive when we get a substructure. There isn't much point in getting a substructure, which is really a pointer somewhere inside the parent structure, if we don't keep the parent alive.
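The difference between a named local and an unnamed temporary can be observed without CFFI. Here is a small pure-Python sketch, in which Pair is a hypothetical stand-in for the object returned by lib.children() and weakref.finalize records when it is freed. The exact timing relies on CPython's reference counting; on PyPy the temporary would be freed at some later GC run instead.

```python
import weakref

events = []


class Pair:
    """Hypothetical stand-in for the object returned by lib.children()."""

    def __init__(self):
        self.first = "first child"


def make_pair():
    pair = Pair()
    # Record the moment the Pair object is destroyed.
    weakref.finalize(pair, events.append, "pair freed")
    return pair


def with_temporary():
    # The Pair has no name, so CPython frees it as soon as .first is read.
    first = make_pair().first
    events.append("first used (temporary)")
    return first


def with_local():
    # The local name keeps the Pair alive until the function returns.
    pair = make_pair()
    first = pair.first
    events.append("first used (local)")
    return first


with_temporary()
with_local()
print(events)
```

With a real cdata substructure, "first used (temporary)" is the point where freed memory would be read, which is what the address sanitizer complains about.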
Maybe cffi itself should keep the parent structure alive when we get a substructure
There must be prior art for this kind of problem. ctypes or C/C++?
Ctypes uses very extensive keepalive rules that cffi was never meant to copy. C doesn't have any keepalive. C++ doesn't natively have any but some frameworks give you some. Cffi does a few special cases but defaults to "no".
Maybe it's possible to come up with a rule that generalizes the few existing rules. Something like: if you start with an "owning" object and do ANY basic operation that returns a pointer inside the SAME "owned" structure or array, maybe that returned pointer object should always inherit the keepalive.
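For comparison, ctypes (from the standard library) already behaves roughly like this for nested structures: reading a struct-typed field returns an object that shares the parent's memory and holds a reference to it, exposed as _b_base_. A small sketch follows, with Handle and HandlePair as hypothetical mirrors of handle_t and handle_pair_t. It illustrates the ctypes behaviour only and is not a proposal for how cffi would implement the rule.

```python
import ctypes
import gc


class Handle(ctypes.Structure):
    # Assumed layout: a pointer plus an index.
    _fields_ = [("ptr", ctypes.c_void_p), ("index", ctypes.c_size_t)]


class HandlePair(ctypes.Structure):
    _fields_ = [("first", Handle), ("second", Handle)]


def first_child():
    pair = HandlePair(Handle(None, 21), Handle(None, 42))
    # `pair` goes out of scope on return, but the returned view refers to it.
    return pair.first


first = first_child()
gc.collect()

print(first.index)                             # the field is still readable
print(isinstance(first._b_base_, HandlePair))  # the view references its parent
print(bool(first._b_needsfree_))               # the view owns no memory itself
```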
I’m not sure whether this is a CFFI or a CPython bug, but recently this issue was reported: https://github.com/OxiDD/oxidd/issues/23. Today, I created a stripped-down version of the CFFI-based bindings at https://github.com/nhusung/cffi-cpython-bug. On the C side, we have some handle type handle_t, which is a pair of a pointer and an index. Conceptually, these handles refer to binary nodes, but in the example I just use arbitrary values. The C function handle_pair_t children(handle_t) should just return the handles to the two “child nodes” as a pair, but the bindings appear to modify the Python object corresponding to the handle passed as argument to children(). (Note that the C function takes the handle by value, so it cannot be the culprit.) Here is the Python test code:
Output (the line between “before” and “after” is from the C code):
I could reproduce this with CFFI 1.17.1 on both CPython 3.10.12 (Ubuntu 22.04) and CPython 3.12.7 (Fedora 40). PyPy does not appear to be affected (tested with PyPy 7.3.15 / Python 3.10.13).
With GCC’s address sanitizer, I get the following error message directly after before: (<cdata 'void *' 0x7efec7aa0760>, 21):