python-cffi / cffi

A Foreign Function Interface package for calling C libraries from Python.
https://cffi.readthedocs.io/en/latest/
Other
131 stars 42 forks source link

Keeping Parent Structures Alive While Substructures Are #141

Open nhusung opened 3 weeks ago

nhusung commented 3 weeks ago

I’m not sure whether this is a CFFI or a CPython bug, but recently this issue was reported: https://github.com/OxiDD/oxidd/issues/23. Today, I created a stripped-down version of the CFFI-based bindings at https://github.com/nhusung/cffi-cpython-bug. On the C side, we have some handle type handle_t, which is a pair of a pointer and an index. Conceptually, these handles refer to binary nodes, but in the example I just use arbitrary values. The C function handle_pair_t children(handle_t) should just return the handles to the two “child nodes” as a pair, but the bindings appear to modify the Python object corresponding to the handle passed as argument to children(). (Note that the C function takes the handle by value, so it cannot be the culprit.) Here is the Python test code:

from _foo import lib

def children(handle):
    print(f"before: ({handle._p}, {handle._i})")
    c = lib.children(handle).first
    print(f"after: ({handle._p}, {handle._i})")
    return c

handle = lib.new_handle(1)
c = children(handle)
children(c)  # modifies c although it shouldn't

Output (the line between “before” and “after” is from the C code):

before: (<cdata 'void *' NULL>, 1)
children({._p = (nil), ._i = 1}) = {.first = {._p = 0x7efec7aa0760, ._i = 21}, .second = {._p = 0x7efec7aa0760, ._i = 101}}
after: (<cdata 'void *' NULL>, 1)
before: (<cdata 'void *' 0x7efec7aa0760>, 21)
children({._p = 0x7efec7aa0760, ._i = 21}) = {.first = {._p = (nil), ._i = 41}, .second = {._p = (nil), ._i = 121}}
after: (<cdata 'void *' NULL>, 41)

I could reproduce this with CFFI 1.17.1 on both CPython 3.10.12 (Ubuntu 22.04) and CPython 3.12.7 (Fedora 40). PyPy does not appear to be affected (tested with PyPy 7.3.15 / Python 3.10.13).

With GCC’s address sanitizer, I get the following error message directly after before: (<cdata 'void *' 0x7efec7aa0760>, 21):

=================================================================
==50160==ERROR: AddressSanitizer: heap-use-after-free on address 0x5070000011d0 at pc 0x7f0761ef5676 bp 0x7ffc454112c0 sp 0x7ffc45410a80
READ of size 16 at 0x5070000011d0 thread T0
    #0 0x7f0761ef5675 in memcpy (/usr/lib64/libasan.so.8+0xf5675) (BuildId: a4ad7eb954b390cf00f07fa10952988a41d9fc7a)
    #1 0x7f0761502bdc in convert_from_object src/c/_cffi_backend.c:1792
    #2 0x7f0761deba73 in _cffi_f_children /tmp/tmpmx2jcyr6.build-temp/_foo.c:602
    #3 0x7f076197385d in _PyEval_EvalFrameDefault (/lib64/libpython3.12.so.1.0+0x17385d) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #4 0x7f07619fe3b3 in PyEval_EvalCode (/lib64/libpython3.12.so.1.0+0x1fe3b3) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #5 0x7f0761a1c9b6 in builtin_exec (/lib64/libpython3.12.so.1.0+0x21c9b6) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #6 0x7f076198a2bb in cfunction_vectorcall_FASTCALL_KEYWORDS (/lib64/libpython3.12.so.1.0+0x18a2bb) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #7 0x7f0761978c2e in _PyEval_EvalFrameDefault (/lib64/libpython3.12.so.1.0+0x178c2e) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #8 0x7f07619920ba in object_vacall (/lib64/libpython3.12.so.1.0+0x1920ba) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #9 0x7f07619b1919 in PyObject_CallMethodObjArgs (/lib64/libpython3.12.so.1.0+0x1b1919) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #10 0x7f07619b11e7 in PyImport_ImportModuleLevelObject (/lib64/libpython3.12.so.1.0+0x1b11e7) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #11 0x7f0761978dc5 in _PyEval_EvalFrameDefault (/lib64/libpython3.12.so.1.0+0x178dc5) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #12 0x7f07619fe3b3 in PyEval_EvalCode (/lib64/libpython3.12.so.1.0+0x1fe3b3) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #13 0x7f0761a23ad9 in run_eval_code_obj (/lib64/libpython3.12.so.1.0+0x223ad9) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #14 0x7f0761a1e10d in run_mod (/lib64/libpython3.12.so.1.0+0x21e10d) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #15 0x7f0761a10bf5 in PyRun_StringFlags (/lib64/libpython3.12.so.1.0+0x210bf5) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #16 0x7f0761a108e3 in PyRun_SimpleStringFlags (/lib64/libpython3.12.so.1.0+0x2108e3) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #17 0x7f0761a300c2 in Py_RunMain (/lib64/libpython3.12.so.1.0+0x2300c2) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #18 0x7f07619e7e6b in Py_BytesMain (/lib64/libpython3.12.so.1.0+0x1e7e6b) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #19 0x7f0761639087 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #20 0x7f076163914a in __libc_start_main_impl ../csu/libc-start.c:360
    #21 0x56292c0dd094 in _start (/usr/bin/python3.12+0x1094) (BuildId: d1c1e9a6a04c8224290e342a507a918dcff95537)

0x5070000011d0 is located 48 bytes inside of 80-byte region [0x5070000011a0,0x5070000011f0)
freed by thread T0 here:
    #0 0x7f0761ef6638 in free.part.0 (/usr/lib64/libasan.so.8+0xf6638) (BuildId: a4ad7eb954b390cf00f07fa10952988a41d9fc7a)
    #1 0x7f076197e59f in _PyEval_EvalFrameDefault (/lib64/libpython3.12.so.1.0+0x17e59f) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #2 0x7f07619fe3b3 in PyEval_EvalCode (/lib64/libpython3.12.so.1.0+0x1fe3b3) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #3 0x7f0761a1c9b6 in builtin_exec (/lib64/libpython3.12.so.1.0+0x21c9b6) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #4 0x7f076198a2bb in cfunction_vectorcall_FASTCALL_KEYWORDS (/lib64/libpython3.12.so.1.0+0x18a2bb) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #5 0x7f0761978c2e in _PyEval_EvalFrameDefault (/lib64/libpython3.12.so.1.0+0x178c2e) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #6 0x7f07619920ba in object_vacall (/lib64/libpython3.12.so.1.0+0x1920ba) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #7 0x7f07619b1919 in PyObject_CallMethodObjArgs (/lib64/libpython3.12.so.1.0+0x1b1919) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #8 0x7f07619b11e7 in PyImport_ImportModuleLevelObject (/lib64/libpython3.12.so.1.0+0x1b11e7) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #9 0x7f0761978dc5 in _PyEval_EvalFrameDefault (/lib64/libpython3.12.so.1.0+0x178dc5) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #10 0x7f07619fe3b3 in PyEval_EvalCode (/lib64/libpython3.12.so.1.0+0x1fe3b3) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #11 0x7f0761a23ad9 in run_eval_code_obj (/lib64/libpython3.12.so.1.0+0x223ad9) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #12 0x7f0761a1e10d in run_mod (/lib64/libpython3.12.so.1.0+0x21e10d) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #13 0x7f0761a10bf5 in PyRun_StringFlags (/lib64/libpython3.12.so.1.0+0x210bf5) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #14 0x7f0761a108e3 in PyRun_SimpleStringFlags (/lib64/libpython3.12.so.1.0+0x2108e3) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #15 0x7f0761a300c2 in Py_RunMain (/lib64/libpython3.12.so.1.0+0x2300c2) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #16 0x7f07619e7e6b in Py_BytesMain (/lib64/libpython3.12.so.1.0+0x1e7e6b) (BuildId: 180b9c2b9ddff4d7006aec0fb9c92af57c2025f4)
    #17 0x7f0761639087 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #18 0x7f076163914a in __libc_start_main_impl ../csu/libc-start.c:360
    #19 0x56292c0dd094 in _start (/usr/bin/python3.12+0x1094) (BuildId: d1c1e9a6a04c8224290e342a507a918dcff95537)

previously allocated by thread T0 here:
    #0 0x7f0761ef7997 in malloc (/usr/lib64/libasan.so.8+0xf7997) (BuildId: a4ad7eb954b390cf00f07fa10952988a41d9fc7a)
    #1 0x7f07614f8d4b in allocate_owning_object src/c/_cffi_backend.c:3760
    #2 0x7f07614f8d4b in convert_struct_to_owning_object src/c/_cffi_backend.c:3790

SUMMARY: AddressSanitizer: heap-use-after-free (/usr/lib64/libasan.so.8+0xf5675) (BuildId: a4ad7eb954b390cf00f07fa10952988a41d9fc7a) in memcpy
Shadow bytes around the buggy address:
  0x507000000f00: fd fd fd fd fd fd fd fd fd fd fa fa fa fa 00 00
  0x507000000f80: 00 00 00 00 00 00 06 fa fa fa fa fa 00 00 00 00
  0x507000001000: 00 00 00 00 06 fa fa fa fa fa 00 00 00 00 00 00
  0x507000001080: 00 00 00 fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x507000001100: 00 fa fa fa fa fa 00 00 00 00 00 00 00 00 00 03
=>0x507000001180: fa fa fa fa fd fd fd fd fd fd[fd]fd fd fd fa fa
  0x507000001200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x507000001280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x507000001300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x507000001380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x507000001400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==50160==ABORTING
mattip commented 3 weeks ago

It can be tricky to manage object lifetimes when one mixes C and Python like CFFI allows you to do.

The C function handle_pair_t children(handle_t) should just return the handles to the two “child nodes” as a pair

You must explicitly keep the return value alive, otherwise the GC will destroy it. Making it explicit may be clearer, and suggests a path to solving the problem. I think this would solve the problem, does it make sense?

def children(handle):
    ret = ffi.new("handle_t[1]")
    pair = children(handle)
    # What keeps `pair` alive?
    # Copy the data, so when pair is collected the data will still be valid
    ret[0] = pair.first
    return ret
nhusung commented 3 weeks ago

Thank you for your quick answer! I was assuming that pair.first would create a copy of the handle_t, but it is just an unowned reference to the handle_t, as you pointed out. Unfortunately, I cannot simply change the return type to handle_t[1] or handle_t *, because lib.children() expects a handle_t and (unless I missed something) I cannot pass a handle_t[1] or a handle_t * in place of a handle_t. Is there a simple way to copy the struct behind a handle_t & into a new handle_t? ffi.new() appears to support pointer and array types only. In principle, I could write a C function handle_t identity(handle_t h) { return h; }, but doing so for every type seems like a lot of boilerplate and (also does not sound terribly efficient). Alternatively, I could always copy the result of a function returning a struct into a new allocation (e.g., handle = ffi.new("struct handle_t *", lib.new_handle(1))), and then write lib.children(handle[0]). Allowing the handle parameter of the Python children function to be either handle_t * or handle_t (with a runtime check in the Python function) does not really appear to be an option. Which solution is “intended” by CFFI here?

Then, I have another question regarding the fix you proposed, just to be sure: Until when is pair guaranteed to live? Is it until the function returns (akin to C++ and Rust) or only until the last reference of pair? I.e., is there a possibility that pair is GC’d right after pair.first is evaluated but before the handle_t is copied into ret[0]?

arigo commented 3 weeks ago

@mattip I think you're going in the wrong direction. The C code doesn't call malloc() and the Python code doesn't call ffi.new() in this example, so there is no keepalive issue that should be visible to the user.

arigo commented 3 weeks ago

@mattip Sorry, now I understood why your proposed fix is correct. Another way to view it is by changing the prints:

def children(handle):
    print(f"The parameter I got: {handle}")
    c = lib.children(handle).first
    return c

You can see that the first time it is called, it is with a <cdata 'struct handle_t' owning 16 bytes>. The second time, it is with a <cdata 'struct handle_t &' 0x...>. The problem is, like I now realize you said, that the expression lib.children(handle).first is buggy because it gets a pointer to the first field inside that structure, which is a substructure, but then it frees the parent structure because the Python object returned by lib.children(handle) disappears.

arigo commented 3 weeks ago

@nhusung Python semantics should guarantee that pair won't be freed until the exit of the function, so the line ret[0] = pair.first should not have a free-before-read issue. At least, this should work in both CPython and PyPy.

But that's arguably an unclean workaround for a bug of cffi. Maybe cffi itself should keep the parent structure alive when we get a substructure. There isn't much of a point about getting a substructure, which is really a pointer somewhere inside the parent structure, if we don't keep the parent alive.

mattip commented 3 weeks ago

Maybe cffi itself should keep the parent structure alive when we get a substructure

There must be previous art for this kind of problem. CTypes or C/C++?

arigo commented 3 weeks ago

Ctypes uses very extensive keepalive rules that cffi was never meant to copy. C doesn't have any keepalive. C++ doesn't natively have any but some frameworks give you some. Cffi does a few special cases but defaults to "no".

Maybe it's possible to come up with a rule that generalize the few existing rules. Something like: if you start with an "owning" object and do ANY basic operation that returns a pointer inside the SAME "owned" structure or array, maybe that returned pointer object should always inherit the keepalive.