pypy / pypy

PyPy is a very fast and compliant implementation of the Python language.
https://pypy.org
Other
1.11k stars 59 forks source link

cpyext: reference cycles are not handled (tp_traverse, tp_clear are never called) #3848

Open gitlab-importer opened 2 years ago

gitlab-importer commented 2 years ago

In Heptapod by @wjakob on Nov 10, 2022, 19:34

Dear PyPy team (& @mattip),

I'm chasing down a few remaining nanobind test suite issues. Here is reproducer 1/3:

This test sets up a wrapper type that contains a nested PyObject * pointer. When the pointer refers to the parent object, it creates a circular reference that will cause the object to leak using only normal reference counting. To handle cycles, CPython requires that developers implement the slots tp_traverse (to identify the cycle) and tp_clear (to break it).

Python test: https://github.com/wjakob/pypy_issues/blob/master/issue_5.py#L18

C code: https://github.com/wjakob/pypy_issues/blob/master/pypy_issues.c#L10

In PyPy, the associated functions are never called, and the associated objects leak.

My reproducer is a contrived example, but such cycles happen all the time and can be created quite easily. Consider for example Python bindings for a GUI library: a button widget (implemented in C) might have a callback method that can be set from Python. The problem is that the assigned python function object is actually a closure that captures any referenced variables. If the button callback closes the current window (e.g. window.close()) it creates a cycle because the closure depends on the window object, and window will in turn reference the button contained in the window. The tp_traverse and tp_clear slots make it possible to handle such inter-language cycles properly.

Thanks, Wenzel

gitlab-importer commented 2 years ago

In Heptapod by @mattip on Nov 15, 2022, 12:09

This is a big task. We need to implement a cpyext-aware garbage collector that can notice isolated cycles of cpyext objects. There was work done on the cpyext-gc-cycle branch. I merged py3.8 into it, and will try to move forward with this over the next couple of weeks, but I do not think we can hold up the release for it.

gitlab-importer commented 2 years ago

In Heptapod by @cfbolz on Nov 15, 2022, 13:57

so to summarize a bit:

gitlab-importer commented 2 years ago

In Heptapod by @wjakob on Nov 15, 2022, 14:33

My reproducer demonstrates the problem using a cycle of only cpyext objects, but it was not my intention to only report that specific case. Inter-language cycles are common, for example where a function closure captures an object defined in an extension library (The cycle is cpyext instance -> function object -> cpyext instance). Another common case (though that could perhaps be handled specially) is where an object in an extension library defines tp_dictoffset which leads to circular references involving dictionaries and instances of the class. Is there any technical reason why the issue could be handled in HPy but not in cpyext?

gitlab-importer commented 2 years ago

In Heptapod by @arigo on Nov 15, 2022, 15:57

Is there any technical reason why the issue could be handled in HPy but not in cpyext?

Yes. HPy is designed for this kind of cross-implementation support, whereas CPython's API is not at all.

gitlab-importer commented 2 years ago

In Heptapod by @cfbolz on Nov 15, 2022, 21:05

I think it's basically a pretty difficult problem, and the APIs need to be in a certain shape to support it (and the CPython C-API isn't). As a data point, this is pretty similar to the problems that browsers have. the DOM is in C++ and typically uses refcounting, the Javascript heaps use precise GC. only recently did Chrome get reasonable cross-component GC: https://v8.dev/blog/high-performance-cpp-gc and that is certainly making use of the fact that Google has complete control over the APIs.

gitlab-importer commented 2 years ago

In Heptapod by @wjakob on Nov 15, 2022, 21:11

What I don't understand is the difference between the HPy and cpyext APIs. To me, it seems like HPy simply wraps the tp_traverse and tp_clear functionality--in fact, it has little alternative because extensions must at the end of the day work on CPython that uses this interface. The example code in the HPy documentation looks near-identical: https://docs.hpyproject.org/en/latest/porting-example/index.html?highlight=traverse#step-02-transition-some-methods-to-hpy

Could that same mechanism not be also used for cpyext? What makes them so different?