rapidsai / build-planning

Tracking for RAPIDS-wide build tasks
https://github.com/rapidsai

Build Python packages using the limited API #42

Open vyasr opened 5 months ago

vyasr commented 5 months ago

Python has a limited API that is guaranteed to be stable across minor releases. Any code using the Python C API that limits itself to using code in the limited API is guaranteed to also compile on future minor versions of Python within the same major family. More importantly, all symbols in the current (and some historical) version of the limited API are part of Python's stable ABI, which also does not change between Python minor versions and allows extensions compiled against one Python version to continue working on future versions of Python.
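The practical payoff shows up in wheel filenames, which encode a Python tag, an ABI tag, and a platform tag: a version-specific wheel carries an ABI tag like `cp311`, while a stable-ABI wheel carries `abi3` and installs on that CPython version and everything newer. A minimal sketch of reading those tags (the filenames below are made-up examples, not actual RAPIDS artifacts):

```python
def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    # Layout is name-version[-build]-python-abi-platform,
    # so the tags are always the last three fields.
    return tuple(parts[-3:])

# A per-interpreter wheel: only installable on CPython 3.11.
print(wheel_tags("pkg-1.0-cp311-cp311-manylinux_2_28_x86_64.whl"))
# A stable-ABI wheel: installable on CPython 3.11 and every later 3.x.
print(wheel_tags("pkg-1.0-cp311-abi3-manylinux_2_28_x86_64.whl"))
```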

Currently RAPIDS builds a single wheel per Python version. If we instead compiled against the Python stable ABI, we could build a single wheel that works for all Python versions that we support. There would be a number of benefits here:

Here are the tasks (some ours, some external) that need to be accomplished to make this possible:

At this stage, it is not yet clear whether the tradeoffs required will be worthwhile, or at what point the ecosystem's support for the limited API will be reliable enough for us to use in production. However, it shouldn't be too much work to get us to the point of at least being able to experiment with limited API builds, so we can start answering questions around performance and complexity fairly soon. I expect that we can pretty easily remove explicit reliance on any APIs that are not part of the stable ABI, at which point this really becomes a question of the level of support our binding tools provide and if/when we're comfortable with those.

jakirkham commented 5 months ago

It is worth noting that the Python buffer protocol C API only landed in the limited API in Python 3.11 (additional ref). So I think that is a minimum for us
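For anyone less familiar with why the buffer protocol matters here: it is the mechanism behind zero-copy views like `memoryview`, and the C-level entry points (`PyObject_GetBuffer` and friends) are what only became part of the limited API in 3.11. A small Python-level illustration of the same protocol:

```python
import array

# array.array exposes its storage through the buffer protocol,
# which is what lets memoryview wrap it without copying.
buf = array.array("q", [1, 2, 3, 4])
view = memoryview(buf)

# The view shares memory with the array: a write through the view
# is visible through the array itself.
view[0] = 42
print(buf[0])  # 42
```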

I also find this listing of functions in the Limited and Stable API quite helpful

vyasr commented 5 months ago

Yes. I have been able to build most of RAPIDS using Cython's limited API support (along with some additional changes I have locally) on Python 3.11. Python 3.11 is definitely a must. But as I said above in the "intermediate vs long-term" bullet, we could still benefit before dropping Python<3.11 support by building one wheel for each older Python version and then building an abi3 wheel to be used for Python 3.11+.
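The intermediate strategy above can be made concrete with a tiny sketch. This is purely illustrative (the helper name and tags are hypothetical, not an actual packaging API): per-interpreter wheels below 3.11, one abi3 wheel for everything at or above it.

```python
def wheel_for(py_version):
    """Illustrative only: which wheel flavor the intermediate strategy
    would serve for a given (major, minor) interpreter version."""
    major, minor = py_version
    if (major, minor) >= (3, 11):
        return "cp311-abi3"  # one stable-ABI wheel covers 3.11+
    # Older interpreters each get their own version-specific wheel.
    return f"cp{major}{minor}-cp{major}{minor}"

print(wheel_for((3, 9)))   # cp39-cp39
print(wheel_for((3, 12)))  # cp311-abi3
```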

vyasr commented 5 months ago

I've made PRs to rmm, raft, and cuml that address the issues in those repos. I've also taken steps to remove ucxx's usage of the numpy C API (#41), which in turn removes one of its primary incompatibilities. The last major issue in RAPIDS code that I see is the usage of the array module in the Array class that is vendored by both kvikio and ucxx (and ucx-py). If that can be removed, then I think we'll be in good shape on the RAPIDS end, and we'll just be waiting on support for this feature in Cython itself. @jakirkham expressed interest in helping out with that in the process of making that Array class more broadly usable.

da-woods commented 5 months ago

A small warning here:

There are definitely places where Cython is substituting private C API for private Python API, so future compatibility definitely isn't guaranteed (it'll just be a runtime failure rather than a compile-time failure). We'll see how that evolves - I hope to be able to make some of these warnings rather than failures (since it's largely just non-essential introspection support).

We're also having to build a few more runtime version-checks into our code. Which is obviously a little risky because although you're compiling the same thing, you're taking different paths on different Python versions.

So the upshot is that your testing matrix probably doesn't reduce to a single version. (From Cython's point of view the testing matrix probably expands, because we really should be testing combinations like Py_LIMITED_API=0x03090000 with Python 3.12 and that gets big quite quickly so I don't know how we're going to do that)
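For readers unfamiliar with the `Py_LIMITED_API=0x03090000` notation in the testing-matrix example above: the value uses CPython's standard `hexversion` layout, one byte per version component, so `0x03090000` pins the ABI floor to 3.9.0. A quick sketch of the encoding (the helper name is mine, not a CPython API):

```python
def limited_api_value(major, minor, micro=0):
    """Encode a version floor the way Py_LIMITED_API expects:
    one byte each for major, minor, micro (release-level byte left zero)."""
    return (major << 24) | (minor << 16) | (micro << 8)

print(hex(limited_api_value(3, 9)))   # 0x3090000
print(hex(limited_api_value(3, 11)))  # 0x30b0000
```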

vyasr commented 4 months ago

Thanks for chiming in here @da-woods! I appreciate your comments. I agree that there is more complexity around testing here than simply a set-and-forget single version. At present, RAPIDS typically supports 2 or 3 Python versions at a time. We tend to lag a bit behind NEP 29/SPEC 0 timelines, so we support older versions a bit longer at the expense of not supporting new ones until they've been out for a bit. A significant part of the resource constraint equation for us is certainly on the testing side, since running our full test suites on multiple Python versions adds up quickly.

The way that I had envisioned this working, if we did move forward, would be that we built on the oldest supported Python (e.g. Py_LIMITED_API=0x03090000) and then we ran tests on the earliest and latest Python we supported (e.g. 3.9 and 3.11). The big benefit of using the limited API in this space would be that we could bump up the latest supported Python version without needing to move the earliest. The assumption would be that by the time a new Python version was released (e.g. 3.12), we would have gone through enough patch releases of the previous release (3.11) to trust that nothing would be breaking in future patch releases.

Of course, in practice that's probably not true: CPython certainly doesn't always strictly follow SemVer rules for patch releases, and to be fair Hyrum's law certainly applies to a project at that scale. Beyond that, Cython's use of CPython internals certainly means that we could be broken even by patch releases. In practice what this would probably mean is that we would run tests as mentioned above on a frequent basis (on every PR), then run a larger test matrix infrequently (say, nightly or weekly). IOW even with limited API builds we would definitely still want to do broader testing to ensure that such builds are actually as compatible as they claim to be. However, I'd hope that the scale of that testing would be reduced.

vyasr commented 8 hours ago

With the latest versions of branch-24.10, which contain a number of changes I made over the past few months for limited API compatibility along with the removal of pyarrow and numpy as build requirements in cudf and cuspatial, most of RAPIDS now builds with the limited API flag on. I have run some smoke tests and things generally work OK, but I haven't done anything extensive. ucxx and kvikio remain outstanding since we need to rewrite the Array class to not use the array module's C API, which is not covered by the limited API. The latest tests can be seen in https://github.com/rapidsai/devcontainers/pull/278.

da-woods commented 6 hours ago

I don't know if it's any help, but the quickest non-array.array API way I've found to allocate memory is:

```cython
from cpython.bytearray cimport PyByteArray_FromStringAndSize
from cpython.memoryview cimport PyMemoryView_FromObject

cdef mview_cast = memoryview.cast

cdef Py_ssize_t[::1] new_Py_ssize_t_array(Py_ssize_t n):
    # "q" assumes sizeof(Py_ssize_t) == sizeof(long long)
    return mview_cast(PyMemoryView_FromObject(
        PyByteArray_FromStringAndSize(NULL, n * sizeof(Py_ssize_t))), "q")
```

It isn't as good, but it's surprisingly close given how much it actually does. "Only" 70% slower.

You probably have to replace

```cython
mv = PyMemoryView_FromObject(obj)
pybuf = PyMemoryView_GET_BUFFER(mv)
```

with PyObject_GetBuffer and an appropriate PyBuffer_Release in the destructor. But you probably should be doing that anyway - you're currently keeping a pointer to the data while not retaining a buffer-reference. That means things like bytearray could potentially be resized from under you.
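The resizing hazard is easy to see from pure Python: while a `memoryview` (i.e. an exported buffer) is alive, CPython refuses to resize the underlying `bytearray`. That refusal is exactly the protection you lose by stashing a raw data pointer without holding the buffer. A small demonstration:

```python
ba = bytearray(16)
view = memoryview(ba)  # exports ba's buffer, pinning its storage

try:
    ba.extend(b"\x00" * 16)  # would have to reallocate the storage
except BufferError as exc:
    print("resize blocked:", exc)

view.release()           # analogous to calling PyBuffer_Release
ba.extend(b"\x00" * 16)  # with no live export, resizing works again
print(len(ba))           # 32
```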