ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
34.21k stars 5.81k forks

[core] Using pybind11 to replace Cython bindings #46512

Open hongchaodeng opened 4 months ago

hongchaodeng commented 4 months ago

Description

Problem

Currently we use Cython as the glue layer between Python and C++ (core worker) code. This has several problems.

Cython is better suited to creating simple wrappers around C code. In the current architecture, however, the code is complex and some use cases go beyond what Cython was designed for.

Here are some pain points:

Proposal

I propose replacing the Cython bindings with pybind11. It has the following benefits:

We can do this incrementally.

Use case

No response

Superskyyy commented 3 weeks ago

The pybind11 overhead is not negligible in Ray. How about considering https://github.com/wjakob/nanobind, which comes from the author of pybind11 and claims to deliver 10x less overhead? That way we benefit from an almost identical syntax with only minor overhead. @rynewang @hongchaodeng

Some background https://nanobind.readthedocs.io/en/latest/why.html#why-another-binding-library
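One way to check an overhead claim like this on a given machine is to time a trivial call through each binding layer. The following is a minimal sketch, not a benchmark from this thread: `per_call_ns` and `py_add` are hypothetical helper names, and `operator.add` is only a stand-in for an extension function. To compare the actual libraries, the same trivial function would need to be built with Cython, pybind11, and nanobind and added to `candidates`.

```python
import operator
import timeit


def per_call_ns(func, *args, number=200_000, repeat=5):
    """Return the best-of-`repeat` average time per call, in nanoseconds.

    The lambda wrapper adds a small constant cost to every candidate,
    so the results are meaningful relative to each other, not in absolute terms.
    """
    timer = timeit.Timer(lambda: func(*args))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


def py_add(a, b):
    # Pure-Python baseline: a call that crosses no binding layer.
    return a + b


if __name__ == "__main__":
    # operator.add is a C-implemented builtin used purely as a placeholder.
    # Replace or extend this dict with functions exported by real
    # Cython / pybind11 / nanobind modules (assumed to exist separately).
    candidates = {"python": py_add, "c_builtin": operator.add}
    for name, fn in candidates.items():
        print(f"{name:>10}: {per_call_ns(fn, 1, 2):.1f} ns/call")
```

Taking the minimum over several repeats reduces noise from other processes. Absolute numbers will differ across machines and Python versions, so only the ratio between candidates is worth comparing.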

rynewang commented 3 weeks ago

cc @dentiny

rynewang commented 3 weeks ago

From a quick glance, this nanobind is similar to pybind11 in architecture, just with some optimizations?

Superskyyy commented 3 weeks ago

> From a quick glance, this nanobind is similar to pybind11 in architecture, just with some optimizations?

Exactly. It's by the same author; nanobind dropped some historical technical debt, which makes it more performant. The author also suggests using nanobind instead of pybind11 unless pybind11 is absolutely needed.

dentiny commented 3 weeks ago

Is my understanding correct that the biggest gain is the ability to leverage modern C++, since Cython targets C while pybind11 targets C++?

> Some bugs will only get caught in the second compilation pass, after Cython has generated thousands of lines of hard-to-decipher code.

On this point, I'm curious how pybind11 helps. When wrapping C++ code via pybind11, we can rely on a single compilation pass to catch errors, but the errors are hard to decipher as well, since pybind11 relies heavily on templates. When wrapping (or rather, loading) Python objects via pybind11, everything happens at runtime.

I have had some bad experiences with Python/C++ FFI overall, for example,

Just curious, do you think it's possible to use the localhost network instead of FFI? I definitely understand that a network call is an order of magnitude slower than FFI, since it involves more copies, but