Open Skylion007 opened 2 years ago
@oleksandr-pavlyk Are you interested in pursuing this issue? Given the text in PR https://github.com/pybind/pybind11/pull/3866, it seems you are already working on an implementation. It would be nice to have official DLPack casters supported in pybind11.
I would love to.
Sorry if I am reading this wrong, but it seems like support may have been added for DLPack; I was struggling to find information on it in the docs. Was support added for DLPack tensors, and is there anywhere you could point me to find out more?
@PeterDykas The support has been added in nanobind. This issue is to add support for DLPack casters in pybind11, following suit.
Awesome, thanks @oleksandr-pavlyk. When this is added it will be very useful! I would love to keep using pybind11; do you know of a workaround for returning DLPack tensors from pybind11 in the meantime?
@oleksandr-pavlyk @Skylion007 I would also be interested in working on this if it sounds good to you.
FWIW I did create a custom type caster here: https://github.com/nvidia-riva/riva-asrlib-decoder/blob/main/include/riva/asrlib/decoder/pybind11_dlpack_caster.h I am going through legal approval to license it under pybind11's BSD 3-clause license so I can open a PR.
One immediate observation is that it's quite hard to use DLManagedTensor directly, which is why you would presumably want the nb::tensor abstraction in pybind11.
(Note that I am a colleague of @Tabrizian)
Where does this task stand? Halide would very much like to support DLPack in our Python bindings, but nanobind isn't (yet) a good option for us.
Hi @steven-johnson , I am using dlpack for a few things. The first thing was bindings for a CUDA-based beam search decoder for non-autoregressive CTC speech recognition models, for which I wrote a simple type caster here: https://github.com/nvidia-riva/riva-asrlib-decoder/blob/main/include/riva/asrlib/decoder/pybind11_dlpack_caster.h
While working on a (not-yet-publicly-available) beam search decoder for auto-regressive speech recognition models, like RNN-T and attention-based models, I noticed some errors in my original caster, as well as in pybind11 itself, which prompted the PR here: https://github.com/pybind/pybind11/commit/7c6f2f80a73e330642226138ae3ddaf448a4f672 These auto-regressive decoders are more complicated because the decoder must call the neural network model multiple times during decoding, rather than just accepting a single output from the model at the beginning.
If you are okay using DLManagedTensor directly, instead of nanobind's batteries-included nb::tensor, take a look at this: https://gist.github.com/galv/4957a972587fbef28e64aeb6b03579ca It is my latest version of this.
The main problem with getting dlpack support into pybind11 is that something like nb::tensor is way, way more work than what I did. (Trust me, I looked into it by trying to implement it; it would become a very large undertaking if the powers-that-be wanted something like nb::tensor in order to consider this issue complete.) If you just want your C++ bindings to accept buffers from anything implementing the dlpack protocol, this type caster should work, modulo undiscovered bugs.
> if you wanted something like nb::tensor

I have only taken a quick skim through nanobind, so it's not clear to me what all is in nb::tensor at this point. I kinda assumed it is something like a numpy ndarray, except also with support for the dlpack protocol (in addition to the typical buffer protocol). Is there more to it than that? (That alone sounds like something I very much want, but I'm not sure our project is prepared to migrate to nanobind just for that.)
The nb::tensor interface is kind of confusing to read in the nanobind source code because it's mixed in with a lot of implementation details, but this page shows some of what you can do with it: https://github.com/wjakob/nanobind/blob/master/docs/tensor.md
Probably the main useful thing is being able to specify compile time shapes for less confusing code. I know it also always fills in the "strides" field if it happens to be null (allowed by dlpack), which avoids the foot gun of indexing into a NULL strides field. (This is non-trivial to do because something has to own the heap-allocated strides array, and the producer of a dlpack tensor, which provides a destructor, can't do it because it didn't create the strides array itself.)
Meanwhile, the DLManagedTensor interface from dlpack.h is a spartan stable ABI that is best described by reading the one header file that defines it: https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h
> Where does this task stand? Halide would very much like to support DLPack in our Python bindings, but nanobind isn't (yet) a good option for us.

What's missing in nanobind, just out of curiosity? @steven-johnson