Open Skylion007 opened 2 years ago
@oleksandr-pavlyk Are you interested in pursuing this issue? Given the text in PR https://github.com/pybind/pybind11/pull/3866, it seems you are already working on an implementation. It would be nice to have official DLPack casters supported in pybind11.
I would love to.
Sorry if I am reading this wrong, but it seems like support may have been added for DLPack; I was struggling to find information on it in the docs. Was support added for DLPack tensors, and is there anywhere you could point me to find out more?
@PeterDykas The support has been added in nanobind. This issue is to add support for DLPack casters in pybind11, following suit.
Awesome, thanks @oleksandr-pavlyk. When this is added it will be very useful! I would love to keep using pybind11; do you know of a workaround for returning DLPack tensors from pybind11 in the meantime?
@oleksandr-pavlyk @Skylion007 I would also be interested in working on this if it sounds good to you.
FWIW I did create a custom type caster here: https://github.com/nvidia-riva/riva-asrlib-decoder/blob/main/include/riva/asrlib/decoder/pybind11_dlpack_caster.h I am going through legal approval to license it under pybind11's BSD 3-clause license so I can open a PR.
One immediate observation is that it's quite hard to use DLManagedTensor directly, which is why you would presumably want the nb::tensor abstraction in pybind11.
(Note that I am a colleague of @Tabrizian)
Where does this task stand? Halide would very much like to support DLPack in our Python bindings, but nanobind isn't (yet) a good option for us.
Hi @steven-johnson , I am using dlpack for a few things. The first thing was bindings for a CUDA-based beam search decoder for non-autoregressive CTC speech recognition models, for which I wrote a simple type caster here: https://github.com/nvidia-riva/riva-asrlib-decoder/blob/main/include/riva/asrlib/decoder/pybind11_dlpack_caster.h
While working on a (not-yet-publicly-available) beam search decoder for auto-regressive speech recognition models, like RNN-T and attention-based models, I noticed some errors in my original caster, as well as in pybind11 itself, which prompted the PR here: https://github.com/pybind/pybind11/commit/7c6f2f80a73e330642226138ae3ddaf448a4f672 These auto-regressive decoders are more complicated because the decoder must call the neural network model multiple times during decoding, rather than just accepting a single output from the model at the beginning.
If you are okay using DLManagedTensor directly, instead of nanobind's batteries-included nb::tensor, take a look at this: https://gist.github.com/galv/4957a972587fbef28e64aeb6b03579ca It is my latest version of this.
The main problem with getting dlpack support into pybind11 is that something like nb::tensor is way, way more work than what I did. (Trust me, I looked into it by trying to implement it; it would become a very large undertaking if the powers-that-be wanted something like nb::tensor in order to consider this issue complete.) If you just want your C++ bindings to accept buffers from anything implementing the dlpack protocol, this type caster should work, modulo undiscovered bugs.
> if you wanted something like nb::tensor

I have only taken a quick skim through nanobind, so it's not clear to me what all is in nb::tensor at this point. I kinda assumed it is something like a numpy ndarray, except also with support for the dlpack protocol (in addition to the typical buffer protocol). Is there more to it than that? (That alone sounds like something I very much want, but I'm not sure our project is prepared to migrate to nanobind just for that.)
The nb::tensor interface is kind of confusing to read in the nanobind source code because it's mixed in with a lot of implementation details, but this page shows some of what you can do with it: https://github.com/wjakob/nanobind/blob/master/docs/tensor.md
Probably the main useful thing is being able to specify compile time shapes for less confusing code. I know it also always fills in the "strides" field if it happens to be null (allowed by dlpack), which avoids the foot gun of indexing into a NULL strides field. (This is non-trivial to do because something has to own the heap-allocated strides array, and the producer of a dlpack tensor, which provides a destructor, can't do it because it didn't create the strides array itself.)
Meanwhile, the DLManagedTensor interface from dlpack.h is a spartan stable ABI that is best described by reading the one header file that defines it: https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h
> Where does this task stand? Halide would very much like to support DLPack in our Python bindings, but nanobind isn't (yet) a good option for us.

What's missing in nanobind, just out of curiosity? @steven-johnson