xtensor-stack / xtensor-python

Python bindings for xtensor
BSD 3-Clause "New" or "Revised" License
345 stars 58 forks source link

Is this a valid way of interfacing xtensor with python? #254

Open Galfurian opened 3 years ago

Galfurian commented 3 years ago

This is a question/idea concerning the xtensor/python wrapping.

I was wondering if the following way I'm currently interfacing xt::xarray to numpy arrays in python is somehow valid and not an abomination. I'm writing down here the ideas, but I also wrote a self-contained and completely commented "example" project attached to this issue.

What this wrapping method allowed me to do:

Full Example: pytensor.zip

C++ to Python

First I had to take care of transforming xtensor strides to numpy strides:

template <typename T>
inline typename xt::xarray<T>::strides_type strides_to_pystrides(typename xt::xarray<T>::strides_type strides)
{
    for (auto &stride : strides)
        stride *= sizeof(T);
    return strides;
}

Once I had a way of adapting the strides, I transformed the xarray into a pybind11::array_t, as follows:

template <typename T>
inline typename pybind11::array_t<T> array_to_pyarray(typename xt::xarray<T> &src)
{
    return pybind11::memoryview::from_buffer(
        src.data(),                             // buffer
        src.shape(),                            // shape
        strides_to_pystrides<T>(src.strides()), // strides
        false                                   // readonly
    );
}

I've tested, and the changes made in python are correctly registered in C++. The only thing I know I must avoid is overriding the array with itself in python, otherwise, it will be a mess (more on this inside the attached full example).

A this point I thought, maybe I can do the same with views? So I wrote the following function:

template <typename T>
inline pybind11::array_t<T> view_to_pyarray(typename xt::xstrided_view<xt::xarray<T> &, xt::xindex> &view)
{
    // Return a memory view, allowing changes from python.
    return pybind11::memoryview::from_buffer(
        view.data() + view.data_offset(),       // buffer
        view.shape(),                           // shape
        strides_to_pystrides<T>(view.strides()) // strides
    );
}

I mean, that's all from this side.

Python to C++

The interfacing in the other direction is done in a similar fashion.

I needed some "adaptation" functions.

First, for the shape:

template <typename T>
inline typename xt::xarray<T>::shape_type pyinfo_to_shape(const pybind11::buffer_info &info)
{
    size_t ndim = static_cast<size_t>(info.ndim);
    typename xt::xarray<T>::shape_type shape(ndim);
    for (size_t i = 0; i < ndim; ++i)
        shape[i] = info.shape[i];
    return shape;
}

Second, for the strides:

template <typename T>
inline typename xt::xarray<T>::strides_type pyinfo_to_strides(const pybind11::buffer_info &info)
{
    typename xt::xarray<T>::strides_type strides;
    for (size_t i = 0; i < info.strides.size(); ++i)
        strides.push_back(info.strides[i] / info.itemsize);
    return strides;
}

Once I had both those adaptation function, I could easily modify the xt::xarray:


template <typename T>
inline xt::xarray<T> &pyarray_to_array(const pybind11::array_t<T> &src, xt::xarray<T> &dst)
{
    // Get the infos regarding the python array.
    pybind11::buffer_info info = src.request();
    // Transform the python shape.
    auto shape = pyinfo_to_shape<T>(info);
    // Transforms the buffer size to unsigned.
    auto size = static_cast<size_t>(info.size);

    // Re-initialize the xarray if the shape is different.
    if (dst.shape() != shape)
        dst = xt::xarray<T>(shape);

    // Get the pointers to the buffers.
    T *dst_ptr = static_cast<T *>(dst.data()), *src_ptr = static_cast<T *>(info.ptr);

    // Copy the data.
    for (size_t i = 0; i < size; ++i)
        dst_ptr[i] = src_ptr[i];
    return dst;
}

Then, I've done the same for the views:

template <typename T>
inline typename xt::xstrided_view<xt::xarray<T> &, xt::xindex> &pyarray_to_view(
    const pybind11::array_t<T> &src,
    typename xt::xstrided_view<xt::xarray<T> &, xt::xindex> &dst)
{
    // Get the infos regarding the python array.
    pybind11::buffer_info info = src.request();
    // Transform the python shape.
    auto shape = pyinfo_to_shape<T>(info);
    // Transforms the buffer size to unsigned.
    auto size = static_cast<size_t>(info.size);

    if (shape != dst.shape())
        throw std::runtime_error("Shapes must match");
    if (size != dst.size())
        throw std::runtime_error("Sizes must match");

    // Get the pointers to the buffers.
    T *dst_ptr = static_cast<T *>(dst.data() + dst.data_offset()), *src_ptr = static_cast<T *>(info.ptr);

    // Copy the data.
    for (size_t i = 0; i < size; ++i)
        dst_ptr[i] = src_ptr[i];
    return dst;
}

Error To Avoid

Let us take an extract of the example I attached to the issue. Here is the C++ registration:

/// @brief Just an example of dataset containing xtensor arrays and views.
class DataSet {
public:
    /// A set of arrays.
    xt::xarray<double> x1;
    DataSet() : x1{{9, 8}, {7, 6}, {5, 4}, {3, 2}} {
        // Nothing to do.
    }
};
PYBIND11_MODULE(pytensor, m) {
    pybind11::class_<DataSet>(m, "DataSet")
        .def(pybind11::init<>())
        .def_property("x1",
            [](DataSet &self) { return array_to_pyarray(self.x1); },
            [](DataSet &self, const pybind11::array_t<double> &m) { pyarray_to_array(m, self.x1); });
}

In python I need to avoid overriding the same array with itself:

from pytensor import DataSet
# Create the dataset.
ds = DataSet()
# Reshape x1 and override it.
print("ds.x1    :\n", ds.x1, "\n")
ds.x1 = ds.x1.reshape(2, 2, 2)
print("ds.x1    :\n", ds.x1, "\n")

This will output:

ds.x1    :
 [[9. 8.]
 [7. 6.]
 [5. 4.]
 [3. 2.]] 
ds.x1    :
 [[[0.000000e+000 9.381848e-317]
  [7.000000e+000 6.000000e+000]]

 [[5.000000e+000 4.000000e+000]
  [3.000000e+000 2.000000e+000]]] 

So except for this, it is behaving nicely the binding. I can also provided a solution to this problem:

from copy import copy
...
ds.x1 = copy(ds.x1.reshape(2, 2, 2))

Meh.. this requires a copy, is not that nice but it works.

Conclusion

I'm no expert, I started working with xtensor a couple of months ago.

I needed to interface something that was already using xt::xtensor to python, without changing everything to xt::pyarray. I'm lazy, and also I didn't know how to make all the strided_view inside the code work with the xt::pyarray.

I might be doing something nasty with this code, but I don't know all the meticulous details of how xtensor and python are actually handling the memory. I only know that both numpy and xtensor use the row-major ordered arrays and I'm hoping that's enough.

Thank you very much for reading through my explanation.

Cheers,

Enrico

JohanMabille commented 3 years ago

Hi,

Thanks for this very detailed explanation. xtensor-python already provides C++ bindings for Numpy arrays, which can operates in place: pyarray (for dynamic number of dimensions) and pytensor (for static number of dimensions). They handle for you the conversion of strides and the auto assign problem, and any change from the python side will reflect on the C++ side (and vice versa).

#include <xtensor-python/pyarray.hpp>

pyarray<double> do_something(const pyarray<double>& arg)
{
    // ...
    // Will print the strides in the 'xtensor format' (i.e. strides
    // in number of elements instead of number of bytes).
    std::cout << xt::adapt(arg.strides()) << std::endl;
    // Fine:
    arg = arg;

    return arg + 1.;
}

PYBIND11_MODULE(pytensor, m)
{
    m.def("do_something", do_something);
}
a = np.array([1., 2.], [3., 4.])
b = do_sometihng(a)
print(b)

This should print the strides of a (from the C++ side) and b which is [[2., 3.], [4., 5]].

The main drawback is that pyarray is different form xarray. You can assign a pyarray object to an xarray object (and vice versa) but this results in a copy. Also we don't have bindings for views. Which makes me think we could implement your idea of a memory view of a buffer in-memory and have a pyadaptor, that would allow to directly provide xtensor structures to Python. That would actually be awesome!

Is that something you would like to implement?

Galfurian commented 3 years ago

Thank you very much for the reply.

I get the usefulness of both pyarray and pytensor compared to a more `handmade'' solution, and I know thatxt::adapt` is quite useful (like a swiss army knife).

Anyway, I will gladly investigate how to write a pyadaptor. I might need some pointers to start with, and I will definitely need to check your coding guidelines.

Regards,

Enrico

ThibHlln commented 1 year ago

Hi @JohanMabille,

When you said:

Also we don't have bindings for views.

in https://github.com/xtensor-stack/xtensor-python/issues/254#issuecomment-817560282

Did you mean for numpy views or xtensor views ?

I am experiencing some strange behaviour when I pass numpy views (resulting from some array indexing, e.g. arr[:, 99:]) to my bindings function (expecting a pytensor), and I am wondering whether this is because you don't have bindings for numpy views ? Or if passing numpy views for pytensor parameters should work ?

Thanks.

peter-urban commented 1 year ago

Hi @JohanMabille and @Galfurian

Which makes me think we could implement your idea of a memory view of a buffer in-memory and have a pyadaptor, that would allow to directly provide xtensor structures to Python.

I would be incredibly interested in a pyadaptor that can take convert numpy arrays to xtensors/xarrays without copying. Did anything happen regarding this feature? If not: Do you have some pointers on how to start implementing this? E.g. Is there an adaptor in the xtensor-stack that could be used as a template?