Fix buffer protocol implementation

pybind / pybind11

Seamless operability between C++11 and Python

https://pybind11.readthedocs.io/

Other

15.8k stars 2.12k forks source link

Fix buffer protocol implementation #5407

Closed QuLogic closed 2 weeks ago

QuLogic commented 1 month ago

Description

According to the buffer protocol, ndim is a required field [1], and should always be set correctly. Additionally, shape should be set if flags includes PyBUF_ND or higher [2]. The current implementation only set those fields if flags was PyBUF_STRIDES.

Note, the tests currently leak, since they don't call PyBuffer_Release, as figuring out how to make a custom deleter work there is not something I've understood. Should I just immediately convert to a SimpleNamespace or similar instead or wrapping the struct? I changed it to a SimpleNamespace.

[1] https://docs.python.org/3/c-api/buffer.html#request-independent-fields [2] https://docs.python.org/3/c-api/buffer.html#shape-strides-suboffsets

Suggested changelog entry:

* Fix buffer protocol implementation

PS, conventional commits are only mentioned in the PR template; they should be mentioned in https://github.com/pybind/pybind11/blob/af67e87393b0f867ccffc2702885eea12de063fc/.github/CONTRIBUTING.md so one knows that before committing...

QuLogic commented 1 month ago

I changed the test to return a SimpleNamespace of the information instead of a pybind11-wrapped struct thing. This way, I can release the buffer and not leak stuff.

QuLogic commented 1 month ago

I still don't know what's wrong with PyPy and GraalPy; I can't get a proper backtrace out of PyPy, and no idea how to install GraalPy.

msimacek commented 1 month ago

I'll have a look at the GraalPy failure

msimacek commented 1 month ago

Ok, so failure is on test-buffers.py:112. What happens is that the struct.unpack_from call requests a buffer with PyBUF_SIMPLE, but you give it a buffer that has ndim = 2, but shape = NULL. GraalPy internally constructs a memoryview of the buffer and memoryview constructor assumes that if ndim > 1 then shape is filled. You'd run into the same problem on CPython if you do PyObject_GetBuffer(obj, &buffer, PyBUF_SIMPLE) and then do PyMemoryView_FromBuffer(buffer). It's going to dereference shape without checking it for NULL at https://github.com/python/cpython/blob/main/Objects/memoryobject.c#L572. In conclusion, setting ndim > 1 without PyBUF_ND flag is invalid. The doc isn't clear on that, but apparently all 3 implementations have that assumption. I think it makes sense if you think about the semantics. If the caller didn't specify they handle multidimensional buffers, you should't give them one, you should make the buffer one-dimensional. So I think you should keep setting ndim = 1 unless PyBUF_ND flag is set. memoryview itself does the same, multidimensional memoryview would return one-dimensional buffer from PyObject_GetBuffer(mv, &buffer, PyBUF_SIMPLE) or error our if it's not C contiguous.

cryos commented 3 weeks ago

I have taken a look at this, and read through the docs too. Whilst it clearly states that ndim must always be set and valid it would seem that the value is incorrectly >1 in certain situations in several implementations. Reading around I also don't see the value in supporting a value of ndim != 1 unless it is PyBUF_ND or higher - what does that mean practically?

It looks like restoring the setting of ndim to 1 by default and moving the setting otherwise to the PyBUF_ND would yield the correct behavior even if it is a little at odds with the documentation. Otherwise it would be good to see examples of PyBUF_SIMPLE with a value other than ndim = 1.

QuLogic commented 3 weeks ago

@msimacek Thanks for the investigation. Reading through your explanation, and the docs again, I think I understand the confusion better now.

From the docs, we have:

The following fields are not influenced by flags and must always be filled in with the correct values: obj, buf, len, itemsize, ndim.

which seems to suggest the values should match the buffer as it exists. So I interpreted the flags to mean "I only need to know this information", and it seems like (other than ndim) that's how pybind11_getbuffer is implemented.

But then looking back at the previous section:

Buffers are usually obtained by sending a buffer request to an exporting object via PyObject_GetBuffer(). Since the complexity of the logical structure of the memory can vary drastically, the consumer uses the flags argument to specify the exact buffer type it can handle.

which indicates that flags is more of a collaboration, i.e., the caller is saying "please give me this type of buffer, or less, but no more complex than that."

So in that sense, the "request-independent fields" should always be correct, but that term is not specified. I read that to be correct "with regards to the underlying buffer", but it is actually correct "as the buffer is exposed", which are not necessarily the same things. For example, a 2D (M, N)-shaped buffer can be exported as an (M*N) 1D buffer if it's contiguous. But then the "request-independent" in the section title is a bit of a misnomer, as ndim does depend on the flags.

What I think that means though is that there is a different underlying bug, because pybind11_getbuffer does not verify that the buffer is contiguous if flags are < PyBUF_STRIDES, but simply exposes the buffer as-is.

cryos commented 3 weeks ago

So in that sense, the "request-independent fields" should always be correct, but that term is not specified. I read that to be correct "with regards to the underlying buffer", but it is actually correct "as the buffer is exposed", which are not necessarily the same things. For example, a 2D (M, N)-shaped buffer can be exported as an (M*N) 1D buffer if it's contiguous. But then the "request-independent" in the section title is a bit of a misnomer, as ndim does depend on the flags.

If shape, strides and suboffsets are all null how would the shape be represented in such a scenario? Doesn't PyBUF_SIMPLE imply contiguous with no field to represent shape in this case?

rwgk commented 3 weeks ago

But then the "request-independent" in the section title is a bit of a misnomer, as ndim does depend on the flags.

Documentation is usually the least reliable part of a software distribution. It was written by humans that probably don't want to spend their time writing documentation and may have omissions and inconsistencies. I'm looking at documentation as useful to get the big picture, but easily misleading when it comes to details. If that happens, it's important to not take the documentation too literally, and instead focus on what works best for the ecosystem.

In this particular case: PyBUF_SIMPLE goes with ndim = 1. I wouldn't look any further than that. Unless someone comes back here later and presents an actual problem.

For now, let's just fix the obvious.

QuLogic commented 2 weeks ago

If shape, strides and suboffsets are all null how would the shape be represented in such a scenario?

It won't be, but that's my point: pybind11 should raise an error if the memory block doesn't fit the requirements of the caller, and it doesn't do so right now.

Doesn't PyBUF_SIMPLE imply contiguous with no field to represent shape in this case?

It implies contiguity of the memory block, but not necessarily the shape of the buffer itself. If the buffer can be reduced to a contiguous 1D buffer, then it is allowed.

Here's an example from NumPy using struct.unpack_from (which @msimacek noted above uses PyBUF_SIMPLE):

>>> a = np.full((3, 5), 30.0)  # Allocate 2D buffer at once.
>>> a.flags.c_contiguous
True
>>> a.size
15
>>> a.shape
(3, 5)
>>> a.strides
(40, 8)

so we have a 2D buffer with a C-contiguous block of memory. In this case, struct.unpack_from works:

>>> struct.unpack_from(f'{a.size}d', a)
(30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0)

If we then take a slice and make the block discontiguous:

>>> b = a[::2, ::2]
>>> b.size
6
>>> b.shape
(2, 3)
>>> b.strides
(80, 16)
>>> b.flags.c_contiguous
False
>>> b.flags.f_contiguous
False
>>> b.flags.contiguous
False

then struct.unpack_from no longer works:

>>> struct.unpack_from(f'{b.size}d', b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: ndarray is not C-contiguous

Unless someone comes back here later and presents an actual problem.

We can replicate something like the above in pybind11:

class Block {
public:
    Block(py::size_t height, py::size_t width) : m_height(height), m_width(width) {
        // Allocate a buffer that is 10 times bigger in each direction.
        m_data.resize(height * 10 * width * 10);
        // Just something to see what's found in memory.
        auto count = 0;
        for (auto &&val : m_data) {
            val = count;
            count++;
        }
    }

    double *get_buffer() {
        return m_data.data();
    }

    py::size_t height() const { return m_height; }
    py::size_t width() const { return m_width; }

private:
    py::size_t m_height;
    py::size_t m_width;
    std::vector<double> m_data;
};

PYBIND11_MODULE(test, m, py::mod_gil_not_used())
{
    py::class_<Block>(m, "Block", py::is_final(), py::buffer_protocol())
        .def(py::init<py::size_t, py::size_t>())
        .def_buffer([](Block &self) -> py::buffer_info {
            std::vector<py::size_t> shape { self.height(), self.width() };
            std::vector<py::size_t> strides { self.width()*10 * 10*sizeof(double), 10*sizeof(double) };
            return py::buffer_info(self.get_buffer(), shape, strides);
        });
}

If we ask NumPy, which obeys strides:

>>> b = Block(3, 5)
>>> np.array(b)
array([[   0.,   10.,   20.,   30.,   40.],
       [ 500.,  510.,  520.,  530.,  540.],
       [1000., 1010., 1020., 1030., 1040.]])

If we ask struct.unpack_from (which asks for a simple buffer):

>>> struct.unpack_from('15d', b)
(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0)

These values are completely wrong, and pybind11 should raise like NumPy.

rwgk commented 2 weeks ago

Please apply Marcus' suggestions.

(Please do not --amend and do not force push.)

In a follow-on commit please implement what you have in mind, ideally with matching test for numpy behavior.

Get Outlook for iOShttps://aka.ms/o0ukef

From: Elliott Sales de Andrade @.> Sent: Friday, November 1, 2024 8:06:45 PM To: pybind/pybind11 @.> Cc: Ralf Grosse Kunstleve @.>; Comment @.> Subject: Re: [pybind/pybind11] Fix buffer protocol implementation (PR #5407)

If shape, strides and suboffsets are all null how would the shape be represented in such a scenario?

It won't be, but that's my point: pybind11 should raise an error if the memory block doesn't fit the requirements of the caller, and it doesn't do so right now.

Doesn't PyBUF_SIMPLE imply contiguous with no field to represent shape in this case?

It implies contiguity of the memory block, but not necessarily the shape of the buffer itself. If the buffer can be reduced to a contiguous 1D buffer, then it is allowed.

Here's an example from NumPy using struct.unpack_from (which @msimacekhttps://github.com/msimacek noted above uses PyBUF_SIMPLE):

a = np.full((3, 5), 30.0) # Allocate 2D buffer at once. a.flags.c_contiguous True a.size 15 a.shape (3, 5) a.strides (40, 8)

so we have a 2D buffer with a C-contiguous block of memory. In this case, struct.unpack_from works:

struct.unpack_from(f'{a.size}d', a) (30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0)

If we then take a slice and make the block discontiguous:

b = a[::2, ::2] b.size 6 b.shape (2, 3) b.strides (80, 16) b.flags.c_contiguous False b.flags.f_contiguous False b.flags.contiguous False

then struct.unpack_from no longer works:

struct.unpack_from(f'{b.size}d', b) Traceback (most recent call last): File "", line 1, in ValueError: ndarray is not C-contiguous

Unless someone comes back here later and presents an actual problem.

We can replicate something like the above in pybind11:

class Block { public: Block(py::size_t height, py::size_t width) : m_height(height), m_width(width) { // Allocate a buffer that is 10 times bigger in each direction. m_data.resize(height 10 width * 10); // Just something to see what's found in memory. auto count = 0; for (auto &&val : m_data) { val = count; count++; } }

double *get_buffer() {
    return m_data.data();
}

py::size_t height() const { return m_height; }
py::size_t width() const { return m_width; }

private: py::size_t m_height; py::size_t m_width; std::vector m_data; };

PYBIND11_MODULE(test, m, py::mod_gil_notused()) { py::class(m, "Block", py::is_final(), py::buffer_protocol()) .def(py::init<py::size_t, py::size_t>()) .def_buffer([](Block &self) -> py::buffer_info { std::vector shape { self.height(), self.width() }; std::vector strides { self.width()10 10sizeof(double), 10sizeof(double) }; return py::buffer_info(self.get_buffer(), shape, strides); }); }

If we ask NumPy, which obeys strides:

b = Block(3, 5) np.array(b) array([[ 0., 10., 20., 30., 40.], [ 500., 510., 520., 530., 540.], [1000., 1010., 1020., 1030., 1040.]])

If we ask struct.unpack_from (which asks for a simple buffer):

struct.unpack_from('15d', b) (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0)

These values are completely wrong, and pybind11 should raise like NumPy.

— Reply to this email directly, view it on GitHubhttps://github.com/pybind/pybind11/pull/5407#issuecomment-2452825075, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAFUZAGDTYD6DB56GMKLBH3Z6Q6ULAVCNFSM6AAAAABPYCSMVKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINJSHAZDKMBXGU. You are receiving this because you commented.Message ID: @.***>

QuLogic commented 2 weeks ago

I applied the changes separately, but the last commit basically flips the implementation around. Instead of starting minimal and adding more information based on the flags, it starts with all information and pares it back to match the request from the caller. If the underlying buffer cannot be exposed in a simplified form, then an error is raised.

I added tests for C/Fortran/any contiguities, and that the strides mapped into NumPy correctly.

QuLogic commented 2 weeks ago

To compare with NumPy, you can swap:

Matrix(5, 4) with np.empty((5, 4), dtype=np.float32)
FortranMatrix(5, 4) with np.empty((5, 4), dtype=np.float32, order='F')
DiscontiguousMatrix(5, 4, 2, 3) with np.empty((5*2, 4*3), dtype=np.float32)[::2, ::3] Also replace mat.rows() with 5 and mat.cols() with 4 (maybe I should do that so we're always comparing with 'known' values instead of internals?)

Then running the tests show two differences:

for PyBUF_SIMPLE, NumPy returns ndim=0, which I guess is truer in spirit, and we could do here too.
for contiguity mismatches, it raises ValueError instead of BufferError, which is maybe a bug on their part, but close enough to the same.

cryos commented 2 weeks ago

This looks great, and I really appreciate you finding examples @QuLogic that illustrate why the simpler solution is not appropriate. I agree with suggestions from @rwgk, this is a great enhancement and it will really help with corner cases I had not initially considered.