pypy / pypy

PyPy is a very fast and compliant implementation of the Python language.
https://pypy.org
Other
790 stars 38 forks source link

readonly Py_buffer for those that shouldn't be. #4918

Open milesgranger opened 3 months ago

milesgranger commented 3 months ago

Hi,

(disclaimer, I'm new to PyPy so could very well be missing something obvious)

I'm the maintainer of cramjam, looking into this issue https://github.com/milesgranger/cramjam/issues/142 with draft PR at https://github.com/milesgranger/cramjam/pull/143

In short, cramjam currently, and regrettably, ignores the readonly portion of Py_buffer. (Aiming to change this in v3). So it will happily write to bytes for example, CPython doesn't appear to take issue with this, but PyPy does.

[not directly relevant side-bar]

If I try to maintain this behavior it kinda works. Very strange behavior in that when bytes is used, on this line in the tests, output will remain the original value. (For example b'0' * 23) But when debugging it'll quickly change to the expected value (ie b'\xff\x06\x00\x00sNaPpY\x01\x05\x00\x00\xd2\x8f%I\x00').

Even more entertainment can be discovered if I adapt the test to use a while loop which keeps checking the assertion until eventually output is updated to the expected value.

But okay, that's maybe an issue in and of itself for PyPy as this currently works in CPython.. something strange going on there but I'm less interested in going down that route since it's the wrong thing to do anyhow.

[end side-bar]

I then try to go ahead and do it right by respecting the readonly field of Py_buffer, but I quickly discover types that should give a writable buffer are not, for example bytearray:

In PyPy 3.9:

In [1]: import cramjam

In [2]: b = b'0' * 19

In [3]: memoryview(b).readonly
Out[3]: True

In [4]: cramjam.snappy.compress_into(b'\x00', b)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 cramjam.snappy.compress_into(b'\x00', b)

TypeError: The output buffer 'b'0000000000000000000'' is readonly, refusing to overwrite.

In [5]: b = bytearray(b)

In [6]: memoryview(b).readonly
Out[6]: False

In [7]: cramjam.snappy.compress_into(b'\x00', b)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 cramjam.snappy.compress_into(b'\x00', b)

TypeError: The output buffer 'bytearray(b'0000000000000000000')' is readonly, refusing to overwrite.

But in CPython: this works as expected:

In [1]: import cramjam

In [2]: b = b'0' * 19

In [3]: memoryview(b).readonly
Out[3]: True

In [4]: cramjam.snappy.compress_into(b'\x00', b)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 cramjam.snappy.compress_into(b'\x00', b)

TypeError: The output buffer 'b'0000000000000000000'' is readonly, refusing to overwrite.

In [5]: b = bytearray(b)

In [6]: memoryview(b).readonly
Out[6]: False

In [7]: cramjam.snappy.compress_into(b'\x00', b)
Out[7]: 19

In [8]: b
Out[8]: bytearray(b'\xff\x06\x00\x00sNaPpY\x01\x05\x00\x00\xd2\x8f%I\x00')

Edit:

For clarity, I'm trying to access the buffer via PyObject_GetBuffer(...., PyBUF_CONTIG_RO) which ought to give the correct readonly setting: https://docs.python.org/3/c-api/buffer.html#compound-requests

milesgranger commented 3 months ago

And lastly, I should also note that if I do PyMemoryView_FromObject first and use that result into PyObject_GetBuffer instead, then it will happily say that bytes are writable now as well as bytearray. (Then the weird delayed updating issue comes back for bytes)