oconnor663 / blake3-py

Python bindings for the BLAKE3 cryptographic hash function
Other
139 stars 12 forks source link

test: memoryview of Apache Arrow buffers (#9) #11

Closed maartenbreddels closed 3 years ago

maartenbreddels commented 3 years ago

This tests the issue mentioned in #9, which gives me


    def test_arrow():
        import pyarrow as pa
        ar = pa.array(['foo', 'bar'])
        blake = blake3()
>       blake.update(memoryview(ar.buffers()[1]))
E       BufferError: Incompatible type as buffer

tests/test_blake3.py:98: BufferError
oconnor663 commented 3 years ago

The failure seems to be at this line, with PyBuffer::get returning an error. I'm not very familiar with the details of Python's buffer protocol, or with whatever quirks there might be in how PyO3 wraps it. I'm also unfamiliar with PyArrow. But as far as I can tell, this is either a bug in PyO3, a bug in PyArrow, or expected behavior?

oconnor663 commented 3 years ago

It looks like the memoryview part might be a red herring. I get the same error if I try to use the buffers object directly.

oconnor663 commented 3 years ago

Ok, putting some debugging into PyO3 clarifies this. The issue is that the PyArrow buffer has a signed char type. You can see that like this:

>>> import pyarrow
>>> ar = pyarrow.array(["foo", "bar"])
>>> memoryview(ar.buffers()[1]).format
'b'

Here's the table of format specifiers. Compare that to regular unsigned char format of a regular bytes object:

>>> memoryview(b"foobar").format
'B'

That leads to a fix. If I replace u8 with i8 in the PyBuffer::get call, then the test passes. (But of course everything else breaks.) Perhaps what we should do is to try to get the buffer as both u8 and i8, and use whichever one succeeds? This assumes that we're ok with pointer casting i8's to u8's...maybe that would have a weird result on hypothetical non-two's-complement machines?

maartenbreddels commented 3 years ago

I think it will be a good approach, in the end, it's just bytes that will be processed, the type information will probably be lost (in the blake library) and nobody should be making any memory copies.

oconnor663 commented 3 years ago

I've pushed beb4e32b0c6a05c99eaf83974de1e24563aab8b2, which includes a fix and some tests, so I'll close out this PR. Thanks for pointing me to this!

maartenbreddels commented 3 years ago

Great, thanks! Are you planning a release soon?

(from mobile phone)

On Sat, 12 Sep 2020, 02:28 oconnor663, notifications@github.com wrote:

I've pushed beb4e32 https://github.com/oconnor663/blake3-py/commit/beb4e32b0c6a05c99eaf83974de1e24563aab8b2, which includes a fix and some tests, so I'll close out this PR. Thanks for pointing me to this!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/oconnor663/blake3-py/pull/11#issuecomment-691364370, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANPEPLN63TVJWPG56HT2ZDSFK6BNANCNFSM4RHJ3UPQ .

oconnor663 commented 3 years ago

Just pushed version 0.1.7 :)