milesgranger / cramjam

Your go-to for easy access to a plethora of compression algorithms, all neatly bundled in one simple installation.
MIT License

Support PyPy output of `bytes` and `memoryview` for de/compress_into functions #142

Open · mgorny opened this issue 8 months ago

mgorny commented 8 months ago

When running the test suite with the PyPy3.10 7.3.15 release, I'm getting lots of test failures. For example:

______________________________ test_obj_api[File] ______________________________
[gw0] linux -- Python 3.10.13 /tmp/cramjam/cramjam-python/.venv/bin/python

tmpdir = local('/tmp/pytest-of-mgorny/pytest-3/popen-gw0/test_obj_api_File_0')
Obj = <class 'File'>

    @pytest.mark.parametrize("Obj", (File, Buffer))
    def test_obj_api(tmpdir, Obj):
        if isinstance(Obj, File):
            buf = File(str(tmpdir.join("file.txt")))
        else:
            buf = Buffer()

        assert buf.write(b"bytes") == 5
        assert buf.tell() == 5
        assert buf.seek(0) == 0
        assert buf.read() == b"bytes"
        assert buf.seek(-1, 2) == 4  # set one byte backwards from end; position 4
        assert buf.read() == b"s"
        assert buf.seek(-2, whence=1) == 3  # set two bytes from current (end): position 3
        assert buf.read() == b"es"

        with pytest.raises(ValueError):
            buf.seek(1, 3)  # only 0, 1, 2 are valid seek from positions

        for out in (
            b"12345",
            bytearray(b"12345"),
            File(str(tmpdir.join("test.txt"))),
            Buffer(),
        ):
            buf.seek(0)

            expected = b"bytes"

            buf.readinto(out)

            # Will update the output buffer
            if isinstance(out, (File, Buffer)):
                out.seek(0)
                assert out.read() == expected
            elif isinstance(out, bytearray):
                assert out == bytearray(expected)
            else:
>               assert out == expected
E               AssertionError: assert b'12345' == b'bytes'
E                 
E                 At index 0 diff: b'1' != b'b'
E                 Use -v to get more diff

tests/test_rust_io.py:44: AssertionError
____________________ test_variants_different_dtypes[snappy] ____________________
[gw0] linux -- Python 3.10.13 /tmp/cramjam/cramjam-python/.venv/bin/python

variant_str = 'snappy'

    @pytest.mark.parametrize("variant_str", VARIANTS)
>   @given(arr=st_np.arrays(st_np.scalar_dtypes(), shape=st.integers(0, int(1e4))))

tests/test_variants.py:42: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

variant_str = 'snappy', arr = array([], shape=(2, 0), dtype=bool)

    @pytest.mark.parametrize("variant_str", VARIANTS)
    @given(arr=st_np.arrays(st_np.scalar_dtypes(), shape=st.integers(0, int(1e4))))
    def test_variants_different_dtypes(variant_str, arr):
        variant = getattr(cramjam, variant_str)
        compressed = variant.compress(arr)
        decompressed = variant.decompress(compressed)
        assert same_same(bytes(decompressed), arr.tobytes())

        # And compress n dims > 1
        if arr.shape[0] % 2 == 0:
            arr = arr.reshape((2, -1))
>           compressed = variant.compress(arr)
E           TypeError: argument 'data': failed to extract enum BytesType ('Buffer | File | pybuffer')
E           - variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'ndarray' object cannot be converted to 'Buffer'
E           - variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'ndarray' object cannot be converted to 'File'
E           - variant PyBuffer (pybuffer): TypeError: failed to extract field BytesType::PyBuffer.0, caused by BufferError: Buffer is not C contiguous
E           Falsifying example: test_variants_different_dtypes(
E               variant_str='snappy',
E               arr=array([], dtype=bool),
E           )

tests/test_variants.py:52: TypeError
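
The first failure expects `readinto()` to overwrite a `bytes` object in place. At the Python level `bytes` is immutable and exports a read-only buffer (`memoryview(b"12345").readonly` is `True`), so mutating one in place depends on interpreter internals that PyPy apparently does not share with CPython. A minimal sketch of the portable pattern, assuming the caller controls the output type: write only into writable buffers such as `bytearray`. `write_into` is a hypothetical helper for illustration, not part of cramjam's API.

```python
def write_into(out, payload: bytes) -> int:
    """Copy payload into the start of a writable buffer, readinto()-style.

    Hypothetical helper for illustration; not part of cramjam.
    """
    # The context manager releases the buffer export when we are done.
    with memoryview(out) as view:
        if view.readonly:
            # bytes objects expose read-only buffers, so there is no supported
            # way to update them in place from Python.
            raise TypeError(f"{type(out).__name__} is read-only; use a bytearray")
        n = min(len(payload), view.nbytes)
        view[:n] = payload[:n]
        return n


buf = bytearray(b"12345")
write_into(buf, b"bytes")  # buf is now bytearray(b'bytes')
```

Passing `b"12345"` instead of a `bytearray` raises `TypeError` up front rather than silently leaving the output unchanged, which is what the failing assertion observed on PyPy.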

They all look quite serious. This is with commit 2b90ebbf2c85be6e47248fd482590335aa245bc4.
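
The second failure comes from the buffer protocol: cramjam's `pybuffer` variant rejects the reshaped array with `BufferError: Buffer is not C contiguous`. A minimal, stdlib-only sketch of a caller-side check, assuming the consumer requires a C-contiguous buffer: inspect `memoryview.c_contiguous` and fall back to a copy via `tobytes()` (for a NumPy array, `arr.tobytes()` likewise returns a C-ordered copy). `to_contiguous` is a hypothetical helper, not a cramjam function.

```python
def to_contiguous(obj) -> memoryview:
    """Return a C-contiguous view of obj, copying only when needed.

    Hypothetical caller-side workaround; not part of cramjam.
    """
    view = memoryview(obj)
    if view.c_contiguous:
        return view  # wraps the same memory, no data copy
    # tobytes() copies the elements into a new, C-ordered bytes object.
    return memoryview(view.tobytes())


strided = memoryview(bytearray(b"abcdef"))[::2]  # every other byte: not contiguous
fixed = to_contiguous(strided)                    # contiguous copy holding b"ace"
```

The copy is only paid for inputs that actually fail the contiguity check, so contiguous buffers still go through without duplication.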

To reproduce, in a PyPy3.10 venv:

pip install . pytest pytest-xdist hypothesis numpy
python -m pytest -n auto tests

Full test log (1.3M): test.txt

milesgranger commented 8 months ago

Thanks for the reports; clearly not many people use cramjam with PyPy. There's issue #116 with PyPy, and I think that's the only other report I've gotten. Nonetheless, it ought to be fixed. I'll try to carve out some time to see what might be going wrong.

mgorny commented 8 months ago

PyPy3.8 is no longer developed, so I don't think you need to worry about that. The current versions are 3.9 and 3.10, and upstream has made some nice improvements. They are also very helpful, and if you believe it's a bug in PyPy they're willing to try fixing it.

milesgranger commented 8 months ago

We'll see what happens on the PyPy side, but out of curiosity I used Google BigQuery to compare PyPy vs. non-PyPy download activity for cramjam, and PyPy is indeed used very little by comparison. I may end up dropping it in future releases until we can get it straightened out, depending on how long that takes.


mgorny commented 8 months ago

Just for the record, I don't think that data has much value. It counts only direct wheel downloads, so it misses all the use from Linux distributions that fetch the sdist and build their own binary packages. On top of that, python-snappy just replaced its direct use of the Snappy library with cramjam. That's why we suddenly ended up packaging cramjam in Gentoo, and kinda needing PyPy3 support in it.

martindurant commented 8 months ago

Sorry, @milesgranger, we might have doubled your user base overnight

milesgranger commented 8 months ago

Okay, I'm tracking now, thanks! In the meantime I have #144, while we wait on https://github.com/pypy/pypy/issues/4918.

mgorny commented 8 months ago

Thank you for looking into this. I've subscribed to the PyPy bug as well now.

milesgranger commented 8 months ago

The temporary 'fix' is now on PyPI in v2.8.2.
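
The issue title suggests that `bytes` and `memoryview` outputs for the de/compress_into functions are what remain unsupported on PyPy. One way a library could detect PyPy and refuse such outputs up front is sketched below; this is a hypothetical illustration under that assumption, not the actual change in #144 or v2.8.2, and `IS_PYPY` and `check_output` are made-up names.

```python
import sys

# sys.implementation.name is "cpython" on CPython and "pypy" on PyPy.
IS_PYPY = sys.implementation.name == "pypy"


def check_output(out) -> None:
    """Hypothetical guard (not cramjam's code): reject output types whose
    in-place update this thread reports as unreliable on PyPy."""
    if IS_PYPY and isinstance(out, (bytes, memoryview)):
        raise TypeError(
            f"{type(out).__name__} output is not supported on PyPy; "
            "pass a bytearray or a cramjam Buffer/File instead"
        )


check_output(bytearray(16))  # accepted on every interpreter
```

Failing loudly keeps callers from silently receiving an unmodified output buffer, which is the symptom the first test failure showed.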

mgorny commented 8 months ago

Thanks a lot! (the tests pass for me now)