sgkit-dev / bio2zarr

Convert bioinformatics file formats to Zarr
Apache License 2.0

Numcodecs v0.13.0 causing test failures #274

Closed Will-Tyler closed 4 months ago

Will-Tyler commented 4 months ago

Overview

Changes introduced in version 0.13.0 of the numcodecs dependency, released on July 12, 2024, are causing some of the unit tests to fail. The release notes describe changes to several compression algorithms, including zstd, which vcf2zarr uses by default for the intermediate columnar format. Reverting to version 0.12.1 fixes the failing unit tests.

This problem was discovered in #273.
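
The failing assertions compare exact byte counts of compressed data, which is why a codec change can break them even when the data still round-trips correctly. The following is a minimal sketch of that effect, not code from bio2zarr: it uses the standard library's zlib as a stand-in for zstd, and the helper name `compressed_sizes` is hypothetical. It only shows that the compressed length depends on the encoder's settings while the decompressed bytes do not.

```python
import zlib


def compressed_sizes(data: bytes, levels=(1, 9)) -> dict:
    """Compress `data` at each zlib level and return {level: compressed length}.

    zlib is only a stand-in here; the issue itself concerns zstd via numcodecs.
    """
    sizes = {}
    for level in levels:
        packed = zlib.compress(data, level)
        # The round trip must be lossless regardless of encoder settings.
        if zlib.decompress(packed) != data:
            raise ValueError(f"round trip failed at level {level}")
        sizes[level] = len(packed)
    return sizes


# 1008 bytes of illustrative input (an arbitrary payload, not real ICF data).
payload = b"ACGT" * 252
sizes = compressed_sizes(payload)
print(sizes)
```

A test that asserts on `sizes[level]` is tied to one encoder build, whereas a test that asserts on the decompressed content is not.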

Test failure summary

FAILED tests/test_icf.py::TestCorruptionDetection::test_truncated_chunk_file[190] - assert 187 == 190
 +  where 187 = len(b"\x02\x01\x90\x01\xc5\x00\x00\x00\xc5\x00\x00\x00\xbb\x00\x00\x00\x14\x00\x00\x00\xa3\x00\x00\x00(\xb5/\xfd \xc5\xd5\...\xcb\x92\xae<D\xcdR\xf8\x02\x0c\x00\xc0\x90\xa4\xc0\xbd\x82q\x16,w\x07Q.\x80\x03(\xa8)Z\x15\x9cQ\xa3\xcd\xd4-\x03x\x03")
FAILED tests/test_icf.py::TestCorruptionDetection::test_truncated_chunk_file[192] - assert 187 == 192
 +  where 187 = len(b"\x02\x01\x90\x01\xc5\x00\x00\x00\xc5\x00\x00\x00\xbb\x00\x00\x00\x14\x00\x00\x00\xa3\x00\x00\x00(\xb5/\xfd \xc5\xd5\...\xcb\x92\xae<D\xcdR\xf8\x02\x0c\x00\xc0\x90\xa4\xc0\xbd\x82q\x16,w\x07Q.\x80\x03(\xa8)Z\x15\x9cQ\xa3\xcd\xd4-\x03x\x03")
FAILED tests/test_vcf_examples.py::test_split_explode - AssertionError: assert {'compressed_...lue': 10, ...} == {'compressed_...lue': 10, ...}

  Omitting 5 identical items, use -vv to show
  Differing items:
  {'compressed_size': 571} != {'compressed_size': 587}

  Full diff:
    {
  -     'compressed_size': 587,
  ?                         -
  +     'compressed_size': 571,
  ?                          +
        'max_number': 1,
        'max_value': 1235237,
        'min_value': 10,
        'num_chunks': 3,
        'uncompressed_size': 1008,
    }

Temporary workaround

Pinning numcodecs to version 0.12.1 in pyproject.toml gets the test suite passing again.
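
To confirm which side of the change a local environment is on, a small check along these lines may help. This is a sketch under assumptions: the 0.13 threshold is taken from this report, and the helper names (`release_tuple`, `changes_compressed_sizes`, `installed_numcodecs_version`) are hypothetical, not part of bio2zarr or numcodecs. It relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def changes_compressed_sizes(version: str) -> bool:
    """True if the version is 0.13 or later, the release this report points to."""
    return release_tuple(version) >= (0, 13)


def installed_numcodecs_version():
    """Return the installed numcodecs version string, or None if it is absent."""
    try:
        return metadata.version("numcodecs")
    except metadata.PackageNotFoundError:
        return None


found = installed_numcodecs_version()
if found is None:
    print("numcodecs is not installed")
else:
    print(found, "affected" if changes_compressed_sizes(found) else "not affected")
```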

benjeffery commented 4 months ago

Thanks for the detailed report - should be fixed in #275