lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0

feat: fsst compression with mini-block #3121

Closed broccoliSpicy closed 1 week ago

broccoliSpicy commented 1 week ago

This PR tries to integrate the mini-block page layout with FSST compression.

During compression, the input data is first FSST-compressed and then written out using BinaryMiniBlockEncoder. During decompression, BinaryMiniBlockDecompressor first decodes the raw data read from disk, and FSST decompression is then applied to the result.
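The two-stage flow described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR and does not use Lance's API: `SymbolTable`, `encode_block`, `decode_block`, `compress_page`, and `decompress_page` are hypothetical names, the symbol table is a toy stand-in for FSST (fixed symbols, one-byte codes, an escape byte for everything else), and the block layout (count, offsets, bytes) is only an assumed stand-in for what BinaryMiniBlockEncoder writes.

```rust
// Toy stand-in for FSST (NOT the real algorithm or Lance's implementation):
// frequent byte sequences map to one-byte codes, other bytes are escaped.
const ESCAPE: u8 = 255;

struct SymbolTable {
    // code i -> symbol bytes; assumed non-empty and fewer than 255 entries
    symbols: Vec<Vec<u8>>,
}

impl SymbolTable {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut pos = 0;
        while pos < input.len() {
            // Greedily pick the longest symbol that matches at `pos`.
            let mut best: Option<(usize, usize)> = None; // (code, length)
            for (code, sym) in self.symbols.iter().enumerate() {
                let longer = best.map_or(true, |(_, len)| sym.len() > len);
                if !sym.is_empty() && longer && input[pos..].starts_with(sym) {
                    best = Some((code, sym.len()));
                }
            }
            match best {
                Some((code, len)) => {
                    out.push(code as u8);
                    pos += len;
                }
                None => {
                    // No symbol matches: emit the literal byte after an escape.
                    out.push(ESCAPE);
                    out.push(input[pos]);
                    pos += 1;
                }
            }
        }
        out
    }

    fn decompress(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut pos = 0;
        while pos < input.len() {
            if input[pos] == ESCAPE {
                out.push(input[pos + 1]);
                pos += 2;
            } else {
                out.extend_from_slice(&self.symbols[input[pos] as usize]);
                pos += 1;
            }
        }
        out
    }
}

// Assumed block layout: [count: u32][count + 1 offsets: u32][value bytes],
// all little-endian. The real mini-block layout is different.
fn encode_block(values: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(values.len() as u32).to_le_bytes());
    let mut offset = 0u32;
    out.extend_from_slice(&offset.to_le_bytes());
    for v in values {
        offset += v.len() as u32;
        out.extend_from_slice(&offset.to_le_bytes());
    }
    for v in values {
        out.extend_from_slice(v);
    }
    out
}

// Reads the `index`-th little-endian u32 from the start of `buf`.
fn read_u32(buf: &[u8], index: usize) -> usize {
    let s = index * 4;
    u32::from_le_bytes([buf[s], buf[s + 1], buf[s + 2], buf[s + 3]]) as usize
}

fn decode_block(block: &[u8]) -> Vec<Vec<u8>> {
    let n = read_u32(block, 0);
    let data_start = (n + 2) * 4; // one count word plus n + 1 offsets
    (0..n)
        .map(|i| {
            let start = read_u32(block, i + 1);
            let end = read_u32(block, i + 2);
            block[data_start + start..data_start + end].to_vec()
        })
        .collect()
}

// Write path: compress each value, then lay the results out as a block.
fn compress_page(table: &SymbolTable, values: &[&str]) -> Vec<u8> {
    let compressed: Vec<Vec<u8>> =
        values.iter().map(|v| table.compress(v.as_bytes())).collect();
    encode_block(&compressed)
}

// Read path: decode the block first, then decompress each value.
fn decompress_page(table: &SymbolTable, block: &[u8]) -> Vec<String> {
    decode_block(block)
        .iter()
        .map(|c| String::from_utf8(table.decompress(c)).expect("valid UTF-8"))
        .collect()
}

fn main() {
    let table = SymbolTable {
        symbols: vec![b"http://".to_vec(), b"lance".to_vec(), b".com".to_vec()],
    };
    let values = ["http://lance.com", "lancedb", ""];
    let page = compress_page(&table, &values);
    let restored = decompress_page(&table, &page);
    assert_eq!(restored, values.to_vec());
    println!("round trip ok, page is {} bytes", page.len());
}
```

The point of the sketch is the ordering: the read path undoes the write path in reverse, so the block decoder only ever sees already-compressed bytes and needs no knowledge of the symbol table.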

github-actions[bot] commented 1 week ago

ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

codecov-commenter commented 1 week ago

Codecov Report

Attention: Patch coverage is 87.71930% with 14 lines in your changes missing coverage. Please review.

Project coverage is 77.87%. Comparing base (ec76db4) to head (2397c19).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-encoding/src/encodings/physical/fsst.rs | 85.22% | 9 Missing and 4 partials :warning: |
| rust/lance-encoding/src/encodings/physical.rs | 0.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3121      +/-   ##
==========================================
+ Coverage   77.19%   77.87%   +0.68%
==========================================
  Files         240      240
  Lines       81517    81630     +113
  Branches    81517    81630     +113
==========================================
+ Hits        62927    63572     +645
+ Misses      15383    14831     -552
- Partials     3207     3227      +20
```

| [Flag](https://app.codecov.io/gh/lancedb/lance/pull/3121/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/lancedb/lance/pull/3121/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | `77.87% <87.71%> (+0.68%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb#carryforward-flags-in-the-pull-request-comment) to find out more.
