lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.97k stars, 224 forks

feat: add full zip encoding for wide data types #3114

Closed: westonpace closed this 1 week ago

westonpace commented 1 week ago

The encoding has only been tested on tensors so far. It should be able to encode variable-width data but, without a repetition index, we cannot yet schedule or decode variable-width data. In addition, I've created a few TODOs for follow-up.
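To illustrate why fixed-width values such as tensors can be read without extra lookup data while variable-width values cannot, here is a minimal sketch. It is not the code from this PR: the one-byte per-row control value, the function names, and the plain `offsets` vector (a simple stand-in for whatever form the repetition index takes) are all assumptions made for illustration.

```rust
/// Hypothetical: interleave ("zip") a one-byte control value with each
/// fixed-width value so that every row occupies `1 + width` bytes.
fn zip_fixed(control: &[u8], values: &[u8], width: usize) -> Vec<u8> {
    assert_eq!(values.len(), control.len() * width);
    let mut out = Vec::with_capacity(control.len() * (1 + width));
    for (i, c) in control.iter().enumerate() {
        out.push(*c);
        out.extend_from_slice(&values[i * width..(i + 1) * width]);
    }
    out
}

/// Fixed width: the start of any row is plain arithmetic, so no index is needed.
fn read_fixed(zipped: &[u8], width: usize, row: usize) -> (u8, &[u8]) {
    let stride = 1 + width;
    let start = row * stride;
    (zipped[start], &zipped[start + 1..start + stride])
}

/// Variable width: rows have different sizes, so the writer also records
/// where each row starts (plus one final end offset).
fn zip_variable(control: &[u8], values: &[&[u8]]) -> (Vec<u8>, Vec<usize>) {
    assert_eq!(control.len(), values.len());
    let mut out = Vec::new();
    let mut offsets = Vec::with_capacity(values.len() + 1);
    for (c, v) in control.iter().zip(values) {
        offsets.push(out.len());
        out.push(*c);
        out.extend_from_slice(v);
    }
    offsets.push(out.len());
    (out, offsets)
}

/// Without `offsets`, locating `row` would require scanning from the start.
fn read_variable<'a>(zipped: &'a [u8], offsets: &[usize], row: usize) -> (u8, &'a [u8]) {
    let (start, end) = (offsets[row], offsets[row + 1]);
    (zipped[start], &zipped[start + 1..end])
}
```

In this sketch, `read_fixed(&zipped, width, row)` needs only the row number, whereas `read_variable` cannot locate a row without the recorded offsets, which mirrors the limitation described above for variable-width data.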

codecov-commenter commented 1 week ago

Codecov Report

Attention: Patch coverage is 77.96178% with 173 lines in your changes missing coverage. Please review.

Project coverage is 77.18%. Comparing base (961cd95) to head (7d8a714).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-encoding/src/repdef.rs | 69.45% | 113 Missing :warning: |
| .../lance-encoding/src/encodings/logical/primitive.rs | 82.77% | 36 Missing and 10 partials :warning: |
| ...encoding/src/encodings/physical/fixed_size_list.rs | 86.27% | 5 Missing and 2 partials :warning: |
| rust/lance-encoding/src/encoder.rs | 73.68% | 4 Missing and 1 partial :warning: |
| rust/lance-core/src/utils/bit.rs | 96.66% | 1 Missing :warning: |
| rust/lance-encoding/src/decoder.rs | 88.88% | 0 Missing and 1 partial :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3114      +/-   ##
==========================================
+ Coverage   77.15%   77.18%   +0.03%
==========================================
  Files         240      240
  Lines       80759    81517     +758
  Branches    80759    81517     +758
==========================================
+ Hits        62309    62920     +611
- Misses      15278    15385     +107
- Partials     3172     3212      +40
```

| [Flag](https://app.codecov.io/gh/lancedb/lance/pull/3114/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/lancedb/lance/pull/3114/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | `77.18% <77.96%> (+0.03%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb#carryforward-flags-in-the-pull-request-comment) to find out more.
