lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.97k stars 224 forks

fix: full text search index broken after optimize_indices() #3145

Closed BubbleCal closed 2 hours ago

BubbleCal commented 1 day ago

This can be reproduced by optimizing the FTS index without adding any new data. During optimization the existing tokens can be reordered, which breaks their offsets and causes the index to return random results. The existing tests only verified that newly added data could be queried, so they did not catch this bug. This PR adds a test that queries existing data.
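To illustrate the failure mode described in this PR, here is a minimal toy sketch in Python. It is not Lance's actual implementation: `build_index`, `search`, `optimize_buggy`, and `optimize_fixed` are hypothetical names, and the "offsets" are modeled simply as positions in a list of posting lists. It only shows how reassigning token ids without remapping the data they point at makes queries for existing rows return the wrong results.

```python
from typing import Dict, List, Tuple

Index = Tuple[Dict[str, int], List[List[int]]]


def build_index(docs: List[str]) -> Index:
    """Toy inverted index: token -> id, and postings[id] -> row ids."""
    token_ids: Dict[str, int] = {}
    postings: List[List[int]] = []
    for row_id, text in enumerate(docs):
        for tok in text.lower().split():
            if tok not in token_ids:
                token_ids[tok] = len(postings)  # id doubles as the offset
                postings.append([])
            if row_id not in postings[token_ids[tok]]:
                postings[token_ids[tok]].append(row_id)
    return token_ids, postings


def search(token_ids: Dict[str, int], postings: List[List[int]], term: str) -> List[int]:
    """Look up a term's posting list through its token id."""
    tid = token_ids.get(term)
    return [] if tid is None else postings[tid]


def optimize_buggy(token_ids: Dict[str, int], postings: List[List[int]]) -> Index:
    """Reassign ids in sorted token order but leave postings untouched (the bug)."""
    new_ids = {tok: i for i, tok in enumerate(sorted(token_ids))}
    return new_ids, postings


def optimize_fixed(token_ids: Dict[str, int], postings: List[List[int]]) -> Index:
    """Reassign ids and move each posting list to its new position."""
    new_ids = {tok: i for i, tok in enumerate(sorted(token_ids))}
    new_postings: List[List[int]] = [[] for _ in postings]
    for tok, old_id in token_ids.items():
        new_postings[new_ids[tok]] = postings[old_id]
    return new_ids, new_postings


if __name__ == "__main__":
    docs = ["lance is fast", "apple pie"]
    ids, posts = build_index(docs)

    # No new data is added; we only "optimize" the existing index.
    bad_ids, bad_posts = optimize_buggy(ids, posts)
    good_ids, good_posts = optimize_fixed(ids, posts)

    print(search(ids, posts, "apple"))            # row 1, correct
    print(search(bad_ids, bad_posts, "apple"))    # wrong row after the buggy optimize
    print(search(good_ids, good_posts, "apple"))  # row 1 again
```

A test that only adds new rows and queries them would not necessarily exercise the stale mapping, which is why querying rows that existed before the optimize step is the useful check.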

github-actions[bot] commented 1 day ago

ACTION NEEDED Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

codecov-commenter commented 1 day ago

Codecov Report

Attention: Patch coverage is 95.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 77.98%. Comparing base (71f323a) to head (7140241).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/scalar/inverted/index.rs | 77.77% | 0 Missing and 2 partials :warning: |
| rust/lance-index/src/scalar/inverted/builder.rs | 87.50% | 0 Missing and 1 partial :warning: |
| rust/lance/src/index.rs | 98.41% | 0 Missing and 1 partial :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3145      +/-   ##
==========================================
+ Coverage   77.93%   77.98%   +0.05%
==========================================
  Files         242      242
  Lines       81736    81798      +62
  Branches    81736    81798      +62
==========================================
+ Hits        63698    63794      +96
+ Misses      14849    14812      -37
- Partials     3189     3192       +3
```

| [Flag](https://app.codecov.io/gh/lancedb/lance/pull/3145/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/lancedb/lance/pull/3145/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | `77.98% <95.00%> (+0.05%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.