lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.97k stars, 226 forks

Size-based eviction for index cache #3136

Open wjones127 opened 6 days ago

wjones127 commented 6 days ago

Users can currently limit the index cache by number of items, but that limit is hard to tune without knowing a lot of internal details about how large the cached metadata is. It would be easier to set the cache size in bytes instead.
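To make the idea concrete, here is a minimal sketch of a cache that evicts by total bytes instead of by entry count. It is written in plain Python for illustration only and is not Lance's implementation. The class `SizeBoundedCache`, its methods, and the caller-supplied `size_bytes` argument are all hypothetical names; the LRU eviction order is also an assumption.

```python
from collections import OrderedDict


class SizeBoundedCache:
    """Hypothetical LRU cache bounded by total bytes rather than entry count."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.current_bytes = 0
        self._entries = OrderedDict()  # key -> (value, size_bytes)

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def insert(self, key, value, size_bytes: int):
        # Drop any existing entry first so its size is not double-counted.
        if key in self._entries:
            self.current_bytes -= self._entries.pop(key)[1]
        # Assumption: an entry larger than the whole budget is never cached.
        if size_bytes > self.max_bytes:
            return
        self._entries[key] = (value, size_bytes)
        self.current_bytes += size_bytes
        # Evict least recently used entries until back under the byte budget.
        while self.current_bytes > self.max_bytes:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.current_bytes -= evicted_size

    def __len__(self):
        return len(self._entries)
```

The hard part this sketch skips is the one the issue is asking about: each cached entry needs a reasonably accurate `size_bytes`, which here is simply passed in by the caller.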

TBD: how difficult is this to implement?

We should have a deprecation cycle for the old item-based size setting. During that cycle, the item-based setting can assume a fixed-size entry (___ MB?) and use that to derive a value for the max bytes.
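One possible shape for that deprecation shim, sketched in Python. The function `resolve_max_bytes` and its parameter names are hypothetical, and `assumed_entry_bytes` is deliberately a required argument because the fixed entry size is still undecided above.

```python
import warnings


def resolve_max_bytes(max_bytes=None, max_items=None, *, assumed_entry_bytes):
    """Return the cache budget in bytes (hypothetical helper).

    `max_items` is the old, deprecated item-based setting. It is converted
    to bytes by assuming every entry has the same fixed size.
    """
    if max_bytes is not None and max_items is not None:
        raise ValueError("set only one of max_bytes or max_items")
    if max_bytes is not None:
        return max_bytes
    if max_items is not None:
        warnings.warn(
            "max_items is deprecated; set the cache size in bytes instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Derived budget: item count times the assumed fixed entry size.
        return max_items * assumed_entry_bytes
    raise ValueError("one of max_bytes or max_items is required")
```

With this shape, existing callers that pass only the item count keep working and see a warning, while new callers pass a byte budget directly.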