lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.97k stars, 224 forks

Size-based eviction for file metadata cache #3135

Open wjones127 opened 3 days ago

wjones127 commented 3 days ago

Users can currently limit the cache by number of items, but that is hard to tune without knowing internal details about how large each metadata entry is. It would be easier to set the limit in bytes instead.

We already require impl DeepSizeOf, so making eviction size-based should take very little effort.
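
To make the idea concrete, here is a minimal std-only sketch of a byte-budgeted LRU cache. It is not the existing Lance cache: `ApproxSize` is a hypothetical stand-in for the `DeepSizeOf` bound, and `SizedCache`, its fields, and its methods are names made up for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the `DeepSizeOf` bound (assumption, not the real trait).
trait ApproxSize {
    fn approx_size_bytes(&self) -> usize;
}

impl ApproxSize for Vec<u8> {
    // For this sketch, count only the bytes stored in the vector.
    fn approx_size_bytes(&self) -> usize {
        self.len()
    }
}

/// Illustrative cache that evicts by total bytes instead of item count.
struct SizedCache<V: ApproxSize> {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, V>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<String>,
}

impl<V: ApproxSize> SizedCache<V> {
    fn new(max_bytes: usize) -> Self {
        Self {
            max_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    fn insert(&mut self, key: String, value: V) {
        // Replace any existing entry so its bytes are not counted twice.
        self.remove(&key);
        let size = value.approx_size_bytes();
        if size > self.max_bytes {
            return; // An entry larger than the whole budget is never cached.
        }
        // Evict least recently used entries until the new one fits.
        while self.used_bytes + size > self.max_bytes {
            match self.order.pop_front() {
                Some(old_key) => {
                    if let Some(old) = self.entries.remove(&old_key) {
                        self.used_bytes -= old.approx_size_bytes();
                    }
                }
                None => break,
            }
        }
        self.used_bytes += size;
        self.order.push_back(key.clone());
        self.entries.insert(key, value);
    }

    fn get(&mut self, key: &str) -> Option<&V> {
        if self.entries.contains_key(key) {
            // Mark the key as most recently used.
            self.order.retain(|k| k.as_str() != key);
            self.order.push_back(key.to_string());
        }
        self.entries.get(key)
    }

    fn remove(&mut self, key: &str) {
        if let Some(old) = self.entries.remove(key) {
            self.used_bytes -= old.approx_size_bytes();
            self.order.retain(|k| k.as_str() != key);
        }
    }
}
```

The `VecDeque` bookkeeping is O(n) per access and is only there to keep the sketch short; a real cache would more likely rely on a library that supports weighted entries.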

We should have a deprecation cycle for the old item-based limit. During that cycle, it can assume a fixed entry size (2MB?) and multiply that by the item count to derive a value for the max bytes.
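
A rough sketch of that fallback, assuming the tentative 2MB figure (interpreted here as 2 MiB) and hypothetical option names `max_items` and `max_bytes`; none of these names come from the existing API.

```rust
/// Assumed per-entry size for the deprecated item-based setting.
/// The 2 MiB value is the tentative figure from this issue, not a measured one.
const ASSUMED_ENTRY_BYTES: usize = 2 * 1024 * 1024;

/// Resolve the byte budget from the new and the deprecated setting.
/// Both parameter names are hypothetical.
fn effective_max_bytes(max_items: Option<usize>, max_bytes: Option<usize>) -> Option<usize> {
    match (max_bytes, max_items) {
        // An explicit byte limit always wins.
        (Some(bytes), _) => Some(bytes),
        // Deprecated path: derive bytes from the item count.
        (None, Some(items)) => Some(items.saturating_mul(ASSUMED_ENTRY_BYTES)),
        // Nothing set: leave the choice of default to the caller.
        (None, None) => None,
    }
}
```

Under these assumptions, an old setting of 256 items would map to a 512 MiB budget, and `saturating_mul` keeps very large item counts from overflowing.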