Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
Users can set the number of items, but that's hard to tune without knowing a lot of internal details, such as how large each metadata entry is. It's easier to set the size in bytes instead.

TBD: how difficult is this to implement?

We should have a deprecation cycle for the old item-based setting. It can assume a fixed size per entry (___ MB?) and use that to derive a value for the max bytes.
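A minimal sketch of what the deprecation shim could look like. All names here are hypothetical, and the per-entry size constant is a placeholder assumption (the actual value is the blank above), not a measured number:

```rust
// Placeholder assumption: a fixed size per cached entry, used only to
// translate the deprecated item-count setting into a byte budget.
const ASSUMED_ENTRY_SIZE_BYTES: u64 = 1024 * 1024; // hypothetical: 1 MiB/entry

/// Hypothetical cache-size knob: the new byte-based variant, plus the
/// old item-based variant kept alive through a deprecation cycle.
#[derive(Debug)]
enum CacheSize {
    Bytes(u64),           // new: explicit budget in bytes
    DeprecatedItems(u64), // old: number of items
}

impl CacheSize {
    /// Resolve either variant to a byte budget.
    fn max_bytes(&self) -> u64 {
        match self {
            CacheSize::Bytes(b) => *b,
            CacheSize::DeprecatedItems(n) => {
                eprintln!("warning: item-based cache size is deprecated; set bytes instead");
                // Derive bytes from the assumed fixed entry size.
                n * ASSUMED_ENTRY_SIZE_BYTES
            }
        }
    }
}

fn main() {
    let old = CacheSize::DeprecatedItems(256);
    let new = CacheSize::Bytes(512 * 1024 * 1024);
    println!("old -> {} bytes, new -> {} bytes", old.max_bytes(), new.max_bytes());
}
```

The upside of this shape is that everything downstream only ever sees a byte budget; the item-count path exists solely as a warning-emitting translation layer that can be deleted at the end of the deprecation window.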