Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
We need to load the vector column to do the vector search, so returning it isn't really a performance concern. However, as a user, if I explicitly list the columns I want, I probably don't expect any extra columns in the result (except _distance).
I've opened this as an issue because I'm not sure whether this is expected behavior (in which case we should document it) or not (in which case we should fix it).
I do think there will be situations where users will want the vector search but not the vector column. For example, maybe the vector column is an image embedding and the data they really want is the image itself or a URL for the image.
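
To make the expectation concrete, here is a minimal pure-Python sketch of the projection behavior described above. It does not use the Lance API: `nearest`, `l2_distance`, and the sample rows are hypothetical stand-ins, and the L2 metric is an assumption made only for illustration. The vector column is read to compute distances, but it appears in the output only if the caller lists it.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(rows, vector_column, query, k, columns=None):
    """Hypothetical helper (not the Lance API): brute-force k-nearest search.

    The vector column is always read to compute distances, but only the
    requested columns plus `_distance` are returned.
    """
    scored = sorted(
        ((l2_distance(row[vector_column], query), row) for row in rows),
        key=lambda pair: pair[0],  # sort by distance only, never by the row dict
    )[:k]

    results = []
    for distance, row in scored:
        # With no explicit projection, return every column of the row.
        keep = list(row) if columns is None else columns
        projected = {name: row[name] for name in keep}
        projected["_distance"] = distance
        results.append(projected)
    return results


# Illustrative data: the embedding is only a means to find the image URL.
rows = [
    {"id": 1, "url": "https://example.com/a.png", "vector": [0.0, 0.0]},
    {"id": 2, "url": "https://example.com/b.png", "vector": [3.0, 4.0]},
    {"id": 3, "url": "https://example.com/c.png", "vector": [1.0, 0.0]},
]

hits = nearest(rows, "vector", query=[0.0, 0.0], k=2, columns=["id", "url"])
for hit in hits:
    print(hit)
```

With this behavior, a caller who wants the embedding back can still get it by adding the vector column to `columns`, so nothing is lost by leaving it out by default.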