lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0

If there is a vector search, the output always contains the vector column even if it is not included in the projection #1565

Closed by westonpace 1 year ago

westonpace commented 1 year ago

We need to load the vector column to do the vector search, so this isn't really a performance concern. However, as a user, if I specifically list the columns I want, I probably don't expect extra columns in the output (except _distance).

I've opened this as an issue because I'm not sure whether this is expected behavior (in which case we should document it) or a bug (in which case we should fix it).

I do think there will be situations where users will want the vector search but not the vector column. For example, maybe the vector column is an image embedding and the data they really want is the image itself or a URL for the image.
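To make the expected behavior concrete, here is a minimal pure-Python sketch of a brute-force nearest-neighbor search that reads the vector column to compute distances but only returns the columns the caller listed, plus _distance. This is a toy model, not Lance's API: the `nearest` function, its parameters, the sample rows, and the use of plain Euclidean distance are all assumptions made for illustration.

```python
import math


def nearest(rows, query, k, columns, vector_column="vector"):
    """Toy brute-force k-NN (hypothetical helper, not part of Lance).

    Reads `vector_column` to score every row, but emits only the
    requested `columns` plus `_distance`.
    """
    scored = []
    for row in rows:
        # The vector has to be loaded to compute the distance...
        dist = math.dist(row[vector_column], query)
        scored.append((dist, row))

    # Sort on the distance only, so rows themselves are never compared.
    scored.sort(key=lambda pair: pair[0])

    results = []
    for dist, row in scored[:k]:
        # ...but it is only returned if the caller asked for it.
        out = {name: row[name] for name in columns}
        out["_distance"] = dist
        results.append(out)
    return results


# Made-up sample data: image URLs with 2-d "embeddings".
rows = [
    {"id": 1, "url": "https://example.com/a.png", "vector": [0.0, 0.0]},
    {"id": 2, "url": "https://example.com/b.png", "vector": [3.0, 4.0]},
    {"id": 3, "url": "https://example.com/c.png", "vector": [1.0, 0.0]},
]

hits = nearest(rows, query=[0.0, 0.0], k=2, columns=["url"])
for hit in hits:
    print(hit)
```

With `columns=["url"]`, each result carries only `url` and `_distance`. A caller who also wants the embedding would add `"vector"` to `columns` explicitly.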

wjones127 commented 1 year ago

Duplicate of https://github.com/lancedb/lance/issues/1490 ?

westonpace commented 1 year ago

Duplicate of #1490