lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.97k stars 224 forks

Support multiple columns (composite key) for `merge_insert` #3124

Open westonpace opened 1 week ago

westonpace commented 1 week ago

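Example scenario: a document is chunked into paragraphs, and each paragraph is embedded into its own row, which contains the `document_id` and the `paragraph_id`. Later, the user recalculates the embeddings for one of the documents and wants to replace that document's rows. Neither column identifies a row on its own, so the merge key has to be the pair (`document_id`, `paragraph_id`).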
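To make the requested semantics concrete, here is a minimal in-memory sketch in plain Python. It is not Lance's API: `merge_insert_composite` is a hypothetical helper over lists of dicts that mimics a typical upsert (replace rows whose composite key matches, insert the rest). It assumes keys are unique within the target.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def merge_insert_composite(
    target: List[Row], source: List[Row], keys: Sequence[str]
) -> List[Row]:
    """Hypothetical upsert keyed on a composite (multi-column) key.

    Target rows whose key tuple appears in `source` are replaced by the
    source row; source rows with no match are appended. Assumes keys are
    unique within `target`. Neither input list is mutated.
    """

    def key_of(row: Row) -> Tuple[Any, ...]:
        # The composite key is the tuple of values from every key column.
        return tuple(row[k] for k in keys)

    # Index the new data by composite key (a later duplicate wins).
    incoming = {key_of(row): row for row in source}
    result: List[Row] = []
    for row in target:
        k = key_of(row)
        # Matched: take the new row and consume it so it is not inserted again.
        result.append(incoming.pop(k) if k in incoming else row)
    # Not matched: the remaining source rows are inserted.
    result.extend(incoming.values())
    return result


# The scenario from the issue: document 1's embeddings are recomputed.
table = [
    {"document_id": 1, "paragraph_id": 0, "embedding": [0.1, 0.2]},
    {"document_id": 1, "paragraph_id": 1, "embedding": [0.3, 0.4]},
    {"document_id": 2, "paragraph_id": 0, "embedding": [0.5, 0.6]},
]
new_rows = [
    {"document_id": 1, "paragraph_id": 0, "embedding": [0.9, 0.9]},
    {"document_id": 1, "paragraph_id": 1, "embedding": [0.8, 0.8]},
]
merged = merge_insert_composite(table, new_rows, ["document_id", "paragraph_id"])
```

Keying on `document_id` alone would collapse both new paragraphs onto a single key, which is why a single-column key is not enough here. Note also that this sketch does not delete target rows missing from the new data (e.g. if re-chunking produces fewer paragraphs); fully replacing a document would need an extra delete step.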

westonpace commented 1 week ago

Adding `merge_insert` support isn't too bad. However, for performance reasons, we may also want to tackle #3125.