lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.97k stars, 225 forks

inverted index with columnar store #1024

Status: Open. patelprateek opened this issue 1 year ago.

patelprateek commented 1 year ago

Can you provide some details on the design of how inverted index search would be implemented with the columnar format? Also, what kinds of feature store use cases are suited for LanceDB? I'm typically using a RocksDB KV store and wondering what the performance implications of LanceDB would be.

changhiskhan commented 1 year ago

hey @patelprateek, in general the indices we add will map to the row id in the dataset (to take advantage of Lance's random access performance). When you say inverted index, do you mean for FTS (full-text search)? If so, this is currently done in Python in LanceDB by integrating tantivy-py. The long-term plan is to integrate at the Rust level and bring that into the data format; we'll revisit Tantivy to see whether it would need additional work on our side.
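The core idea in the reply is that an index stores row ids, and a query then fetches the matching rows by id instead of scanning. A minimal pure-Python sketch of that pattern, assuming a toy in-memory "columnar" table where each column is a list and a row id is simply a list position. The helpers `build_inverted_index`, `search` and `take_rows` are hypothetical illustrations, not Lance or LanceDB APIs; the actual FTS path in LanceDB goes through tantivy-py as described above.

```python
import re
from collections import defaultdict

# Toy columnar table (assumption, not Lance's storage): one list per column,
# and a row id is the position of a value in every column.
table = {
    "doc_id": ["a", "b", "c", "d"],
    "text": [
        "Lance is a columnar format",
        "RocksDB is a key-value store",
        "Lance supports vector search",
        "Columnar formats speed up analytics",
    ],
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(texts: list[str]) -> dict[str, set[int]]:
    """Map each token to the set of row ids whose text contains it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, text in enumerate(texts):
        for token in tokenize(text):
            index[token].add(row_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return row ids containing every query token (AND semantics, no ranking)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    postings = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*postings))


def take_rows(table: dict[str, list], row_ids: list[int], column: str) -> list:
    """Random access: fetch one column's values at the given row ids."""
    values = table[column]
    return [values[i] for i in row_ids]


index = build_inverted_index(table["text"])
hits = search(index, "lance columnar")
print(hits, take_rows(table, hits, "doc_id"))
```

Because the postings hold row ids rather than copies of the data, the final step is a positional take on only the needed column, which is where a format with fast random access pays off. A production full-text index would also rank results (Tantivy uses BM25), which this sketch deliberately omits.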