lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.78k stars, 207 forks

Request for compare/contrast with other solutions #2042

Open billnye2 opened 6 months ago

billnye2 commented 6 months ago

Hi, thanks for your previous help,

I was wondering if you could provide just a high-level overview comparing the pros/cons of Lance vs. other solutions such as:

After moving to PyArrow, I found that its random-access speed was a limiting factor, so when I came across Lance I was surprised at how well it performed at this. Then I realized Lance looks like a database on disk, with separate data files, transactions, metadata, etc. After researching a bit, I thought the best solution for random access without loading everything into RAM would be some sort of array database that is optimized and purpose-built for lookups on a supplied index. That led me to TileDB, and there are probably many others too. An array-only DB seems very simple compared to Postgres or other DBs, so I'd be surprised if the concept hasn't existed for decades. Any comparison between Lance and other solutions would be very cool!
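The random-access limitation described above can be illustrated with a toy cost model. This is a minimal sketch under simplifying assumptions: it pretends that a chunk (loosely analogous to a Parquet row group or page) must be decoded whole to read any single row, and contrasts that with a hypothetical layout that keeps a per-row offset table. The function names and chunk sizes are made up for illustration, and the model does not describe how Lance or Parquet actually lay out data on disk.

```python
"""Toy model of random-access cost: whole-chunk decoding vs. per-row offsets.

Assumption: a simplified, hypothetical model, not the real Parquet or Lance layout.
"""
from bisect import bisect_right
from itertools import accumulate


def build_chunk_starts(chunk_sizes):
    # Starting row index of each chunk, e.g. [4, 4, 2] -> [0, 4, 8]
    return [0] + list(accumulate(chunk_sizes))[:-1]


def rows_decoded_chunked(row, chunk_starts, chunk_sizes):
    """Rows decoded to fetch `row` when a whole chunk is the smallest readable unit."""
    # Binary search for the chunk whose start is the last one <= row
    chunk = bisect_right(chunk_starts, row) - 1
    return chunk_sizes[chunk]


def rows_decoded_with_offsets(_row):
    """With a per-row offset table, one lookup locates exactly one row."""
    return 1


def total_cost(indices, chunk_sizes):
    """Return (rows decoded with chunked layout, rows decoded with per-row offsets)."""
    starts = build_chunk_starts(chunk_sizes)
    chunked = sum(rows_decoded_chunked(i, starts, chunk_sizes) for i in indices)
    direct = sum(rows_decoded_with_offsets(i) for i in indices)
    return chunked, direct


if __name__ == "__main__":
    # Hypothetical file: 1,000,000 rows stored in 10 chunks of 100,000 rows each
    sizes = [100_000] * 10
    lookups = [7, 250_123, 999_999]
    print(total_cost(lookups, sizes))  # (300000, 3)
```

In this toy model, three scattered lookups force 300,000 rows to be decoded under the chunked layout but only 3 with per-row offsets, which is the kind of gap that makes a format built for indexed lookups attractive for random access.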

Thanks!

wjones127 commented 6 months ago

There's a lot we could say, but to keep it short: