lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.79k stars 207 forks

Add Spark data source for lance #541

Open LiWeiJie opened 1 year ago

LiWeiJie commented 1 year ago

It would be great to support Lance as a Spark data source so that it can be read from PySpark.

andrei-ionescu commented 1 year ago

Is there a Scala implementation for Lance format?

changhiskhan commented 1 year ago

> Is there a Scala implementation for Lance format?

Not yet. We're still debating how to do it long term. The data will come out as Arrow, which already has a JVM implementation.

If you're interested in hacking on one, let us know! Chang@eto.ai would love to collaborate.

zhenyu commented 1 year ago

+1 for the feature!

wjones127 commented 6 months ago

TBD: DataSourceV2 or TableProvider?