Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
Remove the queue in LanceArrowWriter, since it may cache all rows in the queue, which requires a lot of JVM memory.
Use a mutex to control the write rate of the sink. The writer waits until the reader takes the batch.
In addition, I moved the maven-shade-plugin into a new profile that is disabled by default, because the jar-with-dependencies artifact conflicted with many jars in the Spark dependencies.
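For reviewers, here is a minimal sketch of the kind of handoff described above. It is not the code in this PR. The class name `SingleSlotHandoff` and the methods `write`, `take`, and `finish` are hypothetical, and the sketch assumes a `ReentrantLock` with two conditions as the mutex. It holds at most one pending batch, so the writer blocks until the reader has taken the previous one.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not the actual LanceArrowWriter implementation.
final class SingleSlotHandoff<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition slotEmpty = lock.newCondition();
    private final Condition slotFull = lock.newCondition();
    private T slot;            // at most one pending batch, so memory stays bounded
    private boolean finished;  // set once the writer has no more batches

    // Writer side: blocks while the previous batch has not been taken yet.
    void write(T batch) throws InterruptedException {
        lock.lock();
        try {
            while (slot != null) {
                slotEmpty.await();
            }
            slot = batch;
            slotFull.signal();
        } finally {
            lock.unlock();
        }
    }

    // Writer side: signals that no further batches will arrive.
    void finish() {
        lock.lock();
        try {
            finished = true;
            slotFull.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Reader side: blocks until a batch is available.
    // Returns null once the writer has finished and nothing is pending.
    T take() throws InterruptedException {
        lock.lock();
        try {
            while (slot == null && !finished) {
                slotFull.await();
            }
            T batch = slot;
            slot = null;
            slotEmpty.signal();
            return batch;
        } finally {
            lock.unlock();
        }
    }
}
```

Compared with an unbounded queue, this trades some writer throughput for a fixed memory footprint: the writer can never run more than one batch ahead of the reader.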