lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.97k stars, 224 forks

feat: allow async stream for writing and appending to a dataset #3146

Open HoKim98 opened 1 day ago

HoKim98 commented 1 day ago

This PR allows end users to pass a SendableRecordBatchStream and its Schema directly when writing or appending to a dataset.

Writing and appending async streams to a dataset is an essential capability.
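To illustrate the idea, here is a minimal, std-only Rust sketch of a writer that accepts either a synchronous list of batches or an async stream paired with its schema, by routing both through one trait that yields a (stream, schema) pair. Everything in it is hypothetical: `Schema`, `RecordBatch`, `BatchStream`, `SendableBatchStream`, `WriteSource`, and `write_source` are simplified stand-ins, not Lance's, Arrow's, or DataFusion's actual APIs, and the busy-polling loop replaces a real async runtime such as Tokio.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-ins for Arrow's Schema and RecordBatch.
#[derive(Debug, Clone, PartialEq)]
struct Schema { fields: Vec<String> }
#[derive(Debug, Clone)]
struct RecordBatch { num_rows: usize }

// Minimal stand-in for `futures::Stream<Item = RecordBatch>`.
trait BatchStream {
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<RecordBatch>>;
}

// Hypothetical analogue of a "sendable record batch stream".
type SendableBatchStream = Pin<Box<dyn BatchStream + Send>>;

// Hypothetical write-source trait: every input the writer accepts must be
// convertible into a (stream, schema) pair.
trait WriteSource {
    fn into_stream_and_schema(self) -> (SendableBatchStream, Schema);
}

// Adapts an in-memory (synchronous) list of batches: every poll is Ready.
struct ReadyStream(std::vec::IntoIter<RecordBatch>);
impl BatchStream for ReadyStream {
    fn poll_next(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<RecordBatch>> {
        Poll::Ready(self.get_mut().0.next())
    }
}

impl WriteSource for (Vec<RecordBatch>, Schema) {
    fn into_stream_and_schema(self) -> (SendableBatchStream, Schema) {
        (Box::pin(ReadyStream(self.0.into_iter())), self.1)
    }
}

// An async stream that returns Pending once before each batch, mimicking I/O.
struct SlowStream {
    batches: std::vec::IntoIter<RecordBatch>,
    primed: bool,
}
impl BatchStream for SlowStream {
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<RecordBatch>> {
        let this = self.get_mut();
        if !this.primed {
            this.primed = true;
            cx.waker().wake_by_ref(); // tell the executor to poll again
            return Poll::Pending;
        }
        this.primed = false;
        Poll::Ready(this.batches.next())
    }
}

// An async stream plus its schema is passed through unchanged.
impl WriteSource for (SendableBatchStream, Schema) {
    fn into_stream_and_schema(self) -> (SendableBatchStream, Schema) {
        self
    }
}

// Waker that does nothing; the executor below re-polls on its own.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical writer: drains any WriteSource and returns (schema, total rows).
// A real writer would encode batches into data files; this only counts rows.
fn write_source(source: impl WriteSource) -> (Schema, usize) {
    let (mut stream, schema) = source.into_stream_and_schema();
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut rows = 0;
    loop {
        match stream.as_mut().poll_next(&mut cx) {
            Poll::Ready(Some(batch)) => rows += batch.num_rows,
            Poll::Ready(None) => break,
            // Busy-poll; a real runtime (e.g. Tokio) would park until woken.
            Poll::Pending => std::thread::yield_now(),
        }
    }
    (schema, rows)
}

fn main() {
    let schema = Schema { fields: vec!["id".into(), "vector".into()] };
    let batches = vec![RecordBatch { num_rows: 3 }, RecordBatch { num_rows: 4 }];
    // Synchronous input.
    let (_, sync_rows) = write_source((batches.clone(), schema.clone()));
    // Async stream + schema input.
    let stream: SendableBatchStream =
        Box::pin(SlowStream { batches: batches.into_iter(), primed: false });
    let (_, async_rows) = write_source((stream, schema));
    println!("{sync_rows} {async_rows}"); // prints "7 7"
}
```

Funnelling both kinds of input through one trait keeps a single write path: a synchronous source simply becomes a stream that is always ready, while an async source keeps its own polling behavior.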

Related Issues

Partially resolves #1792.

Side-effects

This PR has the following side-effects:

Changed

Added

codecov-commenter commented 1 day ago

Codecov Report

Attention: Patch coverage is 67.39130% with 30 lines in your changes missing coverage. Please review.

Project coverage is 77.92%. Comparing base (1d3b204) to head (082b869).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/fragment/write.rs | 75.55% | 4 Missing and 7 partials |
| rust/lance-datafusion/src/utils.rs | 68.75% | 6 Missing and 4 partials |
| rust/lance/src/dataset/write/insert.rs | 33.33% | 4 Missing |
| java/core/lance-jni/src/fragment.rs | 0.00% | 2 Missing |
| rust/lance/src/dataset/write.rs | 0.00% | 2 Missing |
| rust/lance/src/dataset/schema_evolution.rs | 0.00% | 1 Missing |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3146      +/-   ##
==========================================
- Coverage   77.95%   77.92%   -0.03%
==========================================
  Files         242      242
  Lines       81904    81900       -4
  Branches    81904    81900       -4
==========================================
- Hits        63848    63824      -24
- Misses      14890    14900      +10
- Partials     3166     3176      +10
```

| [Flag](https://app.codecov.io/gh/lancedb/lance/pull/3146/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/lancedb/lance/pull/3146/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | `77.92% <67.39%> (-0.03%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb#carryforward-flags-in-the-pull-request-comment) to find out more.




HoKim98 commented 1 day ago

Applied StreamingWriteSource and minimized the side-effects.