lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0

feat!: allow passing down existing dataset for write #3119

Closed: wjones127 closed this 2 days ago

wjones127 commented 1 week ago

BREAKING CHANGE: in Rust, the return value of `write_fragments()` has changed to `Result<Transaction>`.

codecov-commenter commented 1 week ago

Codecov Report

Attention: Patch coverage is 73.83178% with 140 lines in your changes missing coverage. Please review.

Project coverage is 77.94%. Comparing base (c47543f) to head (60505db). Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/write/insert.rs | 73.22% | 61 Missing and 7 partials :warning: |
| rust/lance/src/dataset/write/commit.rs | 77.61% | 44 Missing and 1 partial :warning: |
| rust/lance/src/dataset/write.rs | 58.69% | 18 Missing and 1 partial :warning: |
| rust/lance/src/dataset.rs | 75.75% | 4 Missing and 4 partials :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3119      +/-   ##
==========================================
+ Coverage   77.91%   77.94%   +0.03%
==========================================
  Files         240      242       +2
  Lines       81465    81738     +273
  Branches    81465    81738     +273
==========================================
+ Hits        63470    63709     +239
- Misses      14816    14857      +41
+ Partials     3179     3172       -7
```

| [Flag](https://app.codecov.io/gh/lancedb/lance/pull/3119/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/lancedb/lance/pull/3119/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb) | `77.94% <73.83%> (+0.03%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=lancedb#carryforward-flags-in-the-pull-request-comment) to find out more.


wjones127 commented 2 days ago

> One other thought I had. Could the insert builder have a way of generating a collection of fragments instead of a transaction? E.g., how would I use it if I wanted to run the insert builder twice and then commit both sets of files as a single transaction later?

😄 Later today I'll have a PR that adds a new API to `CommitBuilder`: `CommitBuilder::execute_batch(transactions: &[Transaction]) -> BatchCommitResult`. This will merge all compatible transactions into a single transaction and commit that. That makes it a nice API for distributed writes, and it also serves as a poor man's multi-statement transaction.
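
As a rough illustration of what "merge all compatible transactions" could mean, here is a minimal sketch. It does not use Lance's real types: `Transaction`, `Operation`, `MergeError`, and `merge_transactions` below are hypothetical stand-ins. The rules that only appends are compatible, and that the merged transaction takes the lowest read version, are assumptions made for the example, not the behavior of `execute_batch`.

```rust
/// Hypothetical stand-in for a write operation (not Lance's real type).
#[derive(Debug, Clone, PartialEq)]
pub enum Operation {
    /// Adds new data files to the dataset.
    Append { files: Vec<String> },
    /// Replaces the dataset contents.
    Overwrite { files: Vec<String> },
}

/// Hypothetical stand-in for an uncommitted transaction.
#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    /// Dataset version the writer read before producing its files.
    pub read_version: u64,
    pub operation: Operation,
}

#[derive(Debug, PartialEq)]
pub enum MergeError {
    /// No transactions were supplied.
    Empty,
    /// The transaction at this index cannot be merged with the others.
    Incompatible(usize),
}

/// Merge several append transactions into one.
///
/// Assumption for this sketch: only `Append` operations are mutually
/// compatible, and the merged transaction keeps the lowest read version
/// so that later conflict checks stay conservative.
pub fn merge_transactions(transactions: &[Transaction]) -> Result<Transaction, MergeError> {
    let first = transactions.first().ok_or(MergeError::Empty)?;
    let mut read_version = first.read_version;
    let mut merged_files = Vec::new();

    for (index, tx) in transactions.iter().enumerate() {
        match &tx.operation {
            Operation::Append { files } => {
                merged_files.extend(files.iter().cloned());
                read_version = read_version.min(tx.read_version);
            }
            // Anything other than an append is rejected in this toy model.
            Operation::Overwrite { .. } => return Err(MergeError::Incompatible(index)),
        }
    }

    Ok(Transaction {
        read_version,
        operation: Operation::Append { files: merged_files },
    })
}
```

In this toy model, two append transactions produced by separate writers collapse into one append that lists both sets of files, while mixing in an overwrite is rejected so the caller can commit it on its own.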