lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.85k stars, 212 forks

Support PutIfNotExist commit handler for R2 and Minio #2246

Open wjones127 opened 5 months ago

wjones127 commented 5 months ago

Because S3 infamously doesn't support an atomic put_if_not_exists or any similar operation, we have to rely on a supplemental commit mechanism there. However, some S3-compatible stores do support such an operation: R2 and Minio. It would be cool to support concurrent writes out-of-the-box for these backends.
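To make the idea concrete, here is a minimal sketch of how a put-if-not-exists primitive lets concurrent writers commit safely. It uses the local filesystem (`create_new`) as a stand-in for the object store, and the names `put_if_not_exists`, `commit`, and the `N.manifest` layout are hypothetical, chosen for illustration rather than taken from Lance's actual commit handler.

```rust
use std::fs::OpenOptions;
use std::io::{self, ErrorKind, Write};
use std::path::Path;

/// Write `bytes` to `path` only if nothing exists there yet.
/// Returns Ok(true) if we created it, Ok(false) if someone else got there first.
/// Local-filesystem stand-in: `create_new` makes the existence check and the
/// creation a single atomic step, which is the property a store must provide.
pub fn put_if_not_exists(path: &Path, bytes: &[u8]) -> io::Result<bool> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(bytes)?;
            Ok(true)
        }
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}

/// Hypothetical commit loop: try to claim `version`; if another writer already
/// committed it, move on to the next version number and retry.
/// A real handler would also re-check for conflicts before retrying.
pub fn commit(dir: &Path, mut version: u64, bytes: &[u8]) -> io::Result<u64> {
    loop {
        let path = dir.join(format!("{}.manifest", version));
        if put_if_not_exists(&path, bytes)? {
            return Ok(version);
        }
        version += 1;
    }
}
```

Two writers that both call `commit(dir, 1, ...)` end up with versions 1 and 2 rather than one silently overwriting the other. Without an atomic create, the check and the write are separate steps, and that gap is what the supplemental commit mechanism has to cover.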

We should be able to detect whether a user is pointing at one of those services (or just let them tell us). Then we can switch to this other commit handler.
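A rough sketch of what that selection could look like, assuming we only have the configured endpoint URL and an optional explicit setting from the user. `CommitStrategy` and `choose_commit_strategy` are hypothetical names, not existing Lance APIs. The only host pattern checked is R2's `*.r2.cloudflarestorage.com`; Minio is usually self-hosted under an arbitrary hostname, so in this sketch it can only be enabled through the explicit override.

```rust
/// Hypothetical commit strategies; illustrative names only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitStrategy {
    /// Rely on the store's atomic put-if-not-exists.
    PutIfNotExists,
    /// Fall back to a supplemental commit mechanism.
    External,
}

/// Pick a strategy: an explicit user choice always wins, otherwise guess
/// from the endpoint host, otherwise take the conservative fallback.
pub fn choose_commit_strategy(
    endpoint: Option<&str>,
    user_override: Option<CommitStrategy>,
) -> CommitStrategy {
    if let Some(strategy) = user_override {
        return strategy;
    }
    match endpoint {
        Some(url) if host_of(url).ends_with(".r2.cloudflarestorage.com") => {
            CommitStrategy::PutIfNotExists
        }
        _ => CommitStrategy::External,
    }
}

/// Extract a lowercase host from a URL-like string.
/// Deliberately simple: not a full URL parser (no IPv6 literal handling).
fn host_of(url: &str) -> String {
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    let host_port = authority.rsplit('@').next().unwrap_or(authority);
    let host = host_port.split(':').next().unwrap_or(host_port);
    host.to_ascii_lowercase()
}
```

Defaulting to `External` when the endpoint is unknown keeps the guess safe: a wrong guess in the other direction would assume atomicity the store does not provide.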

See: https://docs.rs/object_store/0.9.1/object_store/aws/enum.S3ConditionalPut.html

rakeshJn commented 1 day ago

Looks like IBM Cloud Object Store also supports conditional writes: https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-upload#upload-conditional. It would be nice to have this feature available; without it, usage is quite limited.