lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0

Enable S3 Anonymous Mode #2274

Open jaychia opened 4 months ago

jaychia commented 4 months ago

This would allow for access to publicly available datasets without valid S3 credentials: handy for public demos hosted on Google Colab!

I did not see anything available here: https://lancedb.github.io/lance/read_and_write.html#s3-configuration

jaychia commented 4 months ago

Note also that the object_store crate might already support this functionality but under a different key name: https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html#variant.SkipSignature

jaychia commented 4 months ago

Actually, just verified that the skip_signature: true flag works! We should add this to the lance documentation though.
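For anyone landing here, a minimal sketch of what this could look like from Python. It assumes the flag is passed through the `storage_options` argument of `lance.dataset` and that the short key name `skip_signature` is accepted (object_store also documents `aws_skip_signature`). The helper names and the optional `region` entry are illustrative and not part of Lance.

```python
def anonymous_s3_options(region=None):
    """Build storage options for unsigned (anonymous) S3 requests.

    Hypothetical helper: the key names follow object_store's
    AmazonS3ConfigKey, and values are passed as strings.
    """
    options = {"skip_signature": "true"}
    if region is not None:
        # Optional: some public buckets need an explicit region.
        options["region"] = region
    return options


def open_public_dataset(uri, region=None):
    """Open a Lance dataset in a public bucket without credentials.

    Hypothetical wrapper; `uri` is whatever s3:// path the dataset lives at.
    """
    # Imported lazily so the options helper works without lance installed.
    import lance

    return lance.dataset(uri, storage_options=anonymous_s3_options(region))
```

Calling `open_public_dataset` with the `s3://` URI of a public dataset should then read it without any AWS credentials configured, which is the Colab demo case from the original request.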

westonpace commented 3 months ago

@jaychia do you want to make a quick PR (actually, probably 2 PRs)? I think https://lancedb.github.io/lance/read_and_write.html#s3-configuration would be the right spot for Lance and https://lancedb.github.io/lancedb/guides/storage/#aws-s3 would be the right spot for LanceDB. In the Lance docs we should just add it to the list of available options. In the LanceDB docs it might be good to have a small section / paragraph titled "publicly available buckets" that explains that the skip signature flag is required.

The source for those docs is https://github.com/lancedb/lance/blob/main/docs/read_and_write.rst and https://github.com/lancedb/lancedb/blob/main/docs/src/guides/storage.md