pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Use AWS Rust SDK To Source Credentials For S3 #19022

Open tustvold opened 1 month ago

tustvold commented 1 month ago

Description

Problem

object_store provides a limited selection of common authentication mechanisms, with a particular focus on those used in server contexts. It does not, however, seek to replicate the entire credentials setup of the various SDKs. Polars is often run on end user devices and therefore people wish for it to support a broader range of authentication options.

object_store exposes a CredentialProvider API that can be used to provide an alternative way to source credentials.

Proposal

https://github.com/pola-rs/polars/issues/18979 tracks exposing CredentialProvider in a way that can be configured. There are, however, some design questions around what this might look like through a Python API.

An alternative would be for polars to provide an option to use aws-sdk-rust to source credentials, much like datafusion-cli does.

Alternatives Considered

Users could use software like aws-vault to generate session credentials. Whilst this has other security benefits, people may not wish to do this for various reasons.
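For illustration, a minimal sketch of what that workaround can look like from the Python side, assuming the process was launched by a tool such as aws-vault that exports the standard `AWS_*` environment variables. The helper name `storage_options_from_env` is hypothetical, and the bucket path in the usage comment is a placeholder. The option keys are the ones object_store accepts for S3, to the best of my knowledge.

```python
import os

# Standard AWS environment variables mapped to the S3 option keys
# that are passed through to object_store via `storage_options`.
_ENV_TO_OPTION = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_SESSION_TOKEN": "aws_session_token",
    "AWS_REGION": "aws_region",
}
_REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def storage_options_from_env(env=None):
    """Build a storage_options dict from session credentials in the environment.

    Hypothetical helper: raises KeyError if the key id or secret is missing,
    and silently skips optional variables (session token, region) when unset.
    """
    env = os.environ if env is None else env
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing credential variables: {', '.join(missing)}")
    return {opt: env[name] for name, opt in _ENV_TO_OPTION.items() if env.get(name)}


# Hypothetical usage (bucket and path are placeholders):
#   import polars as pl
#   lf = pl.scan_parquet(
#       "s3://my-bucket/data/*.parquet",
#       storage_options=storage_options_from_env(),
#   )
```

In many setups these variables are likely picked up from the environment without being passed explicitly. Passing them by hand here only makes the flow visible, and it does nothing for profile, SSO, or other SDK-only credential sources, which is the gap this issue describes.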

We could expose the full CredentialProvider API to users. This would be more flexible, support providers other than AWS, and avoid adding some non-trivial additional dependencies, but requires more design work.

Related Context

alamb commented 1 month ago

BTW here is an example of how we use the AWS SDK to get credentials in datafusion-cli:

https://github.com/apache/datafusion/blob/747001a41481e0cf39dc758a85d1bdb64fdeb7c0/datafusion-cli/src/object_storage.rs#L65-L133