hutch3232 opened 2 months ago
I doubt Polars has control over object_store
feature additions. I suggest you raise this request in their repo.
Oh, I somehow didn't realize they were separate libraries. Looks like it used to be experimentally supported but that support was dropped. Bummer.
https://github.com/apache/arrow-rs/pull/4238 https://github.com/apache/arrow-rs/issues/4556
Yikes. It looks like there's no easy way to get support for AWS profiles in polars, then. That's a significant gap in the `object_store` package's functionality. My only workaround, then, is `pl.read_parquet(..., use_pyarrow=True)`.
:wave: `object_store` maintainer here. The major challenge with supporting `AWS_PROFILE` is the sheer scope of such an initiative; even the official Rust AWS SDK continues to have issues in this space (https://github.com/awslabs/aws-sdk-rust/issues/1193). Whilst we did at one point support `AWS_PROFILE` in `object_store`, it was tacked on and led to surprising inconsistencies for users, as only some of the configuration would be respected. We do not use SDKs, as this allows for a more consistent experience across stores (especially since AWS is the only store with an official SDK), along with a significantly smaller dependency footprint. There is more information in https://github.com/apache/arrow-rs/issues/2176.
This support for AWS_PROFILE was therefore removed and replaced with a more flexible API allowing users and system integrators to configure how to source credentials from their environment. I have filed https://github.com/pola-rs/polars/issues/18979 to suggest exposing this in polars.
Edit: As an aside, I would strongly encourage using aws-vault to generate session credentials: not only does it avoid this class of issue, it also avoids storing credentials in plain text on the filesystem and relying on individual apps/tools to pick up the correct profile.
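Until polars exposes such a credential hook, one stopgap is to resolve a profile's keys yourself and pass them explicitly through `storage_options`, since `aws_access_key_id` and `aws_secret_access_key` are supported `object_store` config keys. A minimal sketch using only the standard library; the credentials file contents and the `analytics` profile name are made up for illustration:

```python
import configparser
import os
import tempfile

def profile_credentials(profile: str, path: str = "~/.aws/credentials") -> dict:
    """Read one profile's keys from an AWS-style credentials file and
    return them as an object_store-compatible storage_options dict."""
    cfg = configparser.ConfigParser()
    cfg.read(os.path.expanduser(path))
    section = cfg[profile]  # raises KeyError if the profile is missing
    return {
        "aws_access_key_id": section["aws_access_key_id"],
        "aws_secret_access_key": section["aws_secret_access_key"],
    }

# Demo against a throwaway credentials file (hypothetical contents):
with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as f:
    f.write(
        "[default]\n"
        "aws_access_key_id = AKIADEFAULT\n"
        "aws_secret_access_key = s3cr3t-default\n"
        "\n"
        "[analytics]\n"
        "aws_access_key_id = AKIAANALYTICS\n"
        "aws_secret_access_key = s3cr3t-analytics\n"
    )

opts = profile_credentials("analytics", f.name)
print(opts["aws_access_key_id"])  # AKIAANALYTICS
# df = pl.read_parquet("s3://my-bucket/my-parquet/*.parquet", storage_options=opts)
os.unlink(f.name)
```

The obvious caveat is the one the maintainer raises above: this only handles static keys, not the rest of profile configuration (region, SSO, role assumption, etc.), which is exactly why a pluggable credential provider is the better long-term answer.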
One interesting thing I just realized is that `pl.read_csv` actually accepts the `"profile"` key in `storage_options`. That's surprising considering `pl.read_parquet` does not.
Edit: tested on polars 1.8.2.

Edit2: in fact, `pl.read_csv` can pick up `AWS_PROFILE` and even `AWS_ENDPOINT_URL` (see: #18758)
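For the environment-variable route those edits describe, the variables just need to be set before the scan is issued. A sketch (profile name and endpoint are made up, and the actual `read_csv` call is left commented out since it needs a live bucket):

```python
import os

# Select the profile and a custom endpoint before polars touches S3.
# Both values here are hypothetical.
os.environ["AWS_PROFILE"] = "analytics"
os.environ["AWS_ENDPOINT_URL"] = "https://minio.internal.example:9000"

# import polars as pl
# df = pl.read_csv("s3://my-bucket/data.csv")  # reportedly honors both vars
print(os.environ["AWS_PROFILE"])  # analytics
```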
Description

I have a variety of different AWS/S3 profiles in my `~/.aws/credentials` and `~/.aws/config` files. I'd like to be able to select the appropriate bucket keys/endpoint/other config either explicitly, by passing `profile` into `storage_options`, or implicitly, by setting an `AWS_PROFILE` environment variable. I saw here that profile is not listed as a supported option: https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html

`polars` seems to use the first profile listed in those `~/.aws` files, even if the profile name is not 'default'. By ensuring the relevant profile was listed first, `pl.read_parquet("s3://my-bucket/my-parquet/*.parquet")` would work, but being order-dependent is confusing and not scalable.

FWIW this functionality exists in `pandas` and I'm hoping to migrate code to `polars`, but this is kind of essential.
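The order dependence described above is easy to demonstrate with the standard library: a resolver that falls back to whichever profile is listed first returns different credentials depending purely on file layout. The file contents below are invented for illustration:

```python
import configparser

# The same two profiles, listed in different order (hypothetical contents).
layout_a = "[prod]\naws_access_key_id = AKIAPROD\n\n[dev]\naws_access_key_id = AKIADEV\n"
layout_b = "[dev]\naws_access_key_id = AKIADEV\n\n[prod]\naws_access_key_id = AKIAPROD\n"

def first_profile_key(text: str) -> str:
    """Mimic 'use whichever profile is listed first'."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    first = cfg.sections()[0]  # sections() preserves file order
    return cfg[first]["aws_access_key_id"]

print(first_profile_key(layout_a))  # AKIAPROD
print(first_profile_key(layout_b))  # AKIADEV
```

Reordering a shared credentials file to steer one tool is exactly the kind of fragility an explicit `profile` option would remove.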