trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Fallback to default AWS Credentials when using trino.s3.credentials-provider #23703

Open MHarmony opened 1 month ago

MHarmony commented 1 month ago

We are using Trino/Hive via EMR. One of the S3 buckets we access required us to create a custom credentials provider that retrieves credentials from endpoints the bucket owner has set up. The provider works (it implements AWSSessionCredentialsProvider and returns AWSSessionCredentials); however, it is being used for EVERY request to S3. We also access 2 other buckets that needed no configuration, as they are owned by the same account that runs our EMR cluster, so I assume they just use the EMR-attached IAM role?

How can we modify our config to differentiate between the 2 sets of buckets, so that 1 set uses the custom provider and the other set uses the default IAM role? Can trino.s3.credentials-provider be set per bucket? Or do we have to modify the provider to parse the URI and return other credential types?

I found https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/InstanceProfileCredentialsProvider.html but I'm not sure if this is what is used by default.

mosabua commented 1 month ago

@electrum might know better, but I think you probably have to create two separate catalogs.

hashhar commented 1 month ago

We should probably add this to S3 Security Mapping IMO.

MHarmony commented 1 month ago

We found a solution to our issue. Our provider checks the URI, and if it points to the bucket that needs the custom credentials, it retrieves them; otherwise it returns the credentials from a DefaultAWSCredentialsProviderChain object.
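For anyone landing here later, a minimal sketch of that routing idea, using only the JDK so it runs on its own. It is not our actual provider: `CredentialsSource`, `BucketRoutingProvider`, and the `customBuckets` set are hypothetical names made up for illustration. In a real provider the two branches would be the custom AWSSessionCredentialsProvider and a DefaultAWSCredentialsProviderChain, and the class would implement the AWS SDK provider interface instead of this stand-in.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical stand-in for an AWS credentials provider interface, so the
// sketch compiles without the AWS SDK on the classpath.
interface CredentialsSource {
    String credentials();
}

// Sketch: choose a credentials source based on the bucket named in the S3 URI.
final class BucketRoutingProvider implements CredentialsSource {
    private final CredentialsSource delegate;

    BucketRoutingProvider(URI uri, Set<String> customBuckets,
                          CredentialsSource custom, CredentialsSource fallback) {
        // For s3://bucket/key URIs the bucket is normally the URI host.
        // Bucket names that are not valid hostnames (e.g. with underscores)
        // yield a null host, so fall back to the raw authority.
        String bucket = uri.getHost() != null ? uri.getHost() : uri.getAuthority();
        boolean needsCustom = bucket != null && customBuckets.contains(bucket);
        this.delegate = needsCustom ? custom : fallback;
    }

    @Override
    public String credentials() {
        // Delegate every lookup to whichever source was picked for this URI.
        return delegate.credentials();
    }
}
```

The choice is made once per URI in the constructor, so buckets outside the custom set never touch the custom endpoints.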

I think the method mentioned by @hashhar would be the most foolproof moving forward.

MHarmony commented 1 month ago

Also, I think it's vital to add to the documentation that the custom provider must be deployed on the workers as well as the coordinator.