Open ksauzz opened 6 months ago
Hi @ksauzz, you can supply your own custom AWS credential supplier to the library that handles your use case. See here.
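For illustration, a minimal sketch of that wiring with the builder API, per the library README for recent versions (the audience, token values, and `CustomAwsSupplier` are placeholders):

```java
import com.google.auth.oauth2.AwsCredentials;
import com.google.auth.oauth2.AwsSecurityCredentials;
import com.google.auth.oauth2.AwsSecurityCredentialsSupplier;
import com.google.auth.oauth2.ExternalAccountSupplierContext;

public class CustomSupplierExample {

  // Placeholder supplier: return AWS credentials from wherever your environment provides them.
  static class CustomAwsSupplier implements AwsSecurityCredentialsSupplier {
    @Override
    public AwsSecurityCredentials getCredentials(ExternalAccountSupplierContext context) {
      return new AwsSecurityCredentials("accessKeyId", "secretAccessKey", "sessionToken");
    }

    @Override
    public String getRegion(ExternalAccountSupplierContext context) {
      return "us-east-1"; // placeholder
    }
  }

  public static void main(String[] args) {
    AwsCredentials credentials =
        AwsCredentials.newBuilder()
            .setSubjectTokenType("urn:ietf:params:aws:token-type:aws4_request")
            .setAudience(
                "//iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global"
                    + "/workloadIdentityPools/$POOL_ID/providers/$PROVIDER_ID")
            .setAwsSecurityCredentialsSupplier(new CustomAwsSupplier())
            .build();
  }
}
```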
I think it doesn't work for spark-bigquery-connector, because the connector doesn't expose a config item to change the supplier. I hope the core auth library will provide this functionality without requiring patches from users. Otherwise, GCP users have to patch every Google library that depends on google-auth-library-java. Thank you.
Hi @lsirac, it's impossible to use a custom awsSecurityCredentialsSupplier. There is no way to create one through the GoogleCredentials class from a file.
The issue:
While the AwsCredentials class supports a supplier, that class is wrapped by ExternalAccountCredentials. When it creates AWS credentials from a file-type config, it does not allow specifying awsSecurityCredentialsSupplier as a property.
Besides that, you can't write a custom ExternalCredentials, because the list of supported types is hardcoded in the GoogleCredentials class.
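To make the gap concrete, a sketch of the file-based path (the file path is a placeholder); loading goes through the parsed JSON only, and nothing in the file format names a supplier class:

```java
import java.io.FileInputStream;
import java.io.IOException;
import com.google.auth.oauth2.GoogleCredentials;

public class FileBasedLoading {
  public static void main(String[] args) throws IOException {
    // fromStream parses the JSON "credential_source" block and picks one of the
    // hardcoded credential types; there is no property in the file format that
    // maps to a custom AwsSecurityCredentialsSupplier implementation.
    GoogleCredentials credentials =
        GoogleCredentials.fromStream(new FileInputStream("/path/to/aws-wif-config.json"));
  }
}
```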
Yep, I agree with @ksauzz that we need to fix this part to allow specifying different suppliers.
Otherwise, when you use a library like the BQ Spark connector or hadoop-gcs, you always need to override AccessTokenProvider at the library level, which usually means copying 90% of google-auth-library-java's AWS code but with a different AWS credentials provider, because by default the Google lib only knows how to take credentials from environment variables and the EC2 metadata server.
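For context, a rough sketch of what that library-level override looks like for the GCS connector; the AccessTokenProvider shape shown here is approximate and may differ between connector versions, so treat this as illustration only:

```java
import java.io.IOException;
import com.google.cloud.hadoop.util.AccessTokenProvider;
import org.apache.hadoop.conf.Configuration;

// Approximate sketch of the connector-level hook you end up implementing.
// Inside getAccessToken() you effectively re-do the token exchange that
// google-auth-library-java already implements, just with your own AWS source.
public class CustomAwsAccessTokenProvider implements AccessTokenProvider {
  private Configuration conf;

  @Override
  public AccessToken getAccessToken() {
    // 1. Obtain AWS credentials from your custom source (e.g. an assumed role).
    // 2. Sign the GetCallerIdentity request and exchange it with Google STS.
    // 3. Return the resulting access token; the values here are placeholders.
    return new AccessToken("ya29.placeholder", 3_600_000L);
  }

  @Override
  public void refresh() throws IOException {
    // Re-run the exchange when the token expires.
  }

  public void setConf(Configuration conf) {
    this.conf = conf;
  }

  public Configuration getConf() {
    return conf;
  }
}
```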
P.S.
I am ready to contribute changes to ExternalAccountCredentials to allow users to choose between AwsCredentialSource and awsSecurityCredentialsSupplier. But the only way to do it is to create the awsSecurityCredentialsSupplier with reflection, and that brings new restrictions to the API, specifically on the constructor arguments. There are two possible restrictions (see the sketch after this list):

1. The awsSecurityCredentialsSupplier class has an empty constructor. Then we need an additional function that takes the credentialSourceMap if we want to pass extra arguments to our awsSecurityCredentialsSupplier implementation.
2. The constructor takes the credentialSourceMap as its argument; in this case, we always create the awsSecurityCredentialsSupplier class with a single static argument.

Theoretically, we can support both cases, but to me that makes the API even less clean, though more flexible.
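A sketch of the two options in reflection terms (the method names and map type are illustrative, not a proposed API):

```java
import java.util.Map;
import com.google.auth.oauth2.AwsSecurityCredentialsSupplier;

public class SupplierInstantiation {
  // Option 1: require a no-arg constructor; extra arguments would have to be
  // passed later through a separate initialization method (hypothetical).
  static AwsSecurityCredentialsSupplier createWithEmptyConstructor(
      String supplierClassName) throws Exception {
    Class<?> clazz = Class.forName(supplierClassName);
    return (AwsSecurityCredentialsSupplier) clazz.getDeclaredConstructor().newInstance();
  }

  // Option 2: require a constructor taking the credential source map, so the
  // supplier is always created with that single static argument.
  static AwsSecurityCredentialsSupplier createWithMapConstructor(
      String supplierClassName, Map<String, Object> credentialSourceMap) throws Exception {
    Class<?> clazz = Class.forName(supplierClassName);
    return (AwsSecurityCredentialsSupplier)
        clazz.getDeclaredConstructor(Map.class).newInstance(credentialSourceMap);
  }
}
```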
P.P.S.
Why are extra arguments required? Because there can be arguments for the different authorization mechanisms that AWS supports. For example, if you want to use the AWS AssumeRole feature, the awsSecurityCredentialsSupplier must take the ARN of the role to assume.
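A sketch of such a supplier using the AWS SDK v2 STS client (the role ARN, session name, and region are placeholders, and the supplier interface signatures follow recent google-auth-library-java versions):

```java
import com.google.auth.oauth2.AwsSecurityCredentials;
import com.google.auth.oauth2.AwsSecurityCredentialsSupplier;
import com.google.auth.oauth2.ExternalAccountSupplierContext;
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
import software.amazon.awssdk.services.sts.model.Credentials;

// Sketch: a supplier that needs an extra constructor argument (the role ARN),
// which is exactly what the file-based config cannot express today.
public class AssumeRoleSupplier implements AwsSecurityCredentialsSupplier {
  private final String roleArn;

  public AssumeRoleSupplier(String roleArn) {
    this.roleArn = roleArn;
  }

  @Override
  public AwsSecurityCredentials getCredentials(ExternalAccountSupplierContext context) {
    try (StsClient sts = StsClient.create()) {
      Credentials creds = sts.assumeRole(
              AssumeRoleRequest.builder()
                  .roleArn(roleArn)
                  .roleSessionName("wif-session") // placeholder
                  .build())
          .credentials();
      return new AwsSecurityCredentials(
          creds.accessKeyId(), creds.secretAccessKey(), creds.sessionToken());
    }
  }

  @Override
  public String getRegion(ExternalAccountSupplierContext context) {
    return "us-east-1"; // placeholder
  }
}
```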
By the way, in our case the environment is AWS EMR-S, and we use it to populate data in BQ and GCS.
InternalAwsSecurityCredentialsSupplier only supports environment variables or the EC2 metadata server as sources of AWS credentials.
In my use case, I can't use workload identity federation from AWS Glue (Spark) to load data into a BigQuery table using spark-bigquery-connector. This Spark environment has no EC2 metadata endpoint, and the Spark driver process's environment variables cannot be updated from a job.
Environment details
AWS Glue 4.0 (spark) + pyspark
Steps to reproduce

External references such as API reference guides
DefaultCredentialsProvider's docs

Any additional information below
I think AWS SDKs, including aws-sdk-java, provide comprehensive ways to get credentials from various AWS environments, so it would be nice to use DefaultCredentialsProvider or something similar instead of a custom implementation in this library. But I guess the Google team wouldn't like to depend on another vendor's library...
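For what it's worth, the bridge would be small. A sketch of a supplier that delegates to the AWS SDK v2 default chain (assuming the supplier interface from recent google-auth-library-java versions):

```java
import com.google.auth.oauth2.AwsSecurityCredentials;
import com.google.auth.oauth2.AwsSecurityCredentialsSupplier;
import com.google.auth.oauth2.ExternalAccountSupplierContext;
import software.amazon.awssdk.auth.credentials.AwsCredentials;
import software.amazon.awssdk.auth.credentials.AwsSessionCredentials;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.regions.providers.DefaultAwsRegionProviderChain;

// Sketch: delegate credential discovery to the AWS SDK's default chain,
// which already understands Glue, EMR, ECS, SSO, profiles, etc.
public class DefaultChainSupplier implements AwsSecurityCredentialsSupplier {
  @Override
  public AwsSecurityCredentials getCredentials(ExternalAccountSupplierContext context) {
    AwsCredentials creds = DefaultCredentialsProvider.create().resolveCredentials();
    String sessionToken =
        creds instanceof AwsSessionCredentials
            ? ((AwsSessionCredentials) creds).sessionToken()
            : null;
    return new AwsSecurityCredentials(
        creds.accessKeyId(), creds.secretAccessKey(), sessionToken);
  }

  @Override
  public String getRegion(ExternalAccountSupplierContext context) {
    return new DefaultAwsRegionProviderChain().getRegion().id();
  }
}
```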