googleapis / google-auth-library-java

Open source Auth client library for Java
https://developers.google.com/identity
BSD 3-Clause "New" or "Revised" License

Workload identity federation doesn't support full aws credential sources. #1408

Open ksauzz opened 6 months ago

ksauzz commented 6 months ago

InternalAwsSecurityCredentialsSupplier only supports environment variables or the EC2 metadata server as sources of AWS credentials.

In my use case, I can't use workload identity federation from AWS Glue (Spark) to load data into a BigQuery table via spark-bigquery-connector. This Spark environment has no EC2 metadata endpoint, and the Spark driver process's environment variables cannot be updated from a job.

Environment details

AWS Glue 4.0 (spark) + pyspark

Steps to reproduce

  1. Prepare workload identity federation settings
  2. run AWS Glue job

Any additional information below

I think the AWS SDKs, including aws-sdk-java, provide comprehensive ways to obtain credentials across the various AWS environments, so it would be nice to use DefaultCredentialsProvider or something similar instead of the custom implementation in this library. But I guess the Google team would prefer not to depend on another vendor's library...

DefaultCredentialsProvider's docs

AWS credentials provider chain that looks for credentials in this order:

  1. Java System Properties - aws.accessKeyId and aws.secretKey
  2. Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  3. Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
  4. Credentials delivered through the Amazon EC2 container service if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable
  5. Instance profile credentials delivered through the Amazon EC2 metadata service
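The lookup order above can be sketched in plain Java. This is illustrative only, not the AWS SDK's implementation: the shared-credentials-file parsing and the container/metadata HTTP lookups are elided.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class CredentialChainSketch {

    /** Returns the first access key id found, following the documented order. */
    public static Optional<String> resolveAccessKeyId() {
        // 1. Java system properties
        String prop = System.getProperty("aws.accessKeyId");
        if (prop != null) return Optional.of(prop);

        // 2. Environment variables
        String env = System.getenv("AWS_ACCESS_KEY_ID");
        if (env != null) return Optional.of(env);

        // 3. Shared credentials file (~/.aws/credentials)
        Path file = Paths.get(System.getProperty("user.home"), ".aws", "credentials");
        if (Files.exists(file)) {
            // A real implementation would parse the INI profile here.
        }

        // 4./5. Container-credentials and EC2 instance-metadata lookups would
        // follow; both require HTTP calls and are omitted from this sketch.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.setProperty("aws.accessKeyId", "AKIAEXAMPLE");
        System.out.println(resolveAccessKeyId().orElse("none"));
    }
}
```

The point of the chain is that each step is a strictly lower-priority fallback, which is what makes it usable across very different AWS runtimes.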
lsirac commented 6 months ago

Hi @ksauzz, you can supply your own custom AWS credential supplier to the library that handles your use case. See here.
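A custom supplier along these lines might look as follows. The interface and value type below are simplified stand-ins for the library's AwsSecurityCredentialsSupplier and AwsSecurityCredentials, defined inline so the sketch is self-contained; the real signatures (e.g. a supplier context parameter and checked IOException) may differ.

```java
public class CustomSupplierSketch {

    /** Simplified stand-in for the library's AwsSecurityCredentials value type. */
    static final class AwsSecurityCredentials {
        final String accessKeyId;
        final String secretAccessKey;
        final String sessionToken; // may be null for long-lived keys

        AwsSecurityCredentials(String accessKeyId, String secretAccessKey, String sessionToken) {
            this.accessKeyId = accessKeyId;
            this.secretAccessKey = secretAccessKey;
            this.sessionToken = sessionToken;
        }
    }

    /** Simplified stand-in for the library's supplier interface. */
    interface AwsSecurityCredentialsSupplier {
        String getRegion();
        AwsSecurityCredentials getCredentials();
    }

    /** A supplier that could delegate to any source available in the runtime. */
    static final class GlueJobSupplier implements AwsSecurityCredentialsSupplier {
        @Override public String getRegion() {
            // Hypothetical: read the region from wherever the Glue job exposes it.
            return "us-east-1";
        }
        @Override public AwsSecurityCredentials getCredentials() {
            // Hypothetical: a real version could delegate to the AWS SDK's
            // DefaultCredentialsProvider instead of returning fixed values.
            return new AwsSecurityCredentials("AKIAEXAMPLE", "secret", null);
        }
    }

    public static void main(String[] args) {
        AwsSecurityCredentialsSupplier supplier = new GlueJobSupplier();
        System.out.println(supplier.getRegion() + " " + supplier.getCredentials().accessKeyId);
    }
}
```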

ksauzz commented 6 months ago

I think it doesn't work for spark-bigquery-connector, because the connector doesn't have a config item to change the supplier. I hope the core auth library will get this functionality so users don't need patches; otherwise, GCP users have to patch each Google library that depends on google-auth-library-java. Thank you.

GrigorievNick commented 1 month ago

Hi @lsirac, it's impossible to use a custom awsSecurityCredentialsSupplier: there is no way to create one through the GoogleCredentials class from a file.

The issue: while the AwsCredentials class supports a supplier, it is wrapped by ExternalAccountCredentials, and when ExternalAccountCredentials creates AWS credentials from a file-based config, it does not allow specifying awsSecurityCredentialsSupplier as a property.

Besides that, you can't write a custom ExternalAccountCredentials either, because its list of supported types is hardcoded in the GoogleCredentials class.

Yep, I agree with @ksauzz that we need to fix this part so that different suppliers can be specified. Otherwise, when you use a library like the BQ Spark connector or hadoop-gcs, you always need to override AccessTokenProvider at the library level, which usually means copying 90% of google-auth-library-java's AWS code but with a different AWS credentials provider, because by default the Google library only knows how to take credentials from environment variables and EC2 metadata.

P.S. Also, I am ready to contribute changes to ExternalAccountCredentials to allow users to choose between AwsCredentialSource and awsSecurityCredentialsSupplier. But the only way to do it is to create the awsSecurityCredentialsSupplier via reflection, and this would bring new restrictions to the API: constraints on the supplier's constructor arguments. There are two possible restrictions.

Theoretically, we could support both cases, but in my view that makes the API even less clean, though more flexible.

P.P.S. Why are extra constructor arguments required? Because AWS supports different authorization mechanisms that may need them. For example, to use the AWS AssumeRole feature, the awsSecurityCredentialsSupplier must take the ARN of the role to assume.
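As a concrete illustration of the constructor-argument point, a hypothetical assume-role supplier would need the role ARN up front. Class and method names here are invented for the sketch, and the actual STS call is stubbed out.

```java
public class AssumeRoleSupplierSketch {

    /** Hypothetical supplier that must be configured with a role ARN. */
    static final class AssumeRoleSupplier {
        private final String roleArn;

        AssumeRoleSupplier(String roleArn) {
            this.roleArn = roleArn;
        }

        /** A real version would call AWS STS AssumeRole here and return
         *  the temporary credentials from the response. */
        String describe() {
            return "would call sts:AssumeRole for " + roleArn;
        }
    }

    public static void main(String[] args) {
        AssumeRoleSupplier s =
            new AssumeRoleSupplier("arn:aws:iam::123456789012:role/bq-loader");
        System.out.println(s.describe());
    }
}
```

Creating such a supplier via reflection would force every implementation into one fixed constructor shape, which is the API restriction described above.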

GrigorievNick commented 1 month ago

By the way, in our case the environment is AWS EMR-S, and we use it to populate data in BQ and GCS.