Closed — JorTurFer closed this issue 10 months ago.
It would still be useful if we could assume other roles from the KEDA operator. Could we have a `roleArn` attribute somewhere in `TriggerAuthentication`, which the KEDA operator could then assume?
Do you mean something like this in Azure? We did it to prevent cases where the same identity (a role, in AWS terms) accumulates a lot of permissions; there you can federate multiple identities with the same Kubernetes service account, and KEDA takes one or another based on the `TriggerAuthentication`.
It should be doable if that's what you meant. I'm not 100% sure, but it's worth a try.
Are you willing to contribute?
I'm not familiar with Azure but I think that's the same concept. The KEDA service account would have an IAM role with a policy which would allow it to take on (assume) other roles, like the example below.
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Action": "sts:AssumeRole",
    "Effect": "Allow",
    "Resource": "arn:aws:iam::*:role/grant-keda-access"
  }]
}
```
If we were to have an `awsRoleArn` in the `TriggerAuthentication` CRD, or by using the `awsRoleArn` that's available in the various scalers, the KEDA operator would then assume the given role, as long as the appropriate assume-role policy had been granted on the target role (something like this in Terraform syntax):
```hcl
resource "aws_iam_role" "grant-keda-access" {
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::$my-account-id:role/my-keda-operator-role"
        }
      },
    ]
  })
  name = "grant-keda-access"
}
```
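To make the idea concrete, a hypothetical `TriggerAuthentication` carrying such a field might look like the sketch below. Note that the `roleArn` field under `podIdentity` is an assumption for illustration only; it did not exist in KEDA at the time of this discussion.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: sqs-trigger-auth
spec:
  podIdentity:
    provider: aws-eks
    # Hypothetical field: the role the KEDA operator would assume,
    # provided the role's trust policy allows the operator's own role
    # (as in the Terraform example above).
    roleArn: arn:aws:iam::111122223333:role/grant-keda-access
```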
I took a look at the code and it seems like this should work, but I never really got it to work without doing what you were describing (defining a service account name etc., which seems unnecessary). It seems like it should be enough to remove all references to `identityOwner` and to modify `getAwsConfig` to be something like this:
```go
func getAwsConfig(awsRegion string, awsEndpoint string, awsAuthorization awsAuthorizationMetadata) (*session.Session, *aws.Config) {
	metadata := &awsConfigMetadata{
		awsRegion:        awsRegion,
		awsEndpoint:      awsEndpoint,
		awsAuthorization: awsAuthorization}

	sess := session.Must(session.NewSession(&aws.Config{
		Region:   aws.String(metadata.awsRegion),
		Endpoint: aws.String(metadata.awsEndpoint),
	}))

	var creds *credentials.Credentials
	if metadata.awsAuthorization.awsRoleArn != "" {
		creds = stscreds.NewCredentials(sess, metadata.awsAuthorization.awsRoleArn)
	} else if metadata.awsAuthorization.awsAccessKeyID != "" && metadata.awsAuthorization.awsSecretAccessKey != "" {
		creds = credentials.NewStaticCredentials(metadata.awsAuthorization.awsAccessKeyID, metadata.awsAuthorization.awsSecretAccessKey, "")
	}

	return sess, &aws.Config{
		Region:      aws.String(metadata.awsRegion),
		Endpoint:    aws.String(metadata.awsEndpoint),
		Credentials: creds,
	}
}
```
I'd be willing to take a deeper look into this and send a PR if I get this to work 😄
Let's hold off on removing `identityOwner`, because maybe I'm wrong and it works somehow (I'm not an expert in AWS); there is a user in Slack who has it working: https://kubernetes.slack.com/archives/CKZJ36A5D/p1674015373425479
Okay, I have discovered how it works: when `identityOwner` is empty or `identityOwner: pod`, KEDA uses the ARN from the workload's service account instead of the environment variables from the KEDA pod. The implementation is quite confusing and forces some things, but it works.
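For illustration only, the behaviour just described could be sketched like this in Go. `resolveRoleArn` and all its parameter names are hypothetical, not KEDA's actual code:

```go
package main

import "fmt"

// resolveRoleArn sketches the observed behaviour: when identityOwner is
// empty or "pod", the role ARN comes from the scaled workload's service
// account; otherwise KEDA falls back to its own pod's identity (e.g. the
// environment variables injected into the operator pod).
func resolveRoleArn(identityOwner, workloadRoleArn, operatorRoleArn string) string {
	if identityOwner == "" || identityOwner == "pod" {
		return workloadRoleArn
	}
	return operatorRoleArn
}

func main() {
	fmt.Println(resolveRoleArn("pod",
		"arn:aws:iam::111122223333:role/workload",
		"arn:aws:iam::111122223333:role/keda-operator"))
}
```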
I have been thinking about this, and I believe that creating a new authentication provider for AWS, with a good design, providing the same capabilities that the Azure one has, is a good option. Once this new one is ready, I'd deprecate the other 2 (they are a bit chaotic and complex to maintain). WDYT @kedacore/keda-contributors?
I'm fine if we are sure that the potentially new provider works perfectly and has full parity
> I'm fine if we are sure that the potentially new provider works perfectly and has full parity
The problem is that the current providers are really poorly documented, and they definitely have some weird configurations (adding e2e tests for them was horrible because of that). That's why I'd redesign the AWS identity in KEDA based on Azure identity (which is really nice IMO), documenting it properly.
We are also facing some issues, and the authentication using PodIdentity is quite confusing. We noticed that even after specifying `podIdentity.provider` as `aws-eks`, KEDA is trying to assume the role using the node role. Ideally it should use the WebIdentity associated with the deployment pod.
```
Warning  KEDAScalerFailed  24s (x6 over 2m54s)  keda-operator  (combined from similar events): AccessDenied: User: arn:aws:sts::1111122222:assumed-role/dev_node_group_role/i-0d8345ca0020 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::1111122222:role/sqs-event-processor
         status code: 403, request id: 2b1a303f-95c4-4ad0-874c-16c705f3ae0c
```
> We noticed that even after specifying podIdentity.provider as aws-eks the keda is trying to assume the role using the node role. Ideally it should use WebIdentity associated with the deployment pod.
Have you updated the KEDA deployment with the needed information? https://github.com/kedacore/charts/blob/main/keda/values.yaml#L210 If you enable pod identity without assigning an identity to the deployment (you also have to restart it so the new pod gets mutated), it will try to use the node role. Could you check whether the KEDA pod has been correctly mutated with the required envs?
What KEDA version are you using?
Using `podIdentity.provider: aws-eks` means the KEDA service account will have the following type of annotation:
```yaml
apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/audience: sts.amazonaws.com
    eks.amazonaws.com/role-arn: arn:aws:iam::214272731565:role/example-async-dev-ew1-queuescaling-role
```
That means that in a cluster using `aws-eks`, it will only work with a single IAM role, which becomes a problem for shared clusters where multiple tenants use different IAM roles; currently it's not possible to attach multiple IAM roles as annotations to the KEDA service account.
Do we have any workaround for this?
> Do we have any workaround for this?
Currently no (or at least not a well-documented one); that's why this issue is open :) In the Slack threads there is a workaround for that given by a user.
I think it's time to tackle this issue from our side, because there haven't been willing contributors xD Let me make a quick proposal for the new podIdentity (hoping it will be clearer and easier). The proposal is to deprecate `aws-eks` and `aws-kiam`. WDYT @tomkerkhove @zroubalik?
Why do we need to deprecate aws-eks and introduce a new one? I'm not sure I get that.
`aws-kiam` has to be deprecated/removed because the product itself has been deprecated and abandoned in favor of `aws-eks`.
IMHO, `aws-eks` should be deprecated too because there are some important problems:
- Missing documentation and e2e tests
- The feature is coupled between different "layers": you have to specify it in TriggerAuthentication but also in scalers, which is too different from the approach we follow for example in Azure podIdentity. Consistency is always important.
- The code is (at least to me) confusing and complex to follow (and maintain).
- The name suggests that it only works on EKS, but it actually works on any cluster.

But! After working on this a bit, I think that I have discovered how it works (I'm writing e2e tests for a new podIdentity draft), and maybe we could just document properly how `aws-eks` works, mark it as deprecated, and keep it there until the next major release.
But in any case, I'd add the new podIdentity, unifying the behaviour with Azure podIdentities and moving the podIdentity logic from the scalers to the TriggerAuthentication.
I agree with the approach proposed by @JorTurFer. If there's a problem with deprecation, we can keep the current one around as long as needed.
Great job!
> `aws-kiam` has to be deprecated/removed because the product itself has been deprecated and abandoned in favor of `aws-eks`.
Definitely fine for me!
> IMHO, `aws-eks` should be deprecated too because there are some important problems:
> - Missing documentation and e2e tests
> - The feature is coupled between different "layers". You have to specify it in TriggerAuthentication but also in scalers. This is too different from the approach we follow for example in Azure podIdentity. Consistency is always important.
> - The code is (at least to me) confusing and complex to follow (and maintain).
> - The name suggests that it only works on EKS, but it actually works on any cluster.
But these are not really reasons why we should deprecate it though? It's just in a bad state.
> But! After working on this a bit, I think that I have discovered how it works (I'm writing e2e tests for a new podIdentity draft), and maybe we could just document properly how `aws-eks` works, mark it as deprecated, and keep it there until the next major release. But in any case, I'd add the new podIdentity, unifying the behaviour with Azure podIdentities and moving the podIdentity logic from scalers to the TriggerAuthentication.
What is this new pod identity then? I'm not sure I get the full proposal here :)
> But these are not really reasons why we should deprecate it though? It's just in a bad state.
I don't agree with that. Based on our deprecation policies, we have a few options to remove things, and that generates technical debt. The current approach is totally coupled to the trigger metadata, and we can't change it based on our deprecation policies. It's not just a single reason, it's the whole list together: poor docs, e2e test gaps, overly complex code, the alignment gap with Azure podIdentity, and the confusing name (EKS is not IRSA; AWS IRSA means roles, AWS EKS means k8s, so `aws-eks` suggests something for EKS rather than for IRSA).
We can't address all of them, and that's why IMHO redesigning it totally, testing it, and documenting it is better than adding more patches and increasing technical debt.
We could introduce the new one and deprecate the others. In this case, I agree with keeping `aws-eks` there as deprecated, without new features. This helps us because instead of introducing a breaking change at some point (no matter whether it's during a major or minor version), we will have warned users, and they will probably have migrated to the new one.
I'm worried about keeping any kind of code just to not break something, because the technical debt grows and it can become a problem... Undocumented and untested code is potential technical debt, even more so in our case, where random contributors add code and we never see them again.
For example: in the GCP podIdentity, the contributor was going to introduce the same coupled approach (without it being needed, because in GCP we don't allow using other identities, and their code didn't do it), just because they saw it in the already existing code.
That said, if I'm the only one who thinks this is required to improve things in general (code, standardization, etc.), we can just add some e2e tests to the current approach and close this issue (deprecating & removing the deprecated `aws-kiam`; I think we've agreed on that).
I could be missing some important point, or maybe my opinion is influenced by my knowledge gap in AWS.
> Said this, If I'm the only one who thinks that this is required to improve in general (code, standardization, etc..), we can just add some e2e tests to current approach and close this issue (deprecating & removing the deprecated aws-kiam, I think we've agreed on this).
I agree with you, improvement is needed for sure.
What I'm not fully getting is that we want to deprecate 2 trigger auth types but introduce a new one. So how will this new one be better? Did they announce a new authentication type, or is it simply because we want to rework it so that it's better maintained and we just don't like `aws-eks` as a name?
I'm not saying we can't deprecate them both, just trying to understand the reasoning for `aws-eks` and what this new thing is.
Honestly, the only reason to deprecate the old one (let's talk about `aws-eks` only, because the cause behind the `aws-kiam` deprecation is just its upstream deprecation in favor of IRSA; nothing to do there) is that I see it as easier to introduce all the required changes with no impact for users xD
I mean, I'd agree with extending the current `aws-eks` with the new changes in TriggerAuthentication if we were going to deprecate and remove the trigger-configuration part in a reasonable time (2-3 versions). If we just extend the current `aws-eks` with the new options, we will create more confusion among users (more options for doing the same thing), and we will introduce more complexity into the code, because we will need to take the precedence into account so as not to break anything.
On the other hand, creating a new authentication gives us the freedom to design it as we want, without having to be fully compatible, because it's a different authentication (based on the same thing? Yes, but a totally different "entity"). From a coding POV, this will be much better for adding, but also for removing the old code. From a users' POV, this is a migration that they have to do explicitly, and that's why I also agree with keeping `aws-eks` as deprecated alongside the new one until the next major version, helping users with the migration path.
Summarizing: the only reason for adding a new auth and deprecating the previous one is to avoid trouble with breaking changes and deprecation policies, as removing the field from the scaler auth section and adding it as a first-class property of the TriggerAuthentication is a breaking change, and personally I'm against supporting both places in the same auth, to make our lives easier xD
I think I'm not fully grasping it, still - sorry :)
Does that mean we currently have it defined on the scalers instead of the trigger authentication? If not, what would the new trigger authentication look like compared to what we have today?
I think deprecating an existing trigger authentication just to change things might be a bit aggressive and cause end-user frustration we can avoid, but I am probably missing something. I want to avoid cognitive load on end-users.
Just synced up with Zbynek and if we're doing this to move it from trigger metadata to auth then I'm fine with it
Report
- It's mandatory to define `serviceAccountName` in the scaled workload, because there is some code that reads the `awsRoleArn` even though it's not always necessary for this authentication (the SDK reads the AWS role from the KEDA service account). My proposal is to deprecate `identityOwner` in favor of a new option in TriggerAuthentication (as we have with Azure pod identity).
- `aws-kiam` should be removed, as it has been deprecated in favor of `aws-eks`. We should deprecate it and remove it in KEDA v2.12.
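As a sketch of the proposed end state (all names below are assumptions, since the new provider had not been designed at this point): the authentication would live entirely in the TriggerAuthentication, with nothing auth-related left in the scaler's trigger metadata.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-trigger-auth
spec:
  podIdentity:
    provider: aws               # hypothetical unified provider name
    # Optional role to assume per TriggerAuthentication, replacing both
    # identityOwner and the per-scaler awsRoleArn trigger metadata.
    roleArn: arn:aws:iam::111122223333:role/sqs-event-processor
```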