kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0
8.29k stars 1.05k forks source link

TriggerAuthentication - spec.podIdentity.identityId as a pointer - breaking change in 2.12.0 #5109

Open radekfojtik opened 11 months ago

radekfojtik commented 11 months ago

Report

After upgrading to version 2.12.0, you need to recreate the TriggerAuthentication resource (if you are using PodIdentityProviderAzure \ PodIdentityProviderAzureWorkload with an unspecified identityId). Because the identityId field is a pointer. Otherwise you will get this error "IdentityID of PodIdentity should not be empty".

Before version 2.12.0 - if you omit the identityId field, your resource will look like this

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  annotations:
    meta.helm.sh/release-name: my-release-name
    meta.helm.sh/release-namespace: my-namespace
  creationTimestamp: xxx
  finalizers:
  - finalizer.keda.sh
  labels:
    app.kubernetes.io/managed-by: Helm
  name: my-name
  namespace: my-namespace
spec:
  podIdentity:
    identityId: ""
    provider: azure-workload

The identityId field is an empty string, although it has been omitted.

The solution is to recreate TriggerAuthentication. This is a breaking change that should at least be pointed out, or should be in version 3.*?

Expected Behavior

Only minor changes

Actual Behavior

Breaking change

Steps to Reproduce the Problem

  1. Use TriggerAuthentication - provider PodIdentityProviderAzure/PodIdentityProviderAzureWorkload and omit the identityId (optional)
  2. Update KEDA to version 2.12.0
  3. You will get this error "IdentityID of PodIdentity should not be empty"

Logs from KEDA operator

No response

KEDA Version

2.12.0

Kubernetes Version

1.26

Platform

Microsoft Azure

Scaler Details

No response

Anything else?

No response

JorTurFer commented 11 months ago

@tomkerkhove @SpiritZhou

tomkerkhove commented 10 months ago

Can you elaborate on how the trigger authentication resource was created please? Are you saying you are specifying "" or you are not and it was added by default? If the latter, in I presume you mean in v2.11?

radekfojtik commented 10 months ago

Sure.

In v2.11

image

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  annotations:
    meta.helm.sh/release-name: addresslookup-uat-jobrunner
    meta.helm.sh/release-namespace: acquisition-uat
  creationTimestamp: "2023-10-20T18:49:01Z"
  finalizers:
  - finalizer.keda.sh
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
  name: keda-operator-pod-identity-addresslookup-uat-jobrunner
  namespace: acquisition-uat
  resourceVersion: xxx
  uid: xxx
spec:
  podIdentity:
    identityId: ""
    provider: azure-workload

This means that after upgrading KEDA to v2.12 it is necessary to re-create TriggerAuthentication. Otherwise you will get this error "IdentityID of PodIdentity should not be empty".

tomkerkhove commented 10 months ago

Hm I was not aware of that and it is unfortunate.

Here is what I think we should do:

  1. Revert the change we made
  2. Change existing code to no longer use "" but use nil by default
  3. Ship hotfix in to 2.12.1
  4. Re-apply original change

That way, end-users will no longer face this issue when we ship 2.13.0 and this validation can still be used. If end-users face this issue with 2.13.0 they need to migrate to 2.12.1 first before going to 2.13.0.

THoughts @zroubalik @JorTurFer @SpiritZhou?

radekfojtik commented 10 months ago

@tomkerkhove I am not sure if this helps, but I could be wrong. Once the identityId is set to "" (v2.11), it will never be nil again until the TriggerAuthentication object is recreated.

SpiritZhou commented 10 months ago

@tomkerkhove @JorTurFer In 2.11, the triggerauth object will be updated with an "" value when setting Finalizer in EnsureAuthenticationResourceFinalizer. I think the better way is to check if identityId is "" and then update it with nil value on reconcile. We don't need to revert the change, as the webhook will validate new requests and the nil value update will only affect the triggerauth objects during the KEDA upgrade.

tomkerkhove commented 10 months ago

What do you think @JorTurFer?

radekfojtik commented 10 months ago

I thought of a slightly different alternative (there would be no need for the operator to update the value to nil)

  1. Keep the validation only in webhook (remove from operator)
  2. Webhook - edit ValidateUpdate func - possibly validateSpec

    image

This should ensure that TAs created before version 2.12 will not be validated, but newer ones will.

radekfojtik commented 10 months ago

guys, here is fix for a related problem with identityId validation https://github.com/kedacore/charts/pull/553

SpiritZhou commented 10 months ago

I thought of a slightly different alternative (there would be no need for the operator to update the value to nil)

  1. Keep the validation only in webhook (remove from operator)
  2. Webhook - edit ValidateUpdate func - possibly validateSpec

image

This should ensure that TAs created before version 2.12 will not be validated, but newer ones will.

I think this is also a good solution.

zroubalik commented 10 months ago

guys, here is fix for a related problem with identityId validation kedacore/charts#553

nice, could you please give an explanation for this change?

radekfojtik commented 10 months ago

guys, here is fix for a related problem with identityId validation kedacore/charts#553

nice, could you please give an explanation for this change?

If KEDA is deployed using helm, the configuration for the validation webhook is not created (TriggerAuthentication/ClusterTriggerAuthentication). So the identityId validation on the webhook side will not work.

This fix adds the configuration that is needed for TriggerAuthentication/ClusterTriggerAuthentication validation.

zroubalik commented 10 months ago

@radekfojtik cool, does it fully fix this issue then?

radekfojtik commented 10 months ago

@radekfojtik cool, does it fully fix this issue then?

Unfortunately no, it only solves the related problem that the webhook configuration (for validation of TriggerAuthentication/ClusterTriggerAuthentication) was not added to helmchart.

This issue is about breaking change in v2.12 releated to this identityId validation. In short - if user has a TA defined where the provider is azure | azure-workload and the identityId is not specified (omitted) then scaling stops working (after the upgrade to v2.12) - keda operator error "IdentityID of PodIdentity should not be empty"

Two possible solutions are currently proposed

JorTurFer commented 10 months ago

I think that we should remove validations from operator. IMO, if we are validating the item on create/update, we don't need to validate it again as on operator. We have introduced admission webhooks, lets use them.

I get the point that forcing users to use the admission is "ugly", but that's the goal of the admission webhooks, detecting issues during the admission. If users don't care about validations, they can remove the webhooks, but it also means that they do need to validate the things from their side, it's a trade off of their decision. No webhooks? no validations

radekfojtik commented 10 months ago

@JorTurFer please take a look https://github.com/kedacore/keda/commit/63dca4969608cf9388912eaea050d93337996d8c (based on the branch release/v2.12)

JorTurFer commented 10 months ago

@JorTurFer please take a look https://github.com/kedacore/keda/commit/63dca4969608cf9388912eaea050d93337996d8c (based on the branch release/v2.12)

It looks good to me

JorTurFer commented 10 months ago

But, please open the PR to main branch, to the the release/v2.12 branch, we will cherry pick it

radekfojtik commented 10 months ago

But, please open the PR to main branch, to the the release/v2.12 branch, we will cherry pick it

@JorTurFer

here is the PR https://github.com/kedacore/keda/pull/5142

please take a look also https://github.com/kedacore/charts/pull/553

stale[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 8 months ago

This issue has been automatically closed due to inactivity.

zroubalik commented 8 months ago

@JorTurFer what shall we do about this?

tusharsappal commented 7 months ago

Any updates here when this will be tackled ?

stale[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

amirschw commented 5 months ago

Not stale

aamarnath12 commented 1 month ago

We are still facing this issue in keda 2.14.3 and AKS 1.28

radekfojtik commented 1 month ago

Guys @JorTurFer @zroubalik @tomkerkhove , why is this not merged yet? The fix has been prepared for a long time