microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps
MIT License

[KEDA][AzureEventHub] App not scaling to zero #1225

Closed goncalo-oliveira closed 2 weeks ago

goncalo-oliveira commented 3 weeks ago

Issue description

I have set up a container app with 0 - 4 replicas, using an azure-eventhub scale rule with the following settings:

- type:
  metadata:
    activationUnprocessedEventThreshold: 10
    checkpointStrategy: blobMetadata
    blobContainer: ...
    connectionFromEnv: ...
    consumerGroup: ...
    eventHubNameFromEnv: ...
    storageConnectionFromEnv: ...
    unprocessedEventThreshold: 64

I did have an earlier issue, which I commented on in #972, where the app didn't scale down, but that was down to the missing checkpointStrategy, which has since been added. It now scales up, and back down to 1, depending on the load. However, it does not scale down to zero when there are no events to process.

I was thinking that this might be intended behaviour, but from the documentation it seems that it should scale to zero and activate at a threshold:

activationUnprocessedEventThreshold - Target value for activating the scaler. Learn more about activation here.

Steps to reproduce

  1. Set up container app with minReplicas: 0 and a positive maxReplicas value
  2. Set up scale rules as described above
  3. Maintain a period without event intake

Expected behavior

Without any events being sent to the Event Hub, the expectation was for the app to scale to zero. When the number of unprocessed events exceeds the activationUnprocessedEventThreshold, the app should scale back to 1.
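
The expected behaviour described above can be sketched roughly as follows. This is a simplified model, not KEDA's actual implementation: it assumes the activation check applies only when minReplicas is 0, that activation requires the backlog to be strictly greater than the activation threshold, and that scaling beyond 1 follows the usual ceil(backlog / target) rule. It ignores polling intervals and cooldown periods. The function name and defaults are hypothetical, with the defaults taken from the settings in this issue.

```python
import math


def desired_replicas(unprocessed, *, activation=10, target=64,
                     min_replicas=0, max_replicas=4):
    """Rough model of one event-driven scale rule (assumptions in the text above)."""
    # Activation phase (0 <-> 1): with minReplicas 0, the app stays at zero
    # until the backlog is strictly greater than the activation threshold.
    if min_replicas == 0 and unprocessed <= activation:
        return 0
    # Scaling phase (1 <-> N): one replica per `target` unprocessed events,
    # clamped to the configured replica bounds.
    wanted = math.ceil(unprocessed / target)
    return max(1, min_replicas, min(wanted, max_replicas))


if __name__ == "__main__":
    for backlog in (0, 10, 11, 64, 65, 1000):
        print(backlog, desired_replicas(backlog))
```

Under these assumptions, a backlog of 10 or fewer events leaves the app at zero, 11 events wake it up to one replica, and larger backlogs are capped at the maxReplicas of 4.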

Actual behavior

Without any events being sent to the Event Hub, the number of replicas is kept at 1.

simonjj commented 3 weeks ago

@goncalo-oliveira thank you for reaching out. Does this application have external ingress enabled and is also serving HTTP requests?

goncalo-oliveira commented 3 weeks ago

Hi @simonjj, thanks for replying. Yes, as a matter of fact, the app does have ingress enabled so that prometheus metrics can be scraped by another app, although the ingress traffic is limited to the container app environment. Right... as I'm writing this... the scraping is actually keeping the app alive...
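
To illustrate why periodic scraping could hold the app at one replica, here is a minimal sketch of how several scale rules might combine. It assumes that an HTTP scale rule is in effect whenever ingress is enabled and that the effective replica count is the highest count any rule asks for, which is how the Kubernetes HPA treats multiple metrics. The function and the rule names in the dictionary are hypothetical.

```python
def combined_replicas(per_rule_replicas, min_replicas=0, max_replicas=4):
    """Pick the highest replica count requested by any rule, clamped to the bounds."""
    wanted = max(per_rule_replicas.values(), default=0)
    return max(min_replicas, min(wanted, max_replicas))


if __name__ == "__main__":
    # Hypothetical snapshot: the Event Hub backlog is empty, but regular
    # Prometheus scrapes keep the HTTP rule asking for one replica.
    print(combined_replicas({"azure-eventhub": 0, "http": 1}))
```

With these inputs the result is 1 even though the Event Hubs rule alone would allow zero, which matches the behaviour seen while ingress was enabled.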

goncalo-oliveira commented 2 weeks ago

Just to confirm, I've removed the ingress from the app and it did scale down to zero after some time, as expected. However, the app did not scale back up when new events were added, which was unexpected. Reactivating the ingress woke the app again and it scaled up to 2 replicas, possibly because of the number of events waiting. After some time it scaled down to 1, once the number of events had normalized (all as expected).

Am I missing something here? I do have activationUnprocessedEventThreshold: 10 set in the configuration, so the expected outcome was for the app to activate itself when events were waiting.

It's not the end of the world, since for my particular use case, in production, I am not expecting the number of events to be zero, ever, but it's still an unexpected behaviour.

anthonychu commented 2 weeks ago

@goncalo-oliveira Can you please check your system logs to see if there are any scaling/KEDA related error messages? It's possible that the Event Hubs scale rule isn't correctly configured and the only thing scaling the app is the HTTP rule.

goncalo-oliveira commented 2 weeks ago

I guess that I'm proving myself wrong and that things are working as expected - which is a good thing. Can't say why it didn't work at first, maybe I did have something not properly configured.

{"Msg":"Deactivated apps/v1.Deployment k8se-apps/app-***** from 1 to 0","Reason":"KEDAScaleTargetDeactivated","EventSource":"KEDA","Count":273}
{"Msg":"Container \u0027consumer\u0027 was terminated with exit code \u0027\u0027 and reason \u0027ManuallyStopped\u0027","Reason":"ContainerTerminated","EventSource":"ContainerAppController","Count":1}
{"Msg":"Scaled apps/v1.Deployment k8se-apps/app-***** from 0 to 1, triggered by throughput","Reason":"KEDAScaleTargetActivated","EventSource":"KEDA","Count":1}
{"Msg":"Replica \u0027app-*****-76c4877f9f-cf7sc\u0027 has been scheduled to run on a node.","Reason":"AssigningReplica","EventSource":"ContainerAppController","Count":0}
{"Msg":"Pulling image \u0027********\u0027","Reason":"PullingImage","EventSource":"ContainerAppController","Count":1}
{"Msg":"Successfully pulled image \u0027********\u0027 in 1.9588958s","Reason":"PulledImage","EventSource":"ContainerAppController","Count":1}
{"Msg":"Created container \u0027consumer\u0027","Reason":"ContainerCreated","EventSource":"ContainerAppController","Count":1}
{"Msg":"Started container \u0027consumer\u0027","Reason":"ContainerStarted","EventSource":"ContainerAppController","Count":1}
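
For anyone checking their own system logs for scaling activity, entries like the ones above can be filtered with a few lines of Python. This is only a sketch: it assumes the logs have been exported as one JSON object per line with the same "Msg", "Reason" and "EventSource" fields shown here, and the function name is hypothetical.

```python
import json


def keda_events(lines):
    """Yield (reason, message) for each log line whose EventSource is KEDA."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if isinstance(entry, dict) and entry.get("EventSource") == "KEDA":
            yield entry.get("Reason"), entry.get("Msg")


# Three of the entries above, used as sample input.
sample = [
    r'{"Msg":"Deactivated apps/v1.Deployment k8se-apps/app-***** from 1 to 0","Reason":"KEDAScaleTargetDeactivated","EventSource":"KEDA","Count":273}',
    r'{"Msg":"Scaled apps/v1.Deployment k8se-apps/app-***** from 0 to 1, triggered by throughput","Reason":"KEDAScaleTargetActivated","EventSource":"KEDA","Count":1}',
    r'{"Msg":"Started container \u0027consumer\u0027","Reason":"ContainerStarted","EventSource":"ContainerAppController","Count":1}',
]

if __name__ == "__main__":
    for reason, msg in keda_events(sample):
        print(f"{reason}: {msg}")
```

Seeing both a KEDAScaleTargetDeactivated and a KEDAScaleTargetActivated entry, as in the logs above, suggests that the event-driven rule is doing the scaling rather than HTTP traffic alone.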

Thank you both, @simonjj and @anthonychu.

simonjj commented 2 weeks ago

Thanks for clarifying and getting back to us. We will add an item to improve the KEDA error messaging.