kedacore / external-scaler-azure-cosmos-db

KEDA External Scaler for Azure Cosmos DB
Apache License 2.0

Estimator shows estimated lag but processor is not processing #69

Open karpikpl opened 5 months ago

karpikpl commented 5 months ago

It seems that the scaling metric is not precise enough.

The scaler reports items to process:

Lease [0] owned by host Instance-cosmosdb-order-processor-585d7b9bc5-gnb88 reports 37 as estimated lag.
Lease [1] owned by host Instance-cosmosdb-order-processor-585d7b9bc5-5jdb2 reports 38 as estimated lag.
There are 2 partitions with estimated lag.

But the processor is not receiving any items:

k logs cosmosdb-order-processor-585d7b9bc5-gnb88 -n cosmosdb-order-processor
2024-06-13 01:39:00 info: Keda.CosmosDb.Scaler.Demo.OrderProcessor.Worker[0]
      Started change feed processor instance Instance-cosmosdb-order-processor-585d7b9bc5-gnb88
2024-06-13 01:39:00 info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
2024-06-13 01:39:00 info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
2024-06-13 01:39:00 info: Microsoft.Hosting.Lifetime[0]
      Content root path: /app

I verified that both are connected to the same DB by using a test app that shared one client between the estimator and the change feed processor.
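To compare the scaler's numbers across repeated runs, the per-lease lag lines quoted above can be totalled with a small script. This is only a minimal sketch in Python, assuming the log lines keep exactly the format shown in this issue; `LEASE_LINE` and `summarize_lag` are hypothetical names for illustration and are not part of the scaler.

```python
import re

# Matches the per-lease lines quoted in this issue.
# Assumption: the scaler's log format is exactly as shown in the excerpt above.
LEASE_LINE = re.compile(
    r"Lease \[(?P<lease>\d+)\] owned by host (?P<host>\S+) "
    r"reports (?P<lag>\d+) as estimated lag\."
)


def summarize_lag(log_text):
    """Return (total_lag, {host: lag}) for every lease line found in log_text."""
    per_host = {}
    for match in LEASE_LINE.finditer(log_text):
        host = match.group("host")
        # A host may own several leases, so accumulate per host.
        per_host[host] = per_host.get(host, 0) + int(match.group("lag"))
    return sum(per_host.values()), per_host


SAMPLE = """\
Lease [0] owned by host Instance-cosmosdb-order-processor-585d7b9bc5-gnb88 reports 37 as estimated lag.
Lease [1] owned by host Instance-cosmosdb-order-processor-585d7b9bc5-5jdb2 reports 38 as estimated lag.
There are 2 partitions with estimated lag.
"""

total, per_host = summarize_lag(SAMPLE)
print(total)          # 75
print(len(per_host))  # 2
```

If the total stays the same over many runs while the processor logs show no activity, that matches the behavior described here.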

Expected Behavior

Pods are scaled to 0 when there is nothing to process. Pods process items when the scaler reports estimated changes.

Actual Behavior

The scaler reports changes, but the processor is not doing anything.

Steps to Reproduce the Problem

  1. Run the demo app on a multi-partition container
  2. Generate some data (this may need to be done a few times)
  3. Observe the pods

Specifications

JatinSanghvi commented 3 months ago

This may have more to do with the order processor not finding any items to process than with the scaler reporting an incorrect estimated lag. Is it possible that the old processor pod finished processing all changes in its change feed, then moved on to the second change feed and processed the items there too, before KEDA could increase the replica count for the order processor? Does the estimated lag stay above zero even after waiting a while, say 5 minutes or so?

karpikpl commented 3 months ago

It's been a while, but I remember that it did not go to zero even after a few hours and after restarting the processors.