kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

KEDA Event Hub Scaler does not Initialize Blobs #4532

Closed tkennes closed 1 year ago

tkennes commented 1 year ago

Report

The Azure Event Hub scaler does not seem to initialize blobs. It gets stuck in a loop trying to fetch the current status of the checkpoints, and eventually fails without ever having read them.

Expected Behavior

Blobs are created in the Azure storage account and used to keep track of checkpoints.

Actual Behavior

Checkpoints are not used and, in turn, there is no autoscaling of the underlying ScaledObject based on the Event Hub events.

Steps to Reproduce the Problem

  1. Set up a Storage account in Azure, and an Event Hub
  2. Set up a SPN that has access to the Storage Account (Storage Blob Data Contributor), and the Event Hubs
  3. Deploy KEDA with workload-identity, ID = application_id of the SPN
  4. Observe the error logs in the KEDA operator.

Logs from KEDA operator

2023-05-09T14:33:34Z    ERROR    azure_eventhub_scaler    Blob container : <blob> not found to use checkpoint strategy, getting unprocessed event count without checkpoint    {"type": "ScaledObject", "namespace": "<ns>", "name": "<keda>",  "error": "-> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /workspace/vendor/github.com/Azure/azure-storage-blob-go/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=BlobNotFound) =====
Description=The specified blob does not exist.
RequestId:115b812c-001e-0024-3f83-8291f8000000
Time:2023-05-09T14:33:34.3853190Z, Details: 
Code: BlobNotFound
GET https://<PRIVATE>.blob.core.windows.net/azure-webjobs-eventhub/4?timeout=61
 -------------------------------------------------------------------------------- 
RESPONSE Status:  404 The specified blob does not exist.
Content-Length: [215]
Content-Type: [application/xml]
Date: [Tue, 09 May 2023 14:33:33 GMT]
Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Client-Request-Id: [2fbbd283-8aff-468c-5c71-4ce0ac403f99]
X-Ms-Error-Code: [BlobNotFound]
X-Ms-Request-Id: [115b812c-001e-0024-3f83-8291f8000000]
X-Ms-Version: [2020-10-02]"}

KEDA Version

2.10.1

Kubernetes Version

1.26

Platform

Microsoft Azure

Scaler Details

Azure Event Hubs

Anything else?

No response

JorTurFer commented 1 year ago

Hi, KEDA doesn't do this, by design. The blob is created by your consumer's checkpointer, so it should already be there. KEDA doesn't create any kind of infrastructure. Is your application creating the blob?

tomkerkhove commented 1 year ago

This one is a bit of a blurry line: the app owns the blob because it needs to checkpoint, but without it KEDA cannot determine whether it should scale.

In this case, IMO KEDA should be improved to log a warning and update the CRD status/events to note that the blob is missing and that it is therefore unable to scale, instead of failing.

That way, we properly handle it without managing app-owned infrastructure
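
To make the suggestion above concrete, here is a minimal sketch in Python (not KEDA's actual Go code) of treating a missing checkpoint blob as a reported condition rather than a hard failure. Every name here (`CheckpointBlobNotFound`, `ScalerStatus`, `evaluate_partition`) is hypothetical and exists only for illustration. The sketch also assumes lag can be computed as a simple difference of sequence numbers.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("eventhub_scaler_sketch")


class CheckpointBlobNotFound(Exception):
    """Hypothetical error raised when the checkpoint blob does not exist."""


@dataclass
class ScalerStatus:
    """Hypothetical status record that would be surfaced on the CRD/events."""
    ready: bool
    message: str
    unprocessed_events: Optional[int]


def evaluate_partition(read_checkpoint: Callable[[], int],
                       last_sequence_number: int) -> ScalerStatus:
    """Compute the lag for one partition, reporting a missing blob instead of raising."""
    try:
        checkpoint = read_checkpoint()
    except CheckpointBlobNotFound as exc:
        # Log a warning and report "not ready"; the blob itself stays app-owned.
        logger.warning("checkpoint blob missing, cannot compute lag: %s", exc)
        return ScalerStatus(False, f"checkpoint blob not found: {exc}", None)
    lag = max(last_sequence_number - checkpoint, 0)
    return ScalerStatus(True, "checkpoint read", lag)
```

The caller decides what to do with a `ready=False` status (for example, skip scaling and surface the message); nothing in the sketch creates the blob.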

JorTurFer commented 1 year ago

We already raise the message and also update the CRD, but it is indeed an error: the user-provided blob doesn't exist.

The original request from the OP is that KEDA should initialize the blob, and that means managing infrastructure:

Expected Behavior

Blobs are created in the Azure storage account and used to keep track of checkpoints.

tomkerkhove commented 1 year ago

Yeah, I read that, but I wanted to find a middle ground so that KEDA does not blow up. It looks like we already provide a solid approach, though, so I think we can close this issue as "out of scope of KEDA".

Sorry @tkennes

tkennes commented 1 year ago

Thanks for these comments!

It does not solve my problem, but it makes sense. If I find a solution, I'll make sure to post it here, either through the func runtime or through actual checkpointing in the code, as in this example: https://learn.microsoft.com/en-us/python/api/overview/azure/eventhub-checkpointstoreblob-aio-readme?view=azure-python

azizabah commented 1 year ago

So we have observed some nuanced behavior: if you have a blob storage where some, but not all, of the partition checkpoints have been initialized, KEDA will fall back to not using the checkpoints at all because of this line: https://github.com/kedacore/keda/blob/e8fcb84d8f1c4eecb52d0c2e7305450e7a699319/pkg/scalers/azure_eventhub_scaler.go#L283C43-L283C43. It would be preferable for KEDA to fall back only if none of the partitions are initialized, instead of failing on a single one.

I can open a new issue if that's preferable but it felt related.
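
The fallback policy I have in mind could look roughly like the sketch below. It is written in Python for brevity and is not KEDA's actual Go code; `partition_lag` and `total_lag` are hypothetical names. It rests on three assumptions of mine: a negative last sequence number marks an empty partition, a partition without a checkpoint counts all of its events as unprocessed, and sequence-number wrap-around is ignored.

```python
from typing import Dict, Optional, Tuple


def partition_lag(last_seq: int, checkpoint_seq: Optional[int]) -> int:
    """Unprocessed events for one partition (wrap-around is ignored in this sketch)."""
    if last_seq < 0:
        # Assumption: a negative sequence number marks an empty partition.
        return 0
    if checkpoint_seq is None:
        # No checkpoint for this partition: treat every event as unprocessed.
        return last_seq + 1
    return max(last_seq - checkpoint_seq, 0)


def total_lag(last_seqs: Dict[str, int],
              checkpoints: Dict[str, Optional[int]]) -> Tuple[int, bool]:
    """Return (total unprocessed events, whether any checkpoint was used)."""
    # Fall back to "no checkpoints" only if NONE of the partitions is initialized.
    any_initialized = any(checkpoints.get(p) is not None for p in last_seqs)
    total = 0
    for partition_id, last_seq in last_seqs.items():
        checkpoint_seq = checkpoints.get(partition_id) if any_initialized else None
        total += partition_lag(last_seq, checkpoint_seq)
    return total, any_initialized


# Partition "1" has no checkpoint yet, but partition "0" is still evaluated with its own.
lag, used_checkpoints = total_lag({"0": 99, "1": 49, "2": -1}, {"0": 90, "1": None})
```

How an uninitialized partition should count when others are initialized is open for discussion; the sketch picks "everything is unprocessed", but ignoring it would be another option.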