observability-lab-cse / observability-lab


[bug] Different errors reported related to Event Hub #56

Open MagdaPaj opened 11 months ago

MagdaPaj commented 11 months ago

Message checkpointing in the Devices Manager behaves unexpectedly. As part of PR #55 I'm adding an Update Checkpoint call after each message. This is not recommended, though, and we should find a better solution. The Devices Manager logs look correct: it picks up the messages, stops after 5 minutes (as per the current implementation), and after the restart it resumes from where it left off.

Image

So this is the correct behavior (without checkpointing, it always started from the first message).
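
One possible alternative to checkpointing after every message is to checkpoint every N events or every T seconds, whichever comes first. The sketch below is only an illustration in plain Python, not the Devices Manager code (which is .NET); `BatchedCheckpointer`, `save`, `max_events` and `max_seconds` are hypothetical names, and `save` stands in for whatever call actually writes the checkpoint.

```python
import time
from typing import Callable, Optional


class BatchedCheckpointer:
    """Hypothetical helper: saves a checkpoint every `max_events` events
    or every `max_seconds` seconds, whichever comes first."""

    def __init__(
        self,
        save: Callable[[int], None],   # stand-in for the real checkpoint write
        max_events: int = 100,
        max_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._save = save
        self._max_events = max_events
        self._max_seconds = max_seconds
        self._clock = clock
        self._pending = 0
        self._last_offset: Optional[int] = None
        self._last_saved_at = clock()

    def on_event(self, offset: int) -> None:
        """Record a processed event and checkpoint if a threshold is reached."""
        self._pending += 1
        self._last_offset = offset
        too_many = self._pending >= self._max_events
        too_old = self._clock() - self._last_saved_at >= self._max_seconds
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Checkpoint the latest processed offset, if anything is pending."""
        if self._pending == 0 or self._last_offset is None:
            return
        self._save(self._last_offset)
        self._pending = 0
        self._last_saved_at = self._clock()


# Demo: 7 events, checkpoint every 3 events, plus a final flush on shutdown.
saved = []
checkpointer = BatchedCheckpointer(saved.append, max_events=3, max_seconds=3600.0)
for offset in range(7):
    checkpointer.on_event(offset)
checkpointer.flush()
print(saved)
```

The trade-off is that after a restart the events processed since the last saved checkpoint are read again, so the message handler has to tolerate duplicates.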

But the Event Hub metrics look odd: there are significantly more Outgoing Messages than Incoming Messages.

Image

This seems to be related to Devices Manager restarts: that is when I see spikes in Outgoing Messages.

It would be good to figure out what the problem is. I would also suggest not stopping the Devices Manager every 5 minutes, but keeping it running until a stop signal is sent.
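
To make the "run until a stop signal is sent" idea concrete, here is a minimal sketch in plain Python. It is not the Devices Manager code; `run_until_stopped`, `request_stop` and `install_signal_handlers` are hypothetical names, and the event source is just an iterable standing in for the Event Hub consumer.

```python
import signal
import threading
from typing import Callable, Iterable

# Set once a stop has been requested; checked between events.
stop_requested = threading.Event()


def request_stop(signum=None, frame=None) -> None:
    """Signal-handler-compatible function that asks the loop to stop."""
    stop_requested.set()


def install_signal_handlers() -> None:
    """Call from the main thread at startup so SIGTERM/SIGINT stop the loop."""
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)


def run_until_stopped(events: Iterable[str], handle: Callable[[str], None]) -> int:
    """Process events until the source is exhausted or a stop is requested.

    Returns the number of events that were handled.
    """
    processed = 0
    for event in events:
        if stop_requested.is_set():
            break
        handle(event)
        processed += 1
    return processed


# Demo: the handler requests a stop after the third event,
# simulating a stop signal arriving mid-stream.
handled = []


def handle(event: str) -> None:
    handled.append(event)
    if len(handled) == 3:
        request_stop()


count = run_until_stopped((f"msg-{i}" for i in range(10)), handle)
print(count, handled)
```

Checking the flag only between events means the event in flight is always finished before the loop exits, which leaves a natural place for a final checkpoint.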

MagdaPaj commented 11 months ago

I also see this dependency error:

Image

So somehow the Event Hub Client looks for the checkpoint blob at this URL: https://stmabaran3.blob.core.windows.net/event-hub-data/evhns-mabaran3.servicebus.windows.net/evh-mabaran3/devicemanager/checkpoint/-1

while my checkpoint blob is stored here:

Image

MagdaPaj commented 11 months ago

Samples showing how to create an Event Processor: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/eventhub/Azure.Messaging.EventHubs.Processor/samples/Sample01_HelloWorld.md

MagdaPaj commented 8 months ago

The application was rewritten and now uses Generic Host, so it doesn't restart as often. As a result there are fewer errors, but dependency errors still happen from time to time, and after restarts a spike of outgoing messages is still visible in the Event Hub metrics. Details are in the description of the merged PR: https://github.com/observability-lab-cse/observability-lab/pull/69