logstash-plugins / logstash-input-azure_event_hubs

Logstash input for consuming events from Azure Event Hubs
Apache License 2.0

Logstash face LeaseLost. #80

Open mashhurs opened 1 year ago

mashhurs commented 1 year ago

Issue description

Summary (you may stop reading after this):

Logstash may encounter an Azure Event Hubs Lease Lost error (HTTP 409) when it is experiencing backpressure. Lease Lost exceptions can be observed in the debug logs only. The Microsoft SDK intentionally emits this noise to give clients using the SDK better visibility.

Sample logs:

[2022-10-11T14:42:40,645][DEBUG][com.microsoft.azure.eventprocessorhost.PartitionContext][main][] host logstash-774c9151-ade5-4288-aee4-a7029d3c4471: 1: Saving checkpoint: 220024//3009
[2022-10-11T14:42:40,645][DEBUG][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][main][] host logstash-774c9151-ade5-4288-aee4-a7029d3c4471: 1: Checkpointing at 220024 // 3009
[2022-10-11T14:42:40,645][DEBUG][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][main][] host logstash-774c9151-ade5-4288-aee4-a7029d3c4471: 1: Updating lease
[2022-10-11T14:42:40,645][DEBUG][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][main][] host logstash-774c9151-ade5-4288-aee4-a7029d3c4471: 1: Renewing lease
[2022-10-11T14:42:40,699][DEBUG][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][main][] host logstash-774c9151-ade5-4288-aee4-a7029d3c4471: 0: WAS LEASE LOST? Http 409
[2022-10-11T14:42:40,699][DEBUG][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][main][] host logstash-774c9151-ade5-4288-aee4-a7029d3c4471: 0: Http LeaseIdMismatchWithLeaseOperation :: The lease ID specified did not match the lease ID for the blob.
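
Because these messages appear only at DEBUG level, it can help to count them per host and partition before deciding whether they matter. The following is a minimal sketch that scans log lines shaped like the sample above. The function name `count_lease_lost` is hypothetical, the regular expression is modeled only on these sample lines, and treating the number after the host name as the partition ID is an assumption.

```python
import re
from collections import Counter

# Pattern modeled on the sample lines above. The number after the host name
# is ASSUMED to be the partition ID; adjust if your log layout differs.
LEASE_LOST = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[DEBUG\]"
    r"\[com\.microsoft\.azure\.eventprocessorhost\.AzureStorageCheckpointLeaseManager\]"
    r".*? host (?P<host>\S+): (?P<partition>\d+): WAS LEASE LOST\? Http 409"
)


def count_lease_lost(lines):
    """Count suspected lease-lost messages, keyed by (host, partition)."""
    hits = Counter()
    for line in lines:
        match = LEASE_LOST.search(line)
        if match:
            hits[(match.group("host"), match.group("partition"))] += 1
    return hits
```

Passing an open log file (or any iterable of lines) to `count_lease_lost` returns a `Counter`; a partition that shows up repeatedly is a candidate for the backpressure scenario described in this issue.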

From Logstash's point of view, what can cause a lost lease? When a Logstash instance persists the offset to Blob storage, it acquires the lease for the blob and then updates it after processing the batch of events. Batch processing may take longer than the lease timeout, or the instance may lose the handshake when resource usage is high (see the discussion here). Batch processing takes longer when Logstash faces backpressure and has to wait longer to push events into the queue. See the possible approaches section for what you can do.
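
To make the timing argument concrete, here is a deliberately simplified sketch of the scenario described above: a lease is held while a batch is processed, and a batch that takes longer than the lease duration finds the lease expired. The function name, the 30-second lease duration, and the batch durations are illustrative assumptions, and the model does not reflect how the SDK actually schedules lease renewals.

```python
def batches_losing_lease(batch_seconds, lease_seconds=30.0):
    """Return indexes of batches whose processing time exceeds the lease duration.

    Simplified model (an assumption, not the SDK's real behavior): the lease
    is taken when a batch starts and is only updated once the batch has been
    processed, so a batch running longer than lease_seconds loses its lease.
    """
    return [i for i, took in enumerate(batch_seconds) if took > lease_seconds]


# Hypothetical batch durations in seconds; the third one models backpressure.
durations = [2.0, 5.5, 41.0, 3.0]
print(batches_losing_lease(durations))  # -> [2]
```

In this model, only the batch slowed down by backpressure overruns the lease, which matches the observation that the error appears when Logstash waits longer to push events into the queue.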

For the possible cases in which a lease can be lost from Azure's point of view, see the reference.

Possible approaches

First things first: make sure you have enough Logstash instances to consume the events that Azure Event Hubs produces. The best solution depends on the situation, but considering the following is a good start: