logstash-plugins / logstash-input-azure_event_hubs

Logstash input for consuming events from Azure Event Hubs
Apache License 2.0

Support checkpointing on interval only, not batch completion #30

Open mbrancato opened 5 years ago

mbrancato commented 5 years ago

Please add support for making the checkpoint interval the only trigger for writing a checkpoint to blob storage. Currently, a checkpoint is also written whenever a batch completes. The problem is that some outputs are constrained by batch size, which can lead to smaller batches and therefore a large number of checkpoint write operations. I've seen this become very expensive even in smaller environments. Even at the default batch size of 50, most batches will generate a large volume of reads/writes for checkpointing.

That said, our use case would be fine with a purely time-based checkpoint rather than a batch-based one. Checkpointing on an interval of 30 seconds or so would realize significant cost savings.
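For illustration only, here is roughly how the options under discussion look in a pipeline config (connection strings are placeholders, values match the report above rather than any recommended settings):

```
input {
  azure_event_hubs {
    event_hub_connections => ["Endpoint=sb://example-ns.servicebus.windows.net/;SharedAccessKeyName=logstash;SharedAccessKey=<key>;EntityPath=example-hub"]
    storage_connection    => "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    storage_container     => "logstash-checkpoints"
    # Events retrieved per batch; today a checkpoint is also written when each
    # batch completes, independent of checkpoint_interval.
    max_batch_size        => 50
    # Seconds between checkpoints while a batch is being processed; this issue
    # asks for an option to make this the only checkpoint trigger.
    checkpoint_interval   => 5
  }
}
```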

choovick commented 5 years ago

Also interested in any feature that can reduce the cost of the storage account. I'm surprised there is no way to use the local filesystem, or a centralized cache/document store like Redis or Mongo, to save checkpoints...

choovick commented 5 years ago

@mbrancato Thinking about it again: can't we achieve this by setting a very large max_batch_size and setting checkpoint_interval to the desired delay? We would have to watch memory usage, though...
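A sketch of that workaround, using the same placeholder connections as above; the values are only illustrative and the memory impact is untested:

```
input {
  azure_event_hubs {
    event_hub_connections => ["<event hub connection string>"]
    storage_connection    => "<storage account connection string>"
    # Very large batches so that batch-completion checkpoints become rare...
    max_batch_size        => 5000
    # ...and rely mostly on the time-based interval (in seconds) instead.
    checkpoint_interval   => 30
  }
}
```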

mbrancato commented 5 years ago

I had other limitations on batch sizes. But setting a batch size doesn't make Logstash wait for the batch queue to fill up, so batches can still complete early. If you are using Azure Storage, be sure to use V1 storage accounts, since the transaction costs are 90% less than V2.

choovick commented 5 years ago

@mbrancato I see, thanks! It doesn't look like Microsoft is planning to end-of-life V1 storage accounts, so I'm definitely going to try it out.

SpencerLN commented 4 years ago

+1 for any feature that can reduce the storage costs associated with checkpointing.

shauryagarg2006 commented 4 years ago

Also, I think the checkpoint interval is not serving any purpose at the moment: every event is checkpointed.

I made the following change in my fork to fix it.

https://github.com/shauryagarg2006/logstash-input-azure_event_hubs/commit/60292a17bad6deaee61048cd4d18d3c3a0ad6b66

ghost commented 3 years ago

Hello, I had the same issue with the cost associated with Azure Storage. I decided to switch to the Kafka input, since Azure Event Hubs supports the Kafka protocol. The costs are quite low now! The Microsoft documentation states that Azure Event Hubs uses Azure Storage internally when the Kafka interface is used; I hope they won't change their mind about providing this storage without additional cost. Some resources you might find useful:
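Separately, a minimal sketch of a kafka input pointed at the Event Hubs Kafka endpoint; the namespace, hub name, and connection string are placeholders:

```
input {
  kafka {
    # Event Hubs exposes a Kafka-compatible endpoint on port 9093
    # (Standard tier and above).
    bootstrap_servers => "example-ns.servicebus.windows.net:9093"
    topics            => ["example-hub"]
    group_id          => "logstash"
    security_protocol => "SASL_SSL"
    sasl_mechanism    => "PLAIN"
    sasl_jaas_config  => 'org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://example-ns.servicebus.windows.net/;SharedAccessKeyName=logstash;SharedAccessKey=<key>";'
    # Offsets are committed to the Event Hubs consumer group, so no separate
    # storage account is needed for checkpoints.
    codec             => json
  }
}
```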

lucianaparaschivei commented 2 years ago

Hello, we are facing the same issue with high costs on the Azure Storage account. The plugin is making far too many storage transactions: it looks like roughly one checkpoint per 3-4 messages, which is a lot. Can you please address this?