Open anshuca0743 opened 3 years ago
The plugin does not prevent multiple readers from reading the same data; there is no synchronization between two instances and no locking. Two instances would therefore download the same dataset unless you restrict each of them to reading different directories or files.
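One way to apply that restriction is to give each instance its own prefix and its own registry file. The sketch below shows the idea for two pipelines; the option names (`storageaccount`, `access_key`, `container`, `prefix`, `registry_path`) follow the azure_blob_storage input's settings, but the prefix paths and registry names are illustrative, so treat this as a sketch rather than a verified configuration:

```
# Instance 1: only reads blobs under one (hypothetical) directory
input {
  azure_blob_storage {
    storageaccount => "mystorageaccount"
    access_key     => "${ACCESS_KEY}"
    container      => "insights-activity-logs"
    prefix         => "resourceId=/SUBSCRIPTIONS/AAAA/"
    registry_path  => "data/registry-instance1.dat"
  }
}

# Instance 2: a disjoint prefix and a separate registry file
input {
  azure_blob_storage {
    storageaccount => "mystorageaccount"
    access_key     => "${ACCESS_KEY}"
    container      => "insights-activity-logs"
    prefix         => "resourceId=/SUBSCRIPTIONS/BBBB/"
    registry_path  => "data/registry-instance2.dat"
  }
}
```

Because the prefixes do not overlap and each instance tracks its own registry, neither instance ever sees the other's blobs.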
For this to be implemented, you would need pipeline-to-pipeline communication, and the problem is that you can't reliably detect that there are two instances: they may not be able to reach each other, and writing a sync file in the storage account is also not foolproof. https://www.elastic.co/guide/en/logstash/current/pipeline-to-pipeline.html
If I had an infinite amount of time, I would implement an optional configuration that defines a Logstash reader cluster, where one instance is configured as the master that updates the registry, the others check whether the master is still updating the registry, and between them they share a work queue from which each Logstash instance downloads (a part of) a file.
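A much simpler, coordination-free variant of "sharing the work queue" is deterministic partitioning: each instance hashes the blob name and only processes the blobs assigned to its own index. This is a minimal Python sketch of that idea, not part of the plugin; the function name `owns_blob` and the hashing scheme are illustrative assumptions:

```python
import hashlib

def owns_blob(blob_name: str, instance_index: int, instance_count: int) -> bool:
    """Return True if this instance is responsible for the given blob.

    SHA-256 is used because it is stable across processes and Python
    versions, unlike the built-in hash().
    """
    digest = hashlib.sha256(blob_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % instance_count == instance_index

# Hypothetical blob names: with two instances, every blob is claimed by
# exactly one of them, so no blob is processed twice and none is skipped.
blobs = [
    "insights-activity-logs/y=2021/m=01/d=01/log1.json",
    "insights-activity-logs/y=2021/m=01/d=02/log2.json",
    "insights-activity-logs/y=2021/m=01/d=03/log3.json",
]
for blob in blobs:
    claimants = [i for i in range(2) if owns_blob(blob, i, 2)]
    assert len(claimants) == 1
```

The trade-off is that every instance must agree on the total instance count and its own index, and resizing the cluster reshuffles ownership, which is why it only avoids (rather than solves) the synchronization problem.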
But this would require restructuring the code base, and the benefit doesn't seem to be worth it.
I have a requirement for multiple Logstash instances to read from the same Azure storage account and the same container. The container holds activity logs. I am running two Logstash instances, and when I check the output of both, I find the same activity logs in each. I don't want duplicate logs. Does this plugin avoid duplicate processing, or do we need to set a specific configuration to achieve this?