logstash-plugins / logstash-output-rabbitmq

Apache License 2.0

Pipeline doesn't start if logstash can't connect to rabbitmq #64

Closed Woudan closed 7 years ago

Woudan commented 7 years ago

The pipeline does not start and keeps flushing when it can't connect to a rabbitmq node that is configured in the pipeline. It should just create the pipeline and keep trying to connect at every retry interval.

In my case we have multiple (10+) remote rabbitmq instances that could be down when starting logstash. See the debug output file for more info.

logstash debug output.txt

robbavey commented 7 years ago

Hi @Woudan, thanks for submitting this. I was wondering if we could get more information about your setup.

Woudan commented 7 years ago

Hi @robbavey,

The main issue is that the rabbitmq output does try to reconnect, but it fully blocks my pipeline. None of the other rabbitmq input connections come up either. When I remove the rabbitmq server that is down, the full pipeline works just fine.

Sounds like an issue that can't be solved because, like you say, messages that need to be routed to the rabbitmq output that's down will block the pipeline.

Do you have a suggestion of how to push messages to a remote rabbitmq that can be down?

Thanks for the information!

andrewvc commented 7 years ago

@Woudan the best practice here is to buffer those messages somewhere and use a second pipeline to send them to rabbitmq from that buffer. Today that means running two LS instances, but we're working on a multi-pipeline feature that will make that much easier in 6.0 (it's already in the alphas).

The easiest way to do that today would be to replace the rabbitmq output in your current config with the lumberjack output, which can talk to the beats input running in the other pipeline (confusing, I know). The second pipeline can use the persistent queue to buffer events while the rabbitmq instance is down.

This means the primary logstash instance will be pushing events through so long as the second one doesn't block, which would only happen if the PQ was full.
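A minimal sketch of that two-instance setup might look like the following. The host names, port, certificate paths, exchange name, and queue size are all placeholder assumptions, not values from this thread, and the option names should be checked against the plugin versions you actually run.

```
# --- Second instance: logstash.yml (enables the persistent queue) ---
# queue.type: persisted
# queue.max_bytes: 4gb        # assumed size; the primary only blocks once this fills

# --- Primary instance: pipeline config ---
# The rabbitmq output is replaced by a lumberjack output that forwards
# events to the second instance.
output {
  lumberjack {
    hosts => ["buffer-logstash.example.com"]    # hypothetical host of the second instance
    port => 5044                                # assumed port; must match the beats input
    ssl_certificate => "/path/to/forwarder.crt" # placeholder path
  }
}

# --- Second instance: pipeline config ---
# The beats input receives from the primary; events wait in the persistent
# queue while the rabbitmq node is unreachable.
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/path/to/forwarder.crt" # placeholder path
    ssl_key => "/path/to/forwarder.key"         # placeholder path
  }
}

output {
  rabbitmq {
    host => "remote-rabbitmq.example.com"       # hypothetical remote node that may be down
    exchange => "logs"                          # assumed exchange name
    exchange_type => "direct"
    key => "logstash"
  }
}
```

With more than one remote rabbitmq node, each node that can fail independently would probably need its own buffering instance, since a single blocked output still stalls the pipeline it belongs to.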

Does that make sense?

Woudan commented 7 years ago

@andrewvc, thanks for the information and explanation. That makes sense.

I'm going to try your setup and check whether it makes more sense to build something myself, use your solution, or do some 6.0 beta testing.