Open saliceti opened 7 years ago
I'm not well experienced with RELP, but after reviewing your report and reading the RELP spec, I think I see the problem --
It appears that your rsyslog server is sending two open
commands for some reason. Can you attach a pcap for both network paths (rsyslog -> logstash, logstash -> rsyslog). The snapshot from tcpdump
you sent appears to be only from rsyslog (the client sending logs).
It's unclear to me whether this is an rsyslog or logstash problem, but I believe it is a problem that rsyslog sends open
twice, though I'm not sure why it's doing that.
I experience the same problem when sending RELP from rsyslog to logstash-->elastic. If I shut down elastic the pushback for tcp should kickin and start queue up on either logstash|rsyslog-server|client. So, if you don't have proper queue (mainqueues and actionqueues) in rsyslog you will end up in a really slow syslog (by design syslog slows down when not able to process messages). I thought that was my case, but having my queues reviewed by Adiscon GmbH (rsyslog company) and the problem continues.
I'll try to reproduce the issue in a simpler setup than Cloud Foundry as it adds a lot of noise from all the VMs.
I believe this is the same as my findings on trying to loadbalance logstash relp input behind HAProxy. Somehow the input module fails to renegotiate the connection - and we end up with no logflow.
We use rsyslog to send Cloud foundry messages to logstash via RELP. This is done via the syslog and logsearch releases. The whole is deployed on AWS and our logstash servers are behind an ELB. We've noticed a problem when the logstash servers disconnect, for example by temporarily removing the ELB backends. When rsyslog reconnects, it tries to open a new RELP session and keeps getting rejected by logstash. This is what we keep seeing in the logstash log:
It looks like it comes from this line.
From tcpdump:
rsyslog tries to open many connections and consumes a lot of CPU (up to 80%). Restarting rsyslog solves the issue. When we use TCP instead of RELP transport, there is no issue.
We're on logstash 2.3.3 and logstash-input-relp 2.0.5. Any idea?