logstash-plugins / logstash-output-rabbitmq

Apache License 2.0

Duplicating connections when RabbitMQ stops #9

Closed iemem15 closed 9 years ago

iemem15 commented 9 years ago

We have several issues when RabbitMQ crashes, stops, or there is a network disconnection. The plugin tries to reconnect every 10 seconds, but each reconnect attempt duplicates the ESTABLISHED connections per output, so with 1 output, after 1 hour there are roughly 360 ESTABLISHED connections, and it also never resumes sending messages to the exchanges.

We found the "automatic_recovery" parameter, whose default value is false. After changing it to true, messages are sent correctly again, but it still creates one extra connection every time there is a disconnect (network problem or RabbitMQ crash). For now adjusting this parameter is fine, but with hundreds of connections it would be a problem; we have a very large environment.

Here is the file where we changed the parameter: vendor/bundle/jruby/1.9/gems/logstash-output-rabbitmq-0.1.5-java/lib/logstash/outputs/rabbitmq/march_hare.rb
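The behavior reported here is consistent with a retry loop that opens a new connection without tearing down the failed one. A minimal Ruby sketch of that pattern (illustrative class and method names only, not the plugin's actual code):

```ruby
# Sketch of the suspected leak: each retry opens a fresh connection
# while the old, dead one is never closed, so ESTABLISHED sockets pile up.
class LeakyReconnector
  attr_reader :open_connections

  def initialize
    @open_connections = []
  end

  def connect!
    @open_connections << Object.new  # stand-in for a TCP connection
  end

  # Buggy retry: never closes the previous connection first.
  def reconnect!
    connect!
  end

  # Fixed retry: tear down the old connection before dialing again.
  def reconnect_safely!
    @open_connections.pop
    connect!
  end
end
```

With one output retrying every 10 seconds, the leaky variant would accumulate on the order of 360 connections per hour, matching the numbers reported above.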

The plugin version is 0.1.5

Any help would be appreciated.

Thanks in advance.

andrewvc commented 9 years ago

@iemem15 is this still a problem with the newest version of this plugin? We've completely refactored it and we now use a new version of march_hare that uses the Java client's recovery directly.

rudijs commented 9 years ago

@andrewvc Yes I'm seeing this using Logstash 1.5.2

cat /opt/logstash/vendor/bundle/jruby/1.9/gems/march_hare-2.5.1-java/lib/march_hare/version.rb 
# encoding: utf-8

module MarchHare
  VERSION = "2.5.1"
end

What led me to this issue is when RabbitMQ restarts, logstash was not reconnecting successfully.

There is a non-stop stream of new connections being made. I restarted Logstash after about 20 new connections had accumulated, but I assume they would continue to grow, as others have reported.

With the --debug flag, here is some error output:

RabbitMQ connection error: MarchHare::ChannelAlreadyClosed. Will attempt to reconnect in 10 seconds... {:exception=>#<MarchHare::ChannelAlreadyClosed: MarchHare::ChannelAlreadyClosed>, :backtrace=>["/opt/logstash/vendor/bundle/jruby/1.9/gems/march_hare-2.5.1-java/lib/march_hare/exceptions.rb:121:in `convert_and_reraise'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/march_hare-2.5.1-java/lib/march_hare/channel.rb:965:in `converting_rjc_exceptions_to_ruby'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/march_hare-2.5.1-java/lib/march_hare/channel.rb:963:in `converting_rjc_exceptions_to_ruby'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/march_hare-2.5.1-java/lib/march_hare/channel.rb:595:in `basic_publish'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/march_hare-2.5.1-java/lib/march_hare/exchange.rb:77:in `publish'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-rabbitmq-1.0.0-java/lib/logstash/outputs/rabbitmq/march_hare.rb:40:in `publish_serialized'", "org/jruby/RubyProc.java:271:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-json-1.0.0/lib/logstash/codecs/json.rb:49:in `encode'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-rabbitmq-1.0.0-java/lib/logstash/outputs/rabbitmq/march_hare.rb:31:in `receive'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/outputs/base.rb:88:in `handle'", "(eval):172:in `output_func'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:243:in `outputworker'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:165:in `start_outputs'"], :level=>:error, :file=>"logstash/outputs/rabbitmq/march_hare.rb", :line=>"48", :method=>"publish_serialized"}

What's also bad: when using the file input type, the since_db file position is updated, but the message is not sent due to the bad connection, so the message is lost.

When restarting Logstash, it just starts the file input from the end of the file; any messages that failed to send while Logstash was unable to connect to RabbitMQ are not resent (thus lost).

Well, they are not lost per se: they are still in the log file itself, but they never made it into the Logstash pipeline (and on to Elasticsearch).

I'm using Docker, doing something like this with a file input on /var/log/tester.log

echo 36 >> /var/log/tester.log 
sudo docker stop rsm-rabbitmq

echo 37 >> /var/log/tester.log 
echo 38 >> /var/log/tester.log 

sudo docker start rsm-rabbitmq
echo 39 >> /var/log/tester.log

sudo docker stop rsm-logstash
sudo docker start rsm-logstash

echo 40 >> /var/log/tester.log 

Log entries 37 and 38 will not end up in Elasticsearch as RabbitMQ is down.

Log entry 39 will not end up in Elasticsearch, as Logstash is unable to reconnect to the running RabbitMQ.

Then after restarting logstash log entry 40 will make it to Elasticsearch as everything is up and connected.

Log entries 37, 38 and 39 will not be resent as the since_db file location for /var/log/tester.log has incremented (even though the messages did not succeed).

Here's a sample of Logstash output using --debug where we can see the since_db metadata updating from the file input (this is while the connection to RabbitMQ is down):

/host/var/log/tester.log: file grew, old size 144, new size 147 {:level=>:debug, :file=>"filewatch/watch.rb", :line=>"96", :method=>"each"}
Received line {:path=>"/host/var/log/tester.log", :text=>"46", :level=>:debug, :file=>"logstash/inputs/file.rb", :line=>"137", :method=>"run"}

I guess it gets quite complicated - the file watcher should not increment the since_db metadata on file location unless there's an ack (success) from RabbitMQ.
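The ack-before-commit idea can be sketched in a few lines of Ruby. This is illustrative only: the class and method names are hypothetical, not the file input's real API.

```ruby
# Sketch: commit the sincedb offset only after the broker acknowledges
# the publish, not when the line is read from the file.
class CheckpointedTailer
  attr_reader :offset

  def initialize(publisher)
    @publisher = publisher
    @offset = 0  # committed position, analogous to the sincedb entry
  end

  # +lines+ is an array standing in for the tailed file's new lines.
  def drain(lines)
    while @offset < lines.size
      break unless @publisher.publish(lines[@offset])  # broker down: stop
      @offset += 1  # ack received: safe to commit
    end
  end
end

# Stand-in publisher that can be toggled up/down for demonstration.
class TogglePublisher
  def initialize; @up = false; end
  def up!; @up = true; end
  def publish(_line); @up; end
end
```

Until the broker acknowledges, the offset stays put, so lines that failed to send while RabbitMQ was down would be re-read and retried instead of being skipped.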

andrewvc commented 9 years ago

Ouch. This is not good; thanks for reporting the issue. I'm going to make this patch my top priority Monday.


rudijs commented 9 years ago

@andrewvc Thanks Andrew.

Let me know if I can help out with testing - I don't think I can help with patching the code, but for testing stuff out I'm keen.

Cheers.

andrewvc commented 9 years ago

@rudijs @iemem15 I'm realizing now that you guys are likely using an old version of this plugin. It's been mostly rewritten and is now at version '1.1.0'. The new version has improved reconnection logic. Can you try upgrading the plugin via bin/plugin update logstash-output-rabbitmq and report back?

Sorry I missed that detail this weekend, but I believe that should clear up any issues.

andrewvc commented 9 years ago

@rudijs @iemem15 as a note, if you're using the logstash-input-rabbitmq plugin you'll need to upgrade it simultaneously, as both plugins depend on a newer version of the march_hare RabbitMQ gem.

iemem15 commented 9 years ago

I have updated; I will test it.

Thanks.

rudijs commented 9 years ago

@andrewvc

I'm using Docker with the official image version logstash:1.5.2.

There's an SSL issue with v1.5.3 that currently prevents me from using that version.

That issue is here: https://github.com/elastic/logstash/issues/3657

I'm using both logstash-input-rabbitmq and logstash-output-rabbitmq.

Updating those on 1.5.2 gets me version 1.0.1, not the 1.1.0 you referred to.

It is 1.1.0 that's required, yes?

root@20a58a052702:/opt/logstash/bin# ./plugin update logstash-input-rabbitmq
Updating logstash-input-rabbitmq
Updated logstash-codec-json 1.0.0 to 1.0.1

root@20a58a052702:/opt/logstash/bin# ./plugin update logstash-output-rabbitmq
Updating logstash-output-rabbitmq
Updated logstash-output-rabbitmq 1.0.0 to 1.0.1
andrewvc commented 9 years ago

@rudijs apologies, the instructions I gave won't work on 1.5.2 because both the input and output gems share march_hare versions. The dependency manager requires these plugins to be upgraded simultaneously.

Try executing bin/plugin update logstash-output-rabbitmq logstash-input-rabbitmq.

I tested this locally against 1.5.2; it should fix your issues.

andrewvc commented 9 years ago

@iemem15 the advice I gave above also should apply to you, please run bin/plugin update logstash-output-rabbitmq logstash-input-rabbitmq, you must upgrade both plugins simultaneously.

rudijs commented 9 years ago

@andrewvc OK great, the updated instructions work sweet.

root@20a58a052702:/opt/logstash/bin# ./plugin update logstash-output-rabbitmq logstash-input-rabbitmq
Updating logstash-output-rabbitmq, logstash-input-rabbitmq
Updated logstash-input-rabbitmq 1.0.0 to 1.1.0
Updated logstash-output-rabbitmq 1.0.1 to 1.1.0

I'll report my findings shortly...

rudijs commented 9 years ago

OK, here are my findings: overall pretty good, but perhaps not yet ideal.

I'm using both logstash-output-rabbitmq and logstash-input-rabbitmq.

The Docker container names I'm using (from official Docker images) below are prefixed with 'rsm-'.

Note: please excuse the low-quality screen caps; not sure what's up with my capture software.

1) Start Docker logstash:1.5.2

I have 2 connections and 2 channels; the connected agents are using RabbitMQ client version 3.3.4.


2) Upgrade rabbitmq plugins:

vagrant@vbox:/var/log$ sudo docker exec rsm-logstash /opt/logstash/bin/plugin update logstash-output-rabbitmq logstash-input-rabbitmq
Updating logstash-output-rabbitmq, logstash-input-rabbitmq
Updated logstash-codec-json 1.0.0 to 1.0.1
Updated logstash-input-rabbitmq 1.0.0 to 1.1.0
Updated logstash-output-rabbitmq 1.0.0 to 1.1.0

3) Restart logstash to load the updated plugins

vagrant@vbox:/var/log$ sudo docker stop rsm-logstash
rsm-logstash
vagrant@vbox:/var/log$ sudo docker start rsm-logstash                                                                                                                         
rsm-logstash

After a few moments I'm back to 2 connections and 2 channels - the RabbitMQ client is now version 3.5.4


4) Testing file input on /var/log/tester.log

vagrant@vbox:/var/log$ echo 101 >> tester.log                                                                                                                                 
vagrant@vbox:/var/log$ echo 102 >> tester.log 

This works fine, the new log entries got into the ELK pipeline (via rabbitmq input and output) and appear onscreen in Kibana.

5) Stop RabbitMQ, add to tester.log, start RabbitMQ

vagrant@vbox:/var/log$ sudo docker stop rsm-rabbitmq
rsm-rabbitmq
vagrant@vbox:/var/log$ echo 103 >> tester.log
vagrant@vbox:/var/log$ echo 104 >> tester.log
vagrant@vbox:/var/log$ sudo docker start rsm-rabbitmq
rsm-rabbitmq

Looks good: Logstash reconnects to RabbitMQ and the log messages 103 and 104 appear in Kibana.

However, there are now 3 connections to RabbitMQ.

6) Stop RabbitMQ and start RabbitMQ

vagrant@vbox:/var/log$ sudo docker stop rsm-rabbitmq
rsm-rabbitmq
vagrant@vbox:/var/log$ sudo docker start rsm-rabbitmq
rsm-rabbitmq

4 connections to RabbitMQ.

7) Stop RabbitMQ and start RabbitMQ

vagrant@vbox:/var/log$ sudo docker stop rsm-rabbitmq
rsm-rabbitmq
vagrant@vbox:/var/log$ sudo docker start rsm-rabbitmq
rsm-rabbitmq

5 connections to RabbitMQ.

8) Stop RabbitMQ and start RabbitMQ

vagrant@vbox:/var/log$ sudo docker stop rsm-rabbitmq
rsm-rabbitmq
vagrant@vbox:/var/log$ sudo docker start rsm-rabbitmq
rsm-rabbitmq

6 connections to RabbitMQ.

Summary)

I'll leave it running as is for a while and see if all those connections remain in place.

No log entries from the file input went missing.

Logstash reconnects OK but seems to retain the previous connection in addition to the new one.

There is no 'infinite' stream of new connections on each failed reconnection, as was the case previously.

Maybe it's something specific to my configuration?

Here's the rabbitmq input and output configs I use:

input {
    rabbitmq {
        host => "192.168.33.10"
        user => "rsm-logstash"
        password => "***********"
        vhost => "rsm"
        port => 5672
        queue => "logstash-queue"
        durable => true
        exchange => "logstash-rabbitmq"
        key => "logstash"
        threads => 1
        prefetch_count => 50
        tags => [ "rabbitmq-input" ]
    }
}
output {
    if "rabbitmq-input" in [tags] {
        elasticsearch {
            host => ["rsm-elasticsearch"]
            codec => "plain"
            workers => 1
            index => "logstash-%{+YYYY.MM.dd}"
            manage_template => true
            template_name => "logstash"
            template_overwrite => false
            port => 9200
            flush_size => 100
            idle_flush_time => 1
            protocol => "http"
        }
    }
    else {
        rabbitmq {
            host => "192.168.33.10"
            user => "rsm-logstash"
            password => "*******"
            vhost => "rsm"
            exchange => "logstash-rabbitmq"
            exchange_type => "direct"
            key => "logstash"
            durable => true
            persistent => true
            workers => 1
        }
    }
}

Overall my "RELK" pipeline seems to work fine all around.

Messages are input and output OK and Kibana reports them.

The only oddness I'm seeing is when testing failure events, i.e. turning RabbitMQ off and on.

andrewvc commented 9 years ago

@iemem15 thanks for the research. That is definitely not acceptable behavior wrt connection leaking. I'll take a look at this today.

andrewvc commented 9 years ago

@iemem15 @rudijs I've pushed a fix up as a PR. You can test it out on your local Logstash by running (in the logstash root directory):

sed -i -e 's/gem "logstash-output-rabbitmq"/gem "logstash-output-rabbitmq", :git => "git@github.com:andrewvc\/logstash-output-rabbitmq.git", :branch => "fix_connection_duplication"/' Gemfile
bin/plugin install --no-verify
rudijs commented 9 years ago

@andrewvc

Got an install error, permissions from the git repo:

1)

/opt/logstash# grep rabbitmq Gemfile
gem "logstash-input-rabbitmq"
gem "logstash-output-rabbitmq", :git => "git@github.com:andrewvc/logstash-output-rabbitmq.git", :branch => "fix_connection_duplication"

2)

/opt/logstash# bin/plugin install --no-verify
Installing...
Error Bundler::Source::Git::GitCommandError, retrying 1/10
Git error: command `git clone 'git@github.com:andrewvc/logstash-output-rabbitmq.git' "/opt/logstash/vendor/bundle/jruby/1.9/cache/bundler/git/logstash-output-rabbitmq-90b0422022257ce5fb2a1ef438f4659ef3bd12c7" --bare --no-hardlinks --quiet` in directory /opt/logstash has failed.

3) So I tried the git clone command manually without the --quiet flag:

git clone git@github.com:andrewvc/logstash-output-rabbitmq.git /opt/logstash/vendor/bundle/jruby/1.9/cache/bundler/git/logstash-output-rabbitmq-90b0422022257ce5fb2a1ef438f4659ef3bd12c7 --bare --no-hardlinks

Cloning into bare repository '/opt/logstash/vendor/bundle/jruby/1.9/cache/bundler/git/logstash-output-rabbitmq-90b0422022257ce5fb2a1ef438f4659ef3bd12c7'...
Warning: Permanently added the RSA host key for IP address '192.30.252.128' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
andrewvc commented 9 years ago

@iemem15 that's my bad: that's my private URL with push/pull access. Try subbing out the git URL for 'https://github.com/andrewvc/logstash-output-rabbitmq.git'.

Hoping to issue a new release soon, once this gets through code review.

rudijs commented 9 years ago

@andrewvc Yep that works all good now.

Running the updates as described here, then stopping/starting RabbitMQ, I always see only 2 connections (input and output in my use case).

Looking forward to seeing this released as soon as the team is able.

Many thanks for your attention and follow up on this one.

Cheers!

andrewvc commented 9 years ago

Fixed and released as 1.1.1 . Thanks @rudijs and @iemem15 !

rudijs commented 9 years ago

@andrewvc Hey Andrew,

Just a note: I upgraded from Docker Logstash 1.5.2 to 1.5.4-1, but the rabbitmq plugins still need a manual update.

/opt/logstash/bin/plugin list --verbose | grep rabbit
logstash-input-rabbitmq (1.1.0)
logstash-output-rabbitmq (1.1.1)
/opt/logstash/bin/plugin update logstash-output-rabbitmq logstash-input-rabbitmq
Updating logstash-output-rabbitmq, logstash-input-rabbitmq
Updated logstash-input-rabbitmq 1.1.0 to 1.1.1
Updated logstash-output-rabbitmq 1.1.1 to 1.1.2

No big issue; it's just that my configuration-management scripts need to keep applying the plugin update.

Hopefully it'll all be bundled up to date with Logstash v1.5.5.

Cheers.

andrewvc commented 9 years ago

@rudijs I'll coordinate with @suyograo to make sure this makes it into our next release

rudijs commented 9 years ago

@andrewvc Hi Andrew,

I think this is still related to this issue: today, doing an install of Docker Logstash 1.5.2 and then upgrading the RabbitMQ plugins, I get this error:

sudo docker exec rsm-logstash /opt/logstash/bin/plugin update logstash-output-rabbitmq logstash-input-rabbitmq
You are updating logstash-output-rabbitmq to a new version 3.0.0, which may not be compatible with 1.0.0. are you sure you want to proceed (Y/N)?
ERROR: Updated Aborted, message: undefined method `strip' for nil:NilClass

I'm still using 1.5.2 until 1.5.5 gets released.

Anyway, this error borks the plugin upgrade completely (and halts my chef-client run - ouch).

If I go manually into the running logstash container, I can run the plugin update command and it completes OK:

root@37b084dafd7f:/# /opt/logstash/bin/plugin update logstash-output-rabbitmq logstash-input-rabbitmq
You are updating logstash-output-rabbitmq to a new version 3.0.0, which may not be compatible with 1.0.0. are you sure you want to proceed (Y/N)?
Y
You are updating logstash-input-rabbitmq to a new version 3.0.0, which may not be compatible with 1.0.0. are you sure you want to proceed (Y/N)?
Y
Updating logstash-output-rabbitmq, logstash-input-rabbitmq
Updated logstash-codec-json 1.0.0 to 1.1.0
Updated logstash-input-rabbitmq 1.0.0 to 1.1.1
Updated logstash-output-rabbitmq 1.0.0 to 1.1.2

The update usage options don't seem to indicate that I can auto-answer Y.

Do you know if it's possible to work around this issue?

root@37b084dafd7f:/# /opt/logstash/bin/plugin update --help                                             
Usage:
    bin/plugin update [OPTIONS] [PLUGIN] ...

Parameters:
    [PLUGIN] ...                  Plugin name(s) to upgrade to latest version

Options:
    -h, --help                    print help
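One possible workaround is to pipe the Y answers into the command's stdin, e.g. printf 'Y\nY\n' | bin/plugin update logstash-output-rabbitmq logstash-input-rabbitmq. Whether this works depends on bin/plugin reading its confirmations from stdin, which I haven't verified; the sketch below demonstrates the technique against a stand-in script (the script path and name are illustrative, not part of Logstash).

```shell
# Stand-in for an interactive updater that asks two Y/N questions.
cat > /tmp/fake_plugin_update.sh <<'EOF'
#!/bin/sh
read ans1
read ans2
if [ "$ans1" = "Y" ] && [ "$ans2" = "Y" ]; then
  echo "Updating logstash-output-rabbitmq, logstash-input-rabbitmq"
fi
EOF
chmod +x /tmp/fake_plugin_update.sh

# Feed one "Y" per prompt via stdin; no terminal interaction needed.
printf 'Y\nY\n' | /tmp/fake_plugin_update.sh
```

For an unknown number of prompts, `yes Y | command` works the same way, emitting "Y" until the command stops reading.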
scbunn commented 9 years ago

Still getting this error with Logstash 1.5.4-1 (official Docker image). Upgrading the plugin wants to use version 3.x. At this point this seems unusable to me and has halted my entire pipeline.