logstash-plugins / logstash-filter-kv

Apache License 2.0

kv filter gets stuck on single message, 100% CPU #86

Closed regulatre closed 5 years ago

regulatre commented 5 years ago

I have a pipeline with a filebeat input (/var/log/messages on my lab syslog server), and the pipeline runs the kv filter against every message. Many of the messages contain K=V readings and metrics from various systems in my lab, so this was useful. But every now and then Logstash would jump to 100% CPU. top -H helped me pinpoint the pipeline, and within the pipeline I ruled things out one by one until I narrowed it down to kv. Later I noticed that while this was happening, if I ran systemctl restart logstash.service, logstash-plain.log would record a series of very detailed errors about the stuck kv thread [2].

The syslog message it appears to be chewing on (the line in the remote filebeat server's syslog file immediately following the last one received from the pipeline) is ugly cruft, which must contain some character or sequence that sends the kv filter into this locked state. I have captured the log message verbatim in a gist [3].
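The stack traces below show the worker pinned in kv.rb's `scan' call, which is consistent with catastrophic regex backtracking: an input that almost matches forces the engine to retry exponentially many ways of splitting it before failing. A toy Ruby sketch of this effect (not the plugin's actual code or pattern) that counts the work a naive backtracking matcher does for a pattern like (a+)+x:

```ruby
# Toy backtracking matcher for the pattern (a+)+x, counting the attempts a
# naive regex engine makes. Every way of dividing the run of "a"s between
# repetitions of the group is tried before the overall match finally fails.
def try_match(input, pos, steps)
  steps[:count] += 1
  return true if pos < input.length && input[pos] == "x"
  len = 0
  len += 1 while input[pos + len] == "a"
  # The group (a+) may consume 1..len characters here; backtrack over each choice.
  (1..len).any? { |take| try_match(input, pos + take, steps) }
end

[8, 12, 16].each do |n|
  steps = { count: 0 }
  try_match("a" * n + "!", 0, steps) # almost matches, then fails at "!"
  # Step counts double with each extra "a" (2**n), so a single long line
  # can keep one worker thread busy indefinitely.
  puts "n=#{n}: #{steps[:count]} backtracking steps"
end
```

On a pathological line like the one in the gist, this kind of blow-up inside the filter's regex is exactly what a stuck `scan' frame at 100% CPU looks like from the outside.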

OS: CentOS 7.5.1804 VM, 4 CPUs, 16 GB memory. The problem occurred in both of the following versions of Logstash: logstash-6.7.1-1.noarch and logstash-6.7.2-1.noarch.

[2] Errors from logstash-plain.log showing the kv thread stuck. These correlate with the 100% CPU consumption; I don't otherwise see these errors in the log.

[2019-07-25T03:27:12,434][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>223, "name"=>"[syslogs]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-beats-5.1.8-java/lib/logstash/inputs/beats.rb:212:in `run'"}], ["LogStash::Filters::KV", {"prefix"=>"kv_", "source"=>"message", "id"=>"be9b7b970d5afb75eb7c87a8a87e0b13280bc676e4259cf68c0a86cd6e7c940f"}]=>[{"thread_id"=>222, "name"=>"[syslogs]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-filter-kv-4.3.0/lib/logstash/filters/kv.rb:555:in `scan'"}]}}
[2019-07-25T03:27:17,459][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>223, "name"=>"[syslogs]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-beats-5.1.8-java/lib/logstash/inputs/beats.rb:212:in `run'"}], ["LogStash::Filters::KV", {"prefix"=>"kv_", "source"=>"message", "id"=>"be9b7b970d5afb75eb7c87a8a87e0b13280bc676e4259cf68c0a86cd6e7c940f"}]=>[{"thread_id"=>222, "name"=>"[syslogs]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-filter-kv-4.3.0/lib/logstash/filters/kv.rb:555:in `scan'"}]}}
[2019-07-25T03:27:22,485][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>223, "name"=>"[syslogs]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-beats-5.1.8-java/lib/logstash/inputs/beats.rb:212:in `run'"}], ["LogStash::Filters::KV", {"prefix"=>"kv_", "source"=>"message", "id"=>"be9b7b970d5afb75eb7c87a8a87e0b13280bc676e4259cf68c0a86cd6e7c940f"}]=>[{"thread_id"=>222, "name"=>"[syslogs]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-filter-kv-4.3.0/lib/logstash/filters/kv.rb:555:in `scan'"}]}}
colinsurprenant commented 5 years ago

Thanks for the report @regulatre - you are probably affected by https://github.com/elastic/logstash/pull/10978, which prevented the timeout options for both the grok and kv filters from working at all, so a runaway regular expression could hang filter execution indefinitely. Short of being able to upgrade, you will find workaround instructions in the PR.

Also, I would strongly suggest updating both the grok and kv plugins to their latest versions, which fix other timeout-handling issues.
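Once on a fixed Logstash core, the filter's match timeout can also be set explicitly. A minimal pipeline sketch mirroring the options visible in the stack trace above, assuming the documented timeout_millis option of logstash-filter-kv 4.3.0+ (the chosen value here is illustrative):

```
filter {
  kv {
    prefix => "kv_"
    source => "message"
    # Abort a single match attempt after 5 s instead of spinning forever.
    # Note: on the Logstash versions affected by elastic/logstash#10978
    # this option is silently ignored, which is the bug described above.
    timeout_millis => 5000
  }
}
```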

I will go ahead and close this issue, please feel free to reopen if needed.