uken / fluent-plugin-elasticsearch

Apache License 2.0

deadlock occurred when we restart elasticsearch 7 #587

Closed wyukawa closed 5 years ago

wyukawa commented 5 years ago

Problem

We send logs to our Elasticsearch cluster with fluent-plugin-elasticsearch, but a deadlock error occurred when we restarted Elasticsearch.

I'm not sure whether this problem is caused by fluent-plugin-elasticsearch or by elasticsearch-ruby.

This error also occurred in fluent-plugin-elasticsearch 2.11.10.

At that time I thought it was related to elasticsearch-ruby, so I filed https://github.com/elastic/elasticsearch-ruby/issues/656

At least, the problem does not occur when we use fluent-plugin-elasticsearch 2.11.10 with Elasticsearch 6.7.1.

2019-05-13 14:32:19 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-05-13 14:32:20 +0900 chunk="588be3d16914130eecf307cf6625901d" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:758:in `rescue in send_bulk'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:733:in `send_bulk'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:628:in `block in write'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `each'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `write'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-05-13 14:32:45 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-05-13 14:32:46 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:45 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:46 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-05-13 14:32:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:46 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=3 next_retry_seconds=2019-05-13 14:32:52 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:52 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=4 next_retry_seconds=2019-05-13 14:33:00 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:52 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:00 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=5 next_retry_seconds=2019-05-13 14:33:17 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:00 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:17 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=6 next_retry_seconds=2019-05-13 14:33:51 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:17 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:51 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=7 next_retry_seconds=2019-05-13 14:34:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:51 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:34:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=8 next_retry_seconds=2019-05-13 14:36:54 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:34:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:36:54 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=9 next_retry_seconds=2019-05-13 14:41:29 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:36:54 +0900 [warn]: #0 suppressed same stacktrace
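For context, `deadlock; recursive locking` is the message Ruby's non-reentrant `Mutex#lock` raises (as a `ThreadError`) when the same thread tries to acquire a mutex it already holds. This standalone sketch reproduces the message only; it is not the plugin's actual code path:

```ruby
# Minimal reproduction of Ruby's non-reentrant Mutex behaviour.
# Mutex#lock raises ThreadError("deadlock; recursive locking")
# when the owning thread locks it a second time.
m = Mutex.new

message = nil
m.synchronize do
  begin
    m.lock # same thread locking again -> ThreadError
  rescue ThreadError => e
    message = e.message
  end
end

puts message # => "deadlock; recursive locking"
```

A reentrant alternative in the Ruby standard library is `Monitor`, whose `synchronize` can be nested by the owning thread without raising.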

Steps to replicate

The data flow is: kafka -> kafka-fluentd-consumer -> fluentd v1 -> fluent-plugin-elasticsearch -> elasticsearch

fluentd conf

  <match {...}.**>
    @type elasticsearch
    @id ...
    hosts ...
    logstash_format true
    logstash_prefix "${tag}"
    utc_index false
    include_tag_key true
    tag_key "tag_key"
    request_timeout 30s
    resurrect_after 0
    reload_on_failure true
    reload_connections false
    http_backend typhoeus
    type_name "_doc"
    bulk_message_request_threshold -1
    <buffer tag>
      @type "file"
      flush_at_shutdown true
      chunk_limit_size 8m
      queue_limit_length 512
      flush_interval 1s
      flush_thread_count 10
      overflow_action drop_oldest_chunk
    </buffer>
  </match>

Expected Behavior or What you need to ask

No deadlock

When the deadlock occurs, fluent-plugin-elasticsearch keeps retrying and failing, so the buffer queue length keeps increasing.
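The growing `next_retry_seconds` gaps in the log above (roughly 1s, 2s, 4s, 8s, ...) match Fluentd's default exponential retry backoff (`retry_wait 1s`, backoff base 2). A rough sketch of that schedule, using a hypothetical `next_wait` helper (real Fluentd also randomizes the wait and caps it with `retry_max_interval`):

```ruby
# Hypothetical helper sketching Fluentd's default exponential backoff:
# the wait before retry N is retry_wait * backoff_base**N.
def next_wait(retry_count, base_wait: 1.0, backoff_base: 2.0)
  base_wait * (backoff_base ** retry_count)
end

# Waits double each failed flush: 1, 2, 4, 8, ... seconds.
p (0..9).map { |n| next_wait(n) }
```

This is why a persistent failure like the deadlock above quickly turns into long retry gaps while chunks pile up in the buffer.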

Using Fluentd and ES plugin versions

gem "fluentd", "1.4.2"
gem "oj", "3.7.12"

gem "fluent-plugin-ping-message", "1.0.0"
gem "fluent-plugin-record-reformer", "0.9.1"
gem "fluent-plugin-suppress", "1.1.0"
gem 'fluent-plugin-elasticsearch', :git => 'https://github.com/uken/fluent-plugin-elasticsearch.git', :branch => " configurable-split_request_p"
gem "typhoeus", "1.3.1"
gem "fluent-plugin-prometheus", "1.3.0"


* ES version (optional): 7.0.1
* ES template(s) (optional):
wyukawa commented 5 years ago

This problem doesn't happen with fluent-plugin-elasticsearch 3.5.2, elasticsearch-ruby 7.1.0, and Elasticsearch 7.1.1.

Z0T0 commented 5 years ago

Please increase 'chunk_limit_size'

cosmo0920 commented 5 years ago

@Z0T0 You shouldn't post your issue report in an unrelated issue. Please open a new issue instead.