uken / fluent-plugin-elasticsearch

Apache License 2.0

deadlock occurred when we restart elasticsearch 7 #587

Closed wyukawa closed 5 years ago

wyukawa commented 5 years ago

Problem

We send logs to our Elasticsearch cluster with fluent-plugin-elasticsearch, but a deadlock error occurred when we restarted Elasticsearch.

I'm not sure whether this problem is caused by fluent-plugin-elasticsearch or by elasticsearch-ruby.

This error also occurred in fluent-plugin-elasticsearch 2.11.10.

At that time I thought it was related to elasticsearch-ruby, so I filed https://github.com/elastic/elasticsearch-ruby/issues/656

At least, the problem does not occur when we use fluent-plugin-elasticsearch 2.11.10 with Elasticsearch 6.7.1.

2019-05-13 14:32:19 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-05-13 14:32:20 +0900 chunk="588be3d16914130eecf307cf6625901d" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:758:in `rescue in send_bulk'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:733:in `send_bulk'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:628:in `block in write'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `each'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `write'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-05-13 14:32:45 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-05-13 14:32:46 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:45 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:46 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-05-13 14:32:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:46 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=3 next_retry_seconds=2019-05-13 14:32:52 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:52 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=4 next_retry_seconds=2019-05-13 14:33:00 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:52 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:00 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=5 next_retry_seconds=2019-05-13 14:33:17 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:00 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:17 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=6 next_retry_seconds=2019-05-13 14:33:51 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:17 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:51 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=7 next_retry_seconds=2019-05-13 14:34:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:51 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:34:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=8 next_retry_seconds=2019-05-13 14:36:54 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:34:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:36:54 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=9 next_retry_seconds=2019-05-13 14:41:29 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:36:54 +0900 [warn]: #0 suppressed same stacktrace
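For context, `deadlock; recursive locking` is the message Ruby's non-reentrant `Mutex#lock` raises (as a `ThreadError`) when the same thread tries to acquire a mutex it already holds. This standalone sketch reproduces the message only; it is not the plugin's actual code path:

```ruby
# Minimal reproduction of Ruby's non-reentrant Mutex behaviour.
# Mutex#lock raises ThreadError("deadlock; recursive locking")
# when the owning thread locks it a second time.
m = Mutex.new

message = nil
m.synchronize do
  begin
    m.lock # same thread locking again -> ThreadError
  rescue ThreadError => e
    message = e.message
  end
end

puts message # => "deadlock; recursive locking"
```

A reentrant alternative in the Ruby standard library is `Monitor`, whose `synchronize` can be nested by the owning thread without raising.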

Steps to replicate

The data flow is: kafka -> kafka-fluentd-consumer -> fluentd v1 -> fluent-plugin-elasticsearch -> elasticsearch

fluentd conf

  <match {...}.**>
    @type elasticsearch
    @id ...
    hosts ...
    logstash_format true
    logstash_prefix "${tag}"
    utc_index false
    include_tag_key true
    tag_key "tag_key"
    request_timeout 30s
    resurrect_after 0
    reload_on_failure true
    reload_connections false
    http_backend typhoeus
    type_name "_doc"
    bulk_message_request_threshold -1
    <buffer tag>
      @type "file"
      flush_at_shutdown true
      chunk_limit_size 8m
      queue_limit_length 512
      flush_interval 1s
      flush_thread_count 10
      overflow_action drop_oldest_chunk
    </buffer>
  </match>

Expected Behavior or What you need to ask

No deadlock

When the deadlock occurs, fluent-plugin-elasticsearch keeps retrying and failing, so the buffer queue length keeps increasing.
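The growing `next_retry_seconds` gaps in the log above (roughly 1s, 2s, 4s, 8s, ...) match Fluentd's default exponential retry backoff (`retry_wait 1s`, backoff base 2). A rough sketch of that schedule, using a hypothetical `next_wait` helper (real Fluentd also randomizes the wait and caps it with `retry_max_interval`):

```ruby
# Hypothetical helper sketching Fluentd's default exponential backoff:
# the wait before retry N is retry_wait * backoff_base**N.
def next_wait(retry_count, base_wait: 1.0, backoff_base: 2.0)
  base_wait * (backoff_base ** retry_count)
end

# Waits double each failed flush: 1, 2, 4, 8, ... seconds.
p (0..9).map { |n| next_wait(n) }
```

This is why a persistent failure like the deadlock above quickly turns into long retry gaps while chunks pile up in the buffer.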

Using Fluentd and ES plugin versions

gem "fluentd", "1.4.2"
gem "oj", "3.7.12"

gem "fluent-plugin-ping-message", "1.0.0"
gem "fluent-plugin-record-reformer", "0.9.1"
gem "fluent-plugin-suppress", "1.1.0"
gem 'fluent-plugin-elasticsearch', :git => 'https://github.com/uken/fluent-plugin-elasticsearch.git', :branch => " configurable-split_request_p"
gem "typhoeus", "1.3.1"
gem "fluent-plugin-prometheus", "1.3.0"


* ES version (optional): 7.0.1
* ES template(s) (optional):
wyukawa commented 5 years ago

This problem doesn't happen with fluent-plugin-elasticsearch 3.5.2, elasticsearch-ruby 7.1.0, and Elasticsearch 7.1.1.

Z0T0 commented 5 years ago

Please increase 'chunk_limit_size'

cosmo0920 commented 5 years ago

@Z0T0 You shouldn't post your issue report in an unrelated issue. Please open a new issue instead.