uken / fluent-plugin-elasticsearch

erroneous config prevents es-plugin from posting log to elasticsearch #607

Open yiwenshao opened 5 years ago

yiwenshao commented 5 years ago

Problem

I first used a Fluentd config file with a single match section that used the elasticsearch plugin to post logs, and it worked fine. I then decided to add a second match section with a different tag and a different Elasticsearch ip:port. The second Elasticsearch instance did not start properly, so connections to it were refused. While this was happening, the first match section stopped posting logs as well.

Fluentd just kept printing Could not communicate to Elasticsearch, resetting connection and trying again. connect_write timeout reached. The posting process seemed to be blocked by this connection refusal.

Steps to replicate

The original configuration, with a single match section, posts logs as expected:

@include conf.d/*
<system>
  workers 1
  log_level debug
</system>

<worker 0>
<source>
  @type tail
  path /logtmp/log.log
  pos_file /var/log/td-agent/pos
  tag testlogtwo.*
  format json
  keep_time_key true
  read_from_head true
  rotate_wait 10
  skip_refresh_on_startup true
  refresh_interval 10
</source>

  <match testlogtwo.**>
      @type elasticsearch
      hosts 172.2.25.1:9200
      index_name depindex1
      type_name _doc
      <buffer>
        flush_mode interval
        retry_type exponential_backoff
        total_limit_size 32MB
        chunk_limit_size 4MB
        chunk_full_threshold 0.8
        @type file
        path /var/log/td-agent/buffer/esbuf1
        overflow_action block
        flush_interval 8s
        flush_thread_burst_interval 0.02
      </buffer>
  </match>

</worker>

After adding the second match section, with a different tag and an Elasticsearch host that refuses connections, the configuration becomes:

@include conf.d/*
<system>
  workers 1
  log_level debug
</system>

<worker 0>
<source>
  @type tail
  path /logtmp/log.log
  pos_file /var/log/td-agent/pos
  tag testlogtwo.*
  format json
  keep_time_key true
  read_from_head true
  rotate_wait 10
  skip_refresh_on_startup true
  refresh_interval 10
</source>

  <match testlogtwo.**>
      @type elasticsearch
      hosts 172.2.25.1:9200
      index_name depindex1
      type_name _doc
      <buffer>
        flush_mode interval
        retry_type exponential_backoff
        total_limit_size 32MB
        chunk_limit_size 4MB
        chunk_full_threshold 0.8
        @type file
        path /var/log/td-agent/buffer/esbuf1
        overflow_action block
        flush_interval 8s
        flush_thread_burst_interval 0.02
      </buffer>
  </match>

  <match testlog.**>
      @type elasticsearch
      hosts 92.18.2.7:920
      index_name depindex2
      type_name _doc
      <buffer>
        flush_mode interval
        retry_type exponential_backoff
        total_limit_size 32MB
        chunk_limit_size 4MB
        chunk_full_threshold 0.8
        @type file
        path /var/log/td-agent/buffer/esbuf2
        overflow_action block
        flush_interval 8s
        flush_thread_burst_interval 0.02
      </buffer>
  </match>

</worker>

At this point, Fluentd keeps complaining Could not communicate to Elasticsearch, resetting connection and trying again. connect_write timeout reached. Both match sections stop posting logs to Elasticsearch.

Expected Behavior or What you need to ask

Logs matching the tag testlogtwo.** should still be posted while logs matching the tag testlog.** wait for the Elasticsearch connection retry. The Elasticsearch instance that is down should not interfere with the other one. (In this example, the second match section is never actually used, yet it still prevents the first match section from working properly.)

Using Fluentd and ES plugin versions

cosmo0920 commented 5 years ago

Logs matching the tag testlogtwo.** should still be posted while logs matching the tag testlog.** wait for the Elasticsearch connection retry. The Elasticsearch instance that is down should not interfere with the other one. (In this example, the second match section is never actually used, yet it still prevents the first match section from working properly.)

This is because both match sections are running on the same worker. Could you try separating them into different workers with the multi-process workers feature? https://docs.fluentd.org/deployment/multi-process-workers
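
For reference, a minimal sketch of that split, assuming the two tags come from two separate in_tail sources; the second source's path and pos_file are hypothetical (the issue only shows one source), and the buffer sections are omitted for brevity:

<system>
  workers 2
</system>

<worker 0>
  # this worker handles only the testlogtwo.** pipeline
  <source>
    @type tail
    path /logtmp/log.log
    pos_file /var/log/td-agent/pos
    tag testlogtwo.*
    format json
  </source>
  <match testlogtwo.**>
    @type elasticsearch
    hosts 172.2.25.1:9200
    index_name depindex1
    type_name _doc
  </match>
</worker>

<worker 1>
  # this worker handles only the testlog.** pipeline, so its connection
  # retries against the unreachable host cannot stall worker 0
  <source>
    @type tail
    # hypothetical second log file and pos_file
    path /logtmp/other.log
    pos_file /var/log/td-agent/pos2
    tag testlog.*
    format json
  </source>
  <match testlog.**>
    @type elasticsearch
    hosts 92.18.2.7:920
    index_name depindex2
    type_name _doc
  </match>
</worker>

Because each worker runs in its own process, retries in one worker's output do not block flushing in the other.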

yiwenshao commented 5 years ago

Thank you for your response. Separating them into different workers does solve the problem. However, as the number of inputs and outputs grows, it is not feasible to give every configuration its own Fluentd worker. This seems to be a Fluentd issue.

cosmo0920 commented 5 years ago

However, as the number of inputs and outputs grows, it is not feasible to give every configuration its own Fluentd worker. This seems to be a Fluentd issue.

We can also use the <worker N-M> syntax in this case: https://docs.fluentd.org/deployment/multi-process-workers#less-than-worker-n-m-greater-than-directive
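
A rough sketch of that directive, with an assumed worker count: the in_tail pipeline stays in a single-worker directive as before (in_tail does not run across multiple workers), while a group of sections whose plugins can run in multiple processes is spread over workers 1 and 2. The forward input here is a hypothetical stand-in for such a source:

<system>
  workers 3
</system>

# the existing <worker 0> block with the in_tail source and its match
# section stays exactly as shown earlier

<worker 1-2>
  # sections in a worker range run on every worker in that range, so the
  # plugins used here need to support multi-process workers
  <source>
    @type forward
    port 24224
  </source>
  <match testlog.**>
    @type elasticsearch
    hosts 92.18.2.7:920
    index_name depindex2
    type_name _doc
  </match>
</worker>

Grouping sections over a range keeps the total number of workers well below one per pipeline.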

yiwenshao commented 5 years ago

I mean that creating one worker for each configuration is expensive, and if we keep the number of workers small, the configurations within one worker may still interfere with each other.
I will use the multi-worker feature for now, and perform some checks before writing the config files to work around this.