Could you provide your Fluentd docker log?
<match *.**>
The above setting is very dangerous. This blackhole pattern causes a flood of declined logs: https://github.com/uken/fluent-plugin-elasticsearch#declined-logs-are-resubmitted-forever-why
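One common mitigation (a minimal sketch, not taken from the reporter's config) is to catch Fluentd's own fluent.* events in a separate match block placed before the catch-all, so warnings emitted by a failing Elasticsearch output are never fed back into that same output:

# Handle Fluentd's internal fluent.* events here so they never reach the
# Elasticsearch output below; @type null would silently discard them instead.
<match fluent.**>
  @type stdout
</match>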
Hi @cosmo0920, the Fluentd logs look like the following:
fluentd_1 | 2019-01-09 03:15:52 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
fluentd_1 | 2019-01-09 03:15:52 +0000 [info]: 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
fluentd_1 | 2019-01-09 03:15:52 +0000 [info]: Detected ES 6.x: ES 7.x will only accept `_doc` in type_name.
fluentd_1 | 2019-01-09 03:15:52 +0000 [warn]: To prevent events traffic jam, you should specify 2 or more 'flush_thread_count'.
fluentd_1 | 2019-01-09 03:15:52 +0000 [info]: using configuration file: <ROOT>
fluentd_1 | <source>
fluentd_1 | @type forward
fluentd_1 | port 24224
fluentd_1 | bind "0.0.0.0"
fluentd_1 | </source>
fluentd_1 | <match *.**>
fluentd_1 | @type copy
fluentd_1 | <store>
fluentd_1 | @type "elasticsearch"
fluentd_1 | host my-es-host
fluentd_1 | port 9200
fluentd_1 | logstash_format true
fluentd_1 | logstash_prefix "fluentd"
fluentd_1 | logstash_dateformat "%Y%m%d"
fluentd_1 | include_tag_key true
fluentd_1 | type_name "access_log"
fluentd_1 | tag_key "@log_name"
fluentd_1 | flush_interval 1s
fluentd_1 | <buffer>
fluentd_1 | flush_interval 1s
fluentd_1 | </buffer>
fluentd_1 | </store>
fluentd_1 | <store>
fluentd_1 | @type "stdout"
fluentd_1 | </store>
fluentd_1 | </match>
fluentd_1 | </ROOT>
fluentd_1 | 2019-01-09 03:15:52 +0000 [info]: starting fluentd-1.3.2 pid=5 ruby="2.5.2"
fluentd_1 | 2019-01-09 03:15:52 +0000 [info]: spawn command to main: cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"]
fluentd_1 | 2019-01-09 03:15:53 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '3.0.1'
fluentd_1 | 2019-01-09 03:15:53 +0000 [info]: gem 'fluentd' version '1.3.2'
fluentd_1 | 2019-01-09 03:15:53 +0000 [info]: adding match pattern="*.**" type="copy"
fluentd_1 | 2019-01-09 03:15:53 +0000 [info]: #0 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
fluentd_1 | 2019-01-09 03:15:53 +0000 [info]: #0 Detected ES 6.x: ES 7.x will only accept `_doc` in type_name.
fluentd_1 | 2019-01-09 03:15:53 +0000 [warn]: #0 To prevent events traffic jam, you should specify 2 or more 'flush_thread_count'.
fluentd_1 | 2019-01-09 03:15:53 +0000 [info]: adding source type="forward"
fluentd_1 | 2019-01-09 03:15:53 +0000 [info]: #0 starting fluentd worker pid=13 ppid=5 worker=0
fluentd_1 | 2019-01-09 03:15:53 +0000 [info]: #0 listening port port=24224 bind="0.0.0.0"
fluentd_1 | 2019-01-09 03:15:53 +0000 [info]: #0 fluentd worker is now running worker=0
fluentd_1 | 2019-01-09 03:15:53.601732394 +0000 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"}
....
Umm..., could you share the Fluentd error log from 2019-01-10 2:00 to 2019-01-10 11:00?
The shared log is just the boot log; it only says that Fluentd launched normally.
@cosmo0920 I found something like this:
fluentd_1 | 2019-01-10 02:16:45 +0000 [warn]: #0 failed to flush the buffer. retry_time=15 next_retry_seconds=2019-01-10 07:21:51 +0000 chunk="57f0d689aeefe7b1ef1da592fed4d444" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"my-es-host\", :port=>9200, :scheme=>\"http\"}): Connection refused - connect(2) for 172.18.0.2:9200 (Errno::ECONNREFUSED)"
fluentd_1 | 2019-01-10 02:16:45 +0000 [warn]: #0 suppressed same stacktrace
fluentd_1 | 2019-01-10 02:16:45.424613201 +0000 fluent.warn: {"retry_time":15,"next_retry_seconds":"2019-01-10 07:21:51 +0000","chunk":"57f0d689aeefe7b1ef1da592fed4d444","error":"#<Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure: could not push logs to Elasticsearch cluster ({:host=>\"my-es-host\", :port=>9200, :scheme=>\"http\"}): Connection refused - connect(2) for 172.18.0.2:9200 (Errno::ECONNREFUSED)>","message":"failed to flush the buffer. retry_time=15 next_retry_seconds=2019-01-10 07:21:51 +0000 chunk=\"57f0d689aeefe7b1ef1da592fed4d444\" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error=\"could not push logs to Elasticsearch cluster ({:host=>\\\"my-es-host\\\", :port=>9200, :scheme=>\\\"http\\\"}): Connection refused - connect(2) for 172.18.0.2:9200 (Errno::ECONNREFUSED)\""}
It seems that the ES plugin cannot push events due to ECONNREFUSED.
This error comes from the network stack.
Could you check your Docker networking settings or the ES-side logs?
@cosmo0920 My ES is set up on AWS EC2, and the networking should be fine, with no disconnects or DNS issues. I also found some extra logs just above the previous ones.
fluentd_1 | 2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:645:in `rescue in send_bulk'
fluentd_1 | 2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:627:in `send_bulk'
fluentd_1 | 2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:534:in `block in write'
fluentd_1 | 2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:533:in `each'
fluentd_1 | 2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:533:in `write'
fluentd_1 | 2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.3.2/lib/fluent/plugin/output.rb:1123:in `try_flush'
fluentd_1 | 2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.3.2/lib/fluent/plugin/output.rb:1423:in `flush_thread_run'
fluentd_1 | 2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.3.2/lib/fluent/plugin/output.rb:452:in `block (2 levels) in start'
fluentd_1 | 2019-01-09 21:47:30 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.3.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
@cosmo0920 Here are more logs from ES:
elasticsearch_1 | [2019-01-10T04:41:01,689][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:01,689][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:01,795][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:01,795][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:01,823][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:01,823][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:01,833][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:01,833][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:01,835][INFO ][o.e.c.r.a.AllocationService] [-utwWeF] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[fluentd-20190108][2]] ...]).
elasticsearch_1 | [2019-01-10T04:41:01,843][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:01,847][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:08,712][INFO ][o.e.c.m.MetaDataMappingService] [-utwWeF] [fluentd-20190110/j4oWJJa8Rla-l48sMgHLog] update_mapping [access_log]
elasticsearch_1 | [2019-01-10T04:41:08,724][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T04:41:08,724][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T06:18:09,832][INFO ][o.e.c.m.MetaDataMappingService] [-utwWeF] [fluentd-20190110/j4oWJJa8Rla-l48sMgHLog] update_mapping [access_log]
elasticsearch_1 | [2019-01-10T06:18:09,843][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T06:18:09,843][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T06:18:09,859][INFO ][o.e.c.m.MetaDataMappingService] [-utwWeF] [fluentd-20190110/j4oWJJa8Rla-l48sMgHLog] update_mapping [access_log]
elasticsearch_1 | [2019-01-10T06:18:09,867][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[fluentd-20190109/JvyIBQfkQZGjNEXy0you4A]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
elasticsearch_1 | [2019-01-10T06:18:09,868][WARN ][o.e.g.DanglingIndicesState] [-utwWeF] [[.kibana_1/1rFuKeKfRDel1FPUWShc4w]] can not be imported as a dangling index, as index with same name already exists in cluster metadata
Actually, I have two nodes/hosts with the same configuration that collect logs from my application server. Do you think that could be a concern for this issue?
If so, is there any way in the Fluentd configuration to distinguish which node the logs were collected from, e.g. the hostname or host IP as metadata?
Do you think that could be a concern for this issue?
You should check your Docker networking; a bare-metal environment might not hit this networking issue. Here is another case caused by Docker networking: https://github.com/uken/fluent-plugin-elasticsearch/issues/416
That issue also only occurred within Docker, not in a bare-metal environment.
If so, is there any way in the Fluentd configuration to distinguish which node the logs were collected from, e.g. the hostname or host IP as metadata?
in_forward has an option that adds the hostname:
https://docs.fluentd.org/v1.0/articles/in_forward#source_hostname_key
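A minimal sketch of that option applied to the forward source from this thread (the record key name source_host is just illustrative):

<source>
  @type forward
  port 24224
  bind 0.0.0.0
  # Adds the sending host's name to each record under the given key.
  source_hostname_key source_host
</source>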
@cosmo0920 Thank you for your advice. But I have to run Fluentd in Docker, and it looks like the issue is still there. The services in my Docker setup are always running well, so it's probably not a Docker networking issue.
I met a similar issue, but I have Fluentd deployed as a DaemonSet under the kube-system namespace.
And I can confirm ES is running well all the time, since Fluentd is only one of my logging sources, and the other sources work well and show their logs correctly in ES.
@emmayang Same issue on my kube platform.
Hmmm..., could you try the typhoeus backend instead of excon? typhoeus can handle keep-alive by default.
https://github.com/uken/fluent-plugin-elasticsearch#http_backend
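Switching the backend would look roughly like this (a sketch with only the relevant options shown; it assumes the typhoeus gem is installed in the Fluentd image):

<match *.**>
  @type elasticsearch
  host my-es-host
  port 9200
  # Use the typhoeus HTTP client instead of the default excon backend;
  # typhoeus keeps connections alive by default.
  http_backend typhoeus
</match>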
I'm also seeing this same issue when running fluentd with ES plugin in Kubernetes. I tried both backends and typhoeus didn't work at all, while the default backend would work on initial connection (fresh deploy) and then stop sending data almost immediately.
EDIT: I believe my issues were not caused by the ES plugin but by performance tuning that I needed to do on Fluentd.
I have similar problems. I also get a huge number of warnings like the one below:
"failed to flush the buffer. retry_time=0 next_retry_seconds=2019-03-19 01:30:36 +0000 chunk="584686c3d47849db61228ea7e6f29bb5" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"es-cn-v0h10rbfl000kfon8..com\", :port=>9200, :scheme=>\"http\", :user=>\"elastic\", :password=>\"obfuscated\"}): connect_write timeout reached""
When this error happens, the only fix is to restart the Fluentd container, but then a log gap appears.
Same problem here. I'm using fluentd-kubernetes-daemonset and have already opened an issue: https://github.com/fluent/fluentd-kubernetes-daemonset/issues/280
After deployment the plugin works fine and ships all logs to ES, but after a few hours it stops with the following error:
2019-03-19 08:24:32 +0000 : #0 [out_es] failed to flush the buffer. retry_time=2810 next_retry_seconds=2019-03-19 08:25:05 +0000 chunk="5846b2b0d6d06c398eee3540256d465d" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elastic.xyz.com\", :port=>443, :scheme=>\"https\", :user=>\"elastic\", :password=>\"obfuscated\", :path=>\"\"}): connect_write timeout reached"
The only solution is to restart the pod, but that isn't acceptable.
Can setting reload_connections to false help with this issue?
I launched a docker-compose environment with the settings from https://github.com/fluent/fluentd/issues/2334#issue-422196534 but I couldn't reproduce the issue in my local environment.
To reproduce this issue, do we need to handle a massive number of events?
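For reference, a minimal sketch of that setting in the output block from this thread (all other options omitted):

<match *.**>
  @type elasticsearch
  host my-es-host
  port 9200
  # Keep using the configured host instead of periodically replacing it with
  # node addresses discovered from the cluster, which may be unreachable from Docker.
  reload_connections false
</match>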
Can setting reload_connections to false help with this issue? I launched a docker-compose environment with the fluent/fluentd#2334 (comment) settings but I couldn't reproduce the issue in my local environment. To reproduce this issue, do we need to handle a massive number of events?
@cosmo0920, I'm afraid so... In my case, once the hits reach 100,000+, the issue happens.
In Fluentd, here's the error info:
2019-03-20 02:07:53 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2019-03-20 02:07:54 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f880ef7f118"
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/base.rb:249:in `perform_request'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/client.rb:128:in `perform_request'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/elasticsearch-api-1.0.18/lib/elasticsearch/api/actions/bulk.rb:90:in `bulk'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch.rb:353:in `send_bulk'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch.rb:339:in `write_objects'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/output.rb:490:in `write'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/buffer.rb:354:in `write_chunk'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/buffer.rb:333:in `pop'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/output.rb:342:in `try_flush'
2019-03-20 02:07:53 +0000 [warn]: /var/lib/gems/2.3.0/gems/fluentd-0.12.43/lib/fluent/output.rb:149:in `run'
I'll try 'reconnect_on_error true' and give feedback.
Can setting reload_connections to false help with this issue? I launched a docker-compose environment with the fluent/fluentd#2334 (comment) settings but I couldn't reproduce the issue in my local environment. To reproduce this issue, do we need to handle a massive number of events?
Maybe this is the solution for me. After setting reload_connections to false, it has now been working for about 18h without trouble. I will monitor it for the next few hours/days.
@bidiudiu @ChSch3000 Thank you for your issue confirmations and clarifications!
fluentd-kubernetes-daemonset provides the following environment variable:
FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS (default: true)
This should be specified as:
FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS=false
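Inside the daemonset image that environment variable is interpolated into the plugin configuration, roughly like the following sketch (illustrative only; the surrounding options approximate the image's template):

<match **>
  @type elasticsearch
  host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
  port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
  # Defaults to true when the variable is unset; set the variable to "false" in the pod spec.
  reload_connections "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS'] || 'true'}"
</match>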
I've added an FAQ entry for this situation: https://github.com/uken/fluent-plugin-elasticsearch/pull/564
Is any information still missing to solve this issue?
Thanks @cosmo0920. I added the settings below and it works fine:
reconnect_on_error true
reload_on_failure true
reload_connections false
reconnect_on_error true reload_on_failure true reload_connections false
OK. Thanks for confirming, @bidiudiu! I'll add more description of this issue to the FAQ.
Can we change the default value of those settings for fluentd-kubernetes-daemonset? I think everyone who uses fluentd-kubernetes-daemonset will easily run into this issue.
@dogzzdogzz If you are using Helm to install, e.g. `helm upgrade --install logging-fluentd -f your-values.yml kiwigrid/fluentd-elasticsearch --namespace your-namespace`, you can just modify the Fluentd config in your-values.yml.
Part of my snippet looks like this:
output.conf: |
# Enriches records with Kubernetes metadata
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<match **>
@id elasticsearch
@type elasticsearch
@log_level info
include_tag_key true
type_name _doc
host "#{ENV['OUTPUT_HOST']}"
port "#{ENV['OUTPUT_PORT']}"
scheme "#{ENV['OUTPUT_SCHEME']}"
ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
logstash_format true
logstash_prefix "#{ENV['LOGSTASH_PREFIX']}"
reload_connections false
reconnect_on_error true
reload_on_failure true
slow_flush_log_threshold 25.0
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes.system.buffer
flush_mode interval
flush_interval 5s
flush_thread_count 4
chunk_full_threshold 0.9
# retry_forever
retry_type exponential_backoff
retry_timeout 1m
retry_max_interval 30
chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
overflow_action drop_oldest_chunk
</buffer>
</match>
@dogzzdogzz The latest fluentd-kubernetes-daemonset includes the above settings by default.
Tried using the exact same config as https://github.com/uken/fluent-plugin-elasticsearch/issues/525#issuecomment-490724317 but the issue still persists. Fluentd stops shipping logs to Elasticsearch after some time.
@cosmo0920 The same issue persists; Fluentd becomes unable to send logs after a while. From my observation, Fluentd runs absolutely fine as long as there is no restart; when the pod gets restarted, the problem occurs.
2020-08-05 09:58:12 +0000 [warn]: [sample-service] failed to flush the buffer. retry_time=2 next_retry_seconds=2020-08-05 09:58:14 +0000 chunk="5ac1e67bde2f323981d71058390e5ebe" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"192.168.0.15\", :port=>9500, :scheme=>\"http\", :user=>\"fluentd\", :password=>\"obfuscated\"}, {:host=>\"192.168.0.16\", :port=>9500, :scheme=>\"http\", :user=>\"fluentd\", :password=>\"obfuscated\"}): read timeout reached"
**Resolution:**
The only solution I found is to forcefully restart the Fluentd pod; the new container then sends logs immediately.
You should add the simple sniffer loading code and specify the loaded simple sniffer class: https://github.com/uken/fluent-plugin-elasticsearch#sniffer-class-name The default sniffer class causes this issue.
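A minimal sketch of that setup (the path to the sniffer file is illustrative and depends on where the gem is installed):

# Start Fluentd with the simple sniffer file preloaded, e.g.:
#   fluentd -r .../fluent-plugin-elasticsearch/lib/fluent/plugin/elasticsearch_simple_sniffer.rb -c fluent.conf
<match *.**>
  @type elasticsearch
  host my-es-host
  port 9200
  # Use the simple sniffer so the client keeps the configured endpoint
  # instead of re-resolving node addresses from the cluster state.
  sniffer_class_name Fluent::Plugin::ElasticsearchSimpleSniffer
  reload_connections false
</match>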
You should add the simple sniffer loading code and specify the loaded simple sniffer class: https://github.com/uken/fluent-plugin-elasticsearch#sniffer-class-name The default sniffer class causes this issue.
Did this work to solve the "failed to flush the buffer" error? If so, could you post the configuration? I have tried running Fluentd with the sniffer class, but I still get the same error.
Thanks,
Yes, me too. I've loaded the sniffer class and it's still giving me that error. I'm using version 4.0.5, and I get the error as soon as the Fluentd pods restart; there's no grace period where it succeeds at sending logs. Initially it was working though, and the scheme is set to https; I double-checked and it was actually sending successfully on restart.
Same issue here. Did anyone find a concrete solution? I tried these, but no luck:
reconnect_on_error true
reload_on_failure true
reload_connections false
Also the sniffer_class solution doesn't work for me at all and throws an error.
So I found the solution 4 days ago and I've been testing it ever since. After the change I made, my Fluentd hasn't stopped or crashed while sending logs to Elasticsearch.
My solution was to change the buffer path in the way I saw in the Fluentd documentation:
path /opt/bitnami/fluentd/logs/buffers/logs.*.buffer
instead of
path /opt/bitnami/fluentd/logs/buffers/logs.buffer
This worked for me.
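For context, the change is only in the file buffer path; a sketch of the surrounding buffer section (assuming a file buffer, other options omitted):

<buffer>
  @type file
  # The '*' placeholder marks where the file buffer inserts per-chunk identifiers,
  # making the chunk file layout explicit instead of relying on a single fixed path.
  path /opt/bitnami/fluentd/logs/buffers/logs.*.buffer
</buffer>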
So I found the solution 4 days ago and I've been testing it ever since. After the change I made, my Fluentd hasn't stopped or crashed while sending logs to Elasticsearch.
My solution was to change the buffer path in the way I saw in the Fluentd documentation:
path /opt/bitnami/fluentd/logs/buffers/logs.*.buffer
instead of
path /opt/bitnami/fluentd/logs/buffers/logs.buffer
This worked for me.
@mokhos, could you please let us know the versions of fluentd / fluent-plugin-elasticsearch you were using to test this configuration?
So I found the solution 4 days ago and I've been testing it ever since. After the change I made, my Fluentd hasn't stopped or crashed while sending logs to Elasticsearch. My solution was to change the buffer path in the way I saw in the Fluentd documentation:
path /opt/bitnami/fluentd/logs/buffers/logs.*.buffer
instead of
path /opt/bitnami/fluentd/logs/buffers/logs.buffer
This worked for me.

@mokhos, could you please let us know the versions of fluentd / fluent-plugin-elasticsearch you were using to test this configuration?
I used the versions below:
2022-03-30 11:56:59 +0000 [info]: gem 'fluentd' version '1.14.5'
2022-03-30 11:56:59 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.1.5'
Hi @cosmo0920, I am also facing the same issue. It would be helpful if you could share your solution with me.
If I restart my td-agent service, logs arrive in Elasticsearch for a while, but after 3-6 minutes they stop automatically, and no error shows up in the td-agent logs.
Here is my configuration:
<match "mytopicname">
@type elasticsearch
hosts "my_IP_address_here"
ca_file "my_path_here"
client_cert "my_path_here"
client_key " my_path_here"
ssl_verify true
user "my_username"
password "my_password"
logstash_format true
logstash_prefix "my_index_name"
logstash_date_format my_date_format
time_key_format "my time format"
type_name fluentd
log_es_400_reason true
include_timestamp true
reconnect_on_error true
reload_on_failure true
reload_connections false
<buffer>
@type file
path "my path here"
chunk_limit_size 10m
</buffer>
</match>
I also tried:
<match "mytopicname">
@type elasticsearch
hosts "my_IP_address_here"
ca_file "my_path_here"
client_cert "my_path_here"
client_key " my_path_here"
ssl_verify true
user "my_username"
password "my_password"
logstash_format true
logstash_prefix "my_index_name"
logstash_date_format my_date_format
time_key_format "my time format"
type_name fluentd
log_es_400_reason true
include_timestamp true
reconnect_on_error true
reload_on_failure true
reload_connections false
slow_flush_log_threshold 25.0
<buffer>
@type file
path "syslog.*.buffer"
chunk_limit_size 50m
flush_mode interval
flush_interval 5s
flush_thread_count 4
overflow_action drop_oldest_chunk
retry_timeout 1m
retry_max_interval 30
chunk_full_threshold 0.9
</buffer>
</match>
Please help!
Note: The above configuration is not copy-pasted, so ignore the indentation.
Thanks @cosmo0920. I added the settings below and it works fine:
reconnect_on_error true reload_on_failure true reload_connections false
It works for me
Problem
I used Fluentd with your plugin to collect logs from Docker containers and send them to ES. It works at the very beginning, but later ES becomes unable to receive the logs from Fluentd, even though ES itself is always running fine. I also find there is no index for the new day (e.g. fluentd-20190110; only the old index fluentd-20190109 exists) in ES. However, if I restart my Docker containers with Fluentd, it starts sending logs to ES again.
...
Steps to replicate
The fluentd config
Expected Behavior or What you need to ask
Fluentd should keep sending logs to ES.
Using Fluentd and ES plugin versions
fluentd --version or td-agent --version
v1.3.2-1.0
fluent-gem list, td-agent-gem list or your Gemfile.lock