adrdimitrov opened 3 years ago
Is it the same cause as #885?
ruby="2.6.7"
It's an affected version for #885
Hmm seems similar:
But my pod doesn't get restarted or anything else. Anyway, how do I update Ruby if it is deployed with the Helm chart, or am I getting something wrong?
I'm encountering a similar problem on a Kubernetes AKS installation using the fluentd daemonset. Configuration:
Fluent plugin configuration
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-concat' version '2.5.0'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-dedot_filter' version '1.0.0'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.14'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.1.5'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-grok-parser' version '2.6.2'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-json-in-json-2' version '1.0.2'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.9.4'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-parser-cri' version '0.1.1'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.2'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.0'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2022-03-02 09:03:34 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2022-03-02 09:03:34 +0000 [info]: gem 'fluentd' version '1.14.5'
Output Configuration
<match **>
  @type elasticsearch_data_stream
  @id out_es
  @log_level "info"
  include_tag_key true
  request_timeout 15s
  data_stream_name "logs-cluster-test"
  host "my-elastic-cloud.com"
  port 9243
  path ""
  scheme https
  ssl_verify true
  ssl_version TLSv1_2
  user "elastic"
  password xxxxxx
  <buffer>
    flush_thread_count 8
    flush_interval 5s
    chunk_limit_size 2M
    queue_limit_length 32
    retry_max_interval 30
    retry_forever true
  </buffer>
</match>
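For what it's worth, fluent-plugin-elasticsearch documents a few connection-recovery options that are often suggested when the output stops sending after a timeout, especially behind a load balancer or an Elastic Cloud endpoint. A minimal sketch (only the added options are shown; everything else stays as in the config above, and whether they resolve this particular issue is untested):

<match **>
  @type elasticsearch_data_stream
  # ... connection and buffer settings as above ...
  reconnect_on_error true    # attempt to reconnect after a request error
  reload_on_failure true     # reload the connection when a request fails
  reload_connections false   # disable periodic node re-sniffing
</match>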
After some hours this error occurs:
2022-03-02 02:18:42 +0000 [warn]: #0 [out_es] failed to flush the buffer. retry_times=5 next_retry_time=2022-03-02 02:18:58 +0000 chunk="5d932e3443b0395a4f70f649ed37c03e" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster (logs-cluster-test): connect_write timeout reached"
2022-03-02 02:18:42 +0000 [warn]: #0 suppressed same stacktrace
2022-03-02 02:18:42 +0000 [warn]: #0 [out_es] failed to flush the buffer. retry_times=5 next_retry_time=2022-03-02 02:18:58 +0000 chunk="5d932e52a843cbf6d6be5aa934c9f0be" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster (logs-cluster-test): connect_write timeout reached"
2022-03-02 02:18:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-elasticsearch-5.1.5/lib/fluent/plugin/out_elasticsearch_data_stream.rb:249:in 'rescue in write'
2022-03-02 02:18:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-elasticsearch-5.1.5/lib/fluent/plugin/out_elasticsearch_data_stream.rb:243:in 'write'
2022-03-02 02:18:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1179:in 'try_flush'
2022-03-02 02:18:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1495:in 'flush_thread_run'
2022-03-02 02:18:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:499:in 'block (2 levels) in start'
2022-03-02 02:18:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.5/lib/fluent/plugin_helper/thread.rb:78:in 'block in thread_create'
After this error occurs, fluentd keeps trying to connect to ES, but no more logs are sent:
2022-03-02 02:21:32 +0000 [warn]: #0 [out_es] failed to flush the buffer. retry_times=10 next_retry_time=2022-03-02 02:22:02 +0000 chunk="5d932e5a693fa1544cc32d138971f5f8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster (logs-cluster-test): connect_write timeout reached"
2022-03-02 02:21:32 +0000 [warn]: #0 suppressed same stacktrace
Eventually the buffer reaches its limit and fluentd starts logging errors of type BufferOverflowError. Metrics don't show anything noticeable: there are no spikes in communication between fluentd and ES before the error. After the error there is no output traffic and the buffer starts to fill up. There is no abnormal CPU usage by the pods, either before or after the error occurs.
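One way to see exactly when the output stalls is fluentd's built-in monitor_agent input, which exposes per-plugin buffer and retry counters over HTTP. A minimal sketch (port 24220 is the conventional default):

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

Querying http://localhost:24220/api/plugins.json then reports buffer_queue_length, buffer_total_queued_size and retry_count for the out_es plugin, which makes the moment the connection stalls and the queue starts growing easy to pinpoint.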
Hello, we have the same issue.
Fluentd version - v1.14.6-alpine-4
The buffer just gets stuck a few minutes after the fluentd pod starts.
Problem
Fluentd works fine for hours and then one of two things happens: either the buffer's total_limit_size is reached and fluentd stops working (even after I have set overflow_action to drop_oldest_chunk), or queued_chunks_limit_size is reached and again fluentd stops sending. I have tried a lot of different configurations, including the defaults and using memory as the buffer; in every case I hit one of these two issues. With the configuration above (my latest test) I hit queued_chunks_limit_size (over 8200 files, counting the .meta files). The only errors I see in the logs are occasional slow_flush_threshold warnings. During the failure I see no excessive memory or CPU usage on either side, fluentd or Elasticsearch. It seems like the connection is simply lost and never regained.
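For reference, a minimal sketch of how the buffer options mentioned above fit together (the path and sizes are illustrative, not taken from this issue; note that retry_forever true would override any retry_timeout):

<buffer>
  @type file
  path /var/log/fluentd-buffers/es.buffer   # illustrative path
  chunk_limit_size 2M
  total_limit_size 512MB                    # reaching this triggers overflow_action
  queued_chunks_limit_size 32               # cap on chunks waiting in the flush queue
  overflow_action drop_oldest_chunk         # drop instead of raising BufferOverflowError
  flush_thread_count 8
  flush_interval 5s
  retry_max_interval 30
  retry_timeout 1h                          # bounded retries instead of retry_forever
</buffer>

Note that drop_oldest_chunk only helps while chunks can still be dropped faster than they arrive; if the flush threads themselves are stuck on a dead connection, the queue can still fill up, which matches the behaviour described here.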
Steps to replicate
Input config on the forwarders:
Output config on aggregator:
Expected Behavior or What you need to ask
Constantly running fluentd and sending logs to Elasticsearch without interruptions
Using Fluentd and ES plugin versions
OS version:
Bare Metal or within Docker or Kubernetes or others?:
Kubernetes
Fluentd v0.12 or v0.14/v1.0: fluentd 1.13.0 (output of fluentd --version in the fluentd-0 pod, /opt/bitnami/fluentd)
ES plugin 3.x.y/2.x.y or 1.x.y