kannanvr opened this issue 3 years ago
How about setting `reload_on_failure` to `true`?
Your current settings just continue on error, and nothing removes dead nodes from the node information.
```
reconnect_on_error true
reload_on_failure true
reload_connections false
```
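A minimal sketch of where these options go: they are plugin-level parameters of the `elasticsearch` output, not `<buffer>` parameters (the host and port here are placeholders):

```
<match **>
  @type elasticsearch
  host your-es-service   # placeholder: your ES Service name
  port 9200
  reconnect_on_error true
  reload_on_failure true
  reload_connections false
</match>
```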
Also, could you read the following link? I believe it describes what you want.
Thanks @cosmo0920. We are testing out these options. Also, just curious to understand: why is fluentd trying to send data to the pod IP of elasticsearch directly after some 10 to 12 hours? Is there an option to configure fluentd not to send to the pod IP directly when the service IP is configured?
Thanks, Kannan V
> Why is fluentd trying to send data to the pod IP of elasticsearch directly after some 10 to 12 hours?
This is not Fluentd core functionality. It comes from a dependency of the ES plugin, the elasticsearch gem. This gem normally manages Elasticsearch nodes as a list of IPs.
> Is there an option to configure fluentd not to send to the pod IP directly when the service IP is configured?
The Elasticsearch plugin doesn't touch this mechanism, which comes from the elasticsearch gem. It depends on your Elasticsearch cluster settings and the elasticsearch gem's functionality.
@cosmo0920, if there is an Elasticsearch setting for this, please let us know which parameter we need to change. Or, if it is elasticsearch gem functionality, where should we raise this issue?
This feature is really intelligent about sending data. But in our environment we send data to an in-cluster ES cluster and sometimes to a remote ES cluster. With both approaches, fluentd starts trying to send to an ES pod IP after some time.
The remote ES cluster is running as a pod. When fluentd tries to send data to the remote ES cluster, it tries to send to a pod IP of the remote cluster, which is not reachable from outside that cluster. We have changed the settings as you mentioned above, but it is still trying to send to the pod IP, which we do not want, and it is not reloading either.
```
<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  include_tag_key true
  host remote-es-cluster
  port 80
  logstash_prefix test
  logstash_format true
  suppress_type_name true
  request_timeout 2000
  reconnect_on_error true
  reload_on_failure true
  reload_connections false
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 8
    flush_interval 8s
    retry_forever true
    retry_max_interval 5
    chunk_limit_size 8M
    queue_limit_length 10
  </buffer>
</match>
```

Note: `reconnect_on_error`, `reload_on_failure`, and `reload_connections` are plugin-level parameters, so they belong directly under `<match>` rather than inside `<buffer>`, where they would be ignored.
We would appreciate your suggestions for our use case.
Thanks, Kannan V
> Or, if it is elasticsearch gem functionality, where should we raise this issue?
How about using the `sniffer_class_name` parameter? https://github.com/uken/fluent-plugin-elasticsearch#sniffer-class-name
Usually, in a non-k8s environment, the ES plugin works well with the sniffer functionality, which uses the `_nodes` API:
https://github.com/elastic/elasticsearch-ruby/blob/8fd7b0868db8ee06ea33f363c66b2545d037d00e/elasticsearch-transport/lib/elasticsearch/transport/transport/sniffer.rb#L46-L68
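As a rough, simplified illustration (not the gem's actual code) of what that sniffing does: it extracts each node's `http.publish_address` from the `_nodes` response and rebuilds the host list from it. On Kubernetes those publish addresses are Pod IPs, which is why writes end up targeting Pods directly:

```ruby
require 'json'

# Simplified sketch of sniffing: derive a host list from a _nodes
# API response. Field names follow the Elasticsearch _nodes API;
# the logic is reduced to the essentials for illustration.
def hosts_from_nodes_response(json)
  nodes = JSON.parse(json).fetch('nodes', {})
  nodes.values.filter_map do |node|
    addr = node.dig('http', 'publish_address') # e.g. "10.244.1.17:9200"
    next unless addr
    host, port = addr.split(':')
    { host: host, port: port.to_i }
  end
end

# On k8s, publish_address is typically the Pod IP, not the Service IP.
sample = <<~JSON
  {"nodes": {"abc": {"http": {"publish_address": "10.244.1.17:9200"}},
             "def": {"http": {"publish_address": "10.244.2.34:9200"}}}}
JSON

puts hosts_from_nodes_response(sample).inspect
```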
But in a k8s environment, this parameter together with the bundled Fluent::Plugin::ElasticsearchSimpleSniffer class is useful to prevent fetching node information as individual Pod IPs:
https://github.com/uken/fluent-plugin-elasticsearch/blob/master/lib/fluent/plugin/elasticsearch_simple_sniffer.rb
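For reference, wiring this up typically involves requiring the sniffer class when launching fluentd and pointing `sniffer_class_name` at it; a sketch (the require path varies by install, so treat it as a placeholder):

```
# Launch fluentd with the sniffer class loaded, e.g.:
#   fluentd -r /path/to/elasticsearch_simple_sniffer.rb -c /etc/fluent/fluent.conf
<match **>
  @type elasticsearch
  host remote-es-cluster
  port 80
  sniffer_class_name Fluent::Plugin::ElasticsearchSimpleSniffer
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
</match>
```

The simple sniffer just keeps using the configured host instead of replacing it with discovered node addresses, which is the desired behavior when a Service or load balancer fronts the cluster.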
Ok, thanks @cosmo0920. We will try this parameter and report back after 12 hours. We are going to use Fluent::Plugin::ElasticsearchSimpleSniffer as the sniffer class.
@kannanvr How is your testing going?
@cosmo0920, we are still facing this issue. For now we are restarting fluentd every 3 hours as a temporary workaround. We need to resolve this issue. Is there any other way to solve it?
> For now we are restarting fluentd every 3 hours as a temporary workaround. We need to resolve this issue. Is there any other way to solve it?
If SimpleSniffer does not solve this issue, there is no way to avoid it on our side. Why don't you file your issue at https://github.com/elastic/elasticsearch-ruby? Or is it too hard to describe the issue yourself?
No. I will raise the issue on the elasticsearch-ruby project now.
> No. I will raise the issue on the elasticsearch-ruby project now.
Thanks! :muscle:
FYI: you should write down the steps to reproduce this issue on https://github.com/elastic/elasticsearch-ruby/issues/1353. With only the information currently written there, I can't reproduce the problem either.
@cosmo0920, I updated the elasticsearch-ruby issue with detailed info. Thanks for your help.
We are facing this exact same issue: our Elasticsearch is running on K8s and fluentd is outside, talking to it via MetalLB. Is there any known fix for this? Does an older version not have this issue?
Problem
We have deployed fluentd with Elasticsearch. Fluentd cannot send logs to Elasticsearch if Elasticsearch is restarted. The logs follow:
This happens when the Elasticsearch pod is restarted 12 hours after the fluentd deployment. Instead of the individual Elasticsearch pod IPs, we have configured the service IP: we configured the service FQDN for fluentd to send data to Elasticsearch. But it is trying to send via the pod IP. We don't want fluentd to send data via the pod IP directly; it should send via the service IP we have configured. How can we fix this issue?
Steps to replicate
Deploy fluentd with Elasticsearch as a pod. After 12 hours, restart Elasticsearch. Fluentd cannot send logs to Elasticsearch because it is still trying to send to the pod IP rather than the service IP.
configuration
Expected Behavior or What you need to ask
Fluentd should send the data to the Elasticsearch service IP rather than the pod IP.
Using Fluentd and ES plugin versions
`fluentd --version` or `td-agent --version`
`fluent-gem list`, `td-agent-gem list`, or your Gemfile.lock