willemveerman opened this issue 4 years ago
We know that it is sending duplicate requests because if we comment out use_record_as_seed (as below), the records are not hashed; instead a UUID is inserted in the _id field.
We can then see in Kibana hundreds of k8s audit log records with an identical auditID field.
Is it possible that a filter with a ** expression can cause fluentd to emit duplicate records?
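For context, a minimal sketch of the kind of genid setup being described, since the actual config was omitted; the tag pattern and key names are illustrative, and the option names follow the fluent-plugin-elasticsearch README:

```
# Illustrative sketch only, not the reporter's actual configuration.
# elasticsearch_genid derives a stable hash from each record and stores it in
# hash_id_key; the output then uses that hash as the document _id, so a
# re-sent record overwrites itself in Elasticsearch instead of duplicating.
<filter kube-audit.**>                # hypothetical tag pattern
  @type elasticsearch_genid
  use_record_as_seed true             # seed the hash from the record contents
  use_entire_record true
  hash_id_key _hash                   # field that receives the generated hash
</filter>

<match kube-audit.**>
  @type elasticsearch
  id_key _hash                        # use the hash as the Elasticsearch _id
  remove_keys _hash                   # drop the helper field before indexing
</match>
```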
It is impossible to implement this under the at-least-once delivery protocol with the Elasticsearch REST API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html Elasticsearch itself does not return its record IDs in bulk API responses.
This is a limitation of both the Fluentd mechanism and the Elasticsearch REST API. If the bulk API does not handle unique IDs, Elasticsearch shouldn't reject duplicated records.
So what you're saying is, if one record in the bulk request is rejected, fluentd will re-send the entire bulk request?
This is a limitation of both the Fluentd mechanism and the Elasticsearch REST API. If the bulk API does not handle unique IDs, Elasticsearch shouldn't reject duplicated records.
I realise that if records are given a different unique ID in the _id field they will be treated as new records in Elasticsearch - that's fine, that's expected behaviour.
What's not expected, though, is for fluentd to repeatedly send the same record.
Why is it doing that?
Could it be because we have a filter with a <filter **> setting?
So what you're saying is, if one record in the bulk request is rejected, fluentd will re-send the entire bulk request?
Yes.
Hold on, doesn't that mean that fluentd can enter a never-ending loop?
The Elasticsearch plugin can give up on sending records with https://github.com/uken/fluent-plugin-elasticsearch#unrecoverable-error-types. If the Elasticsearch response contains one of those error types, it can be handled by the error handler via unrecoverable_error_types.
If the ES plugin does not catch the rejected-duplicate-records error as an unrecoverable error, Fluentd re-sends the entire set of records. Yeah, your understanding is correct.
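For reference, a minimal sketch of where unrecoverable_error_types sits in the output configuration; the match pattern and host are placeholders, and the list shown is the plugin's documented default:

```
<match **>                            # placeholder match pattern
  @type elasticsearch
  host elasticsearch.example.local    # hypothetical host
  # Error types listed here are treated as unrecoverable: the failed chunk is
  # not retried, so one rejected record does not make Fluentd loop forever
  # re-sending the whole bulk request.
  unrecoverable_error_types ["out_of_memory_error", "es_rejected_execution_exception"]
</match>
```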
OK, thank you for the help.
However, es_rejected_execution_exception is included in unrecoverable_error_types by default. Correct?
Therefore, there is no configuration change in fluent-plugin-elasticsearch which I can make which will fix this problem. Correct?
Yes, it's the default: https://github.com/uken/fluent-plugin-elasticsearch/blob/master/lib/fluent/plugin/out_elasticsearch.rb#L153
So surely we want to keep es_rejected_execution_exception in unrecoverable_error_types so the plugin does not re-send records on an es_rejected_execution_exception error.
However, es_rejected_execution_exception is included in unrecoverable_error_types by default. Correct?
Correct.
Therefore, there is no configuration change in fluent-plugin-elasticsearch which I can make which will fix this problem. Correct?
Correct.
What about if we put in retry_tag es.retry and then insert:
<match retry.es.*>
  @type stdout
</match>
at the top of the config file?
Yes. Re-routing is one approach to prevent re-registering events and exhausting the ES cluster. OpenShift's logging system uses this approach.
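A minimal sketch of that re-routing approach, with the retry tag and the match pattern made consistent (the tag names and the stdout sink are illustrative; retried events could just as well be routed to a dead-letter index or a file):

```
<match kube-audit.**>                 # hypothetical tag pattern
  @type elasticsearch
  # Records that fail in the bulk response are re-emitted into the routing
  # pipeline under this tag instead of keeping their original tag.
  retry_tag es.retry
  # (other elasticsearch settings here)
</match>

# Catch the re-emitted events so they never reach the elasticsearch output
# again, which prevents the re-send loop from exhausting the cluster.
<match es.retry.**>
  @type stdout
</match>
```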
We have inserted this; however, we are still seeing a huge bulk write queue on one ES data node (out of 10 data nodes in the cluster), which we believe is caused by fluentd still re-sending records.
Have you seen such a situation before?
We do not provide data manipulation services, nor do we run a huge Elasticsearch cluster, so I have not seen such a high-throughput environment. Our main target is fluentd-kubernetes-daemonset data ingestion with the fluentd-debian-elasticsearch image.
Yes we're using that daemonset and image
We may have spotted some other issues, so we will raise another ticket if we can find a reproducible issue.
Oh! You are such an "advanced" user! 😎
Problem
We have found that k8s API server audit log records are being duplicated many times over.
We are using the elasticsearch_genid filter to hash each record so duplicate records do not appear in ES.
However, we have been suffering severe downtime in our ES cluster because fluentd is repeatedly re-sending requests.
We know that it is sending duplicate requests because if we comment out use_record_as_seed (as below), the records are not hashed; instead a UUID is inserted in the _id field.
We can then see in Kibana hundreds of k8s audit log records with an identical auditID field.
Is it possible that a filter with a ** expression can cause fluentd to emit duplicate records?
Steps to replicate
as above
Expected Behavior or What you need to ask
No duplicate records in elasticsearch
Using Fluentd and ES plugin versions