uken / fluent-plugin-elasticsearch

Apache License 2.0
Logs stop showing up in ES after "wrong version number" (OpenSSL::SSL::SSLError) errors appear in the fluentd log #927

Open canidam opened 2 years ago

Problem

I have fluentd running on K8s (EKS), sending data to Elasticsearch behind nginx with SSL (on a different machine and VPC, if that matters). Everything runs fine for 24-48 hours, and then I get multiple "wrong version number (OpenSSL::SSL::SSLError)" errors:

2021-10-29 21:04:46 +0000 [warn]: #0 failed to flush the buffer. retry_time=10 next_retry_seconds=2021-10-29 21:13:27 +0000 chunk="5cf840e532b7e1079f602bbd5dfc5f20" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure 
error="could not push logs to Elasticsearch cluster ({:host=>\"my.es.internal.dns\", :port=>443, :scheme=>\"https\"}):
 SSL_connect returned=1 errno=0 state=error: wrong version number (OpenSSL::SSL::SSLError)"

Once the error appears, logs stop arriving in Elasticsearch until I restart the container. Restarting the container fixes the issue for another 24-48 hours.

I've tried setting minimum/maximum TLS versions, as well as pinning a specific TLS version (TLSv1.2 and TLSv1.3), on both the fluentd plugin and nginx. The issue persists. My fluentd config looks as follows:

<label @OUTPUT>
 <match **>
   @type copy

   <store>
     @type prometheus
     <metric>
      ...
     </metric>
   </store>

   <store>
     @type elasticsearch
     @log_level debug
     host "#{ENV["FLUENT_ELASTICSEARCH_HOST"]}"
     port "#{ENV["FLUENT_ELASTICSEARCH_PORT"]}"
     scheme https
     ssl_verify false
     ssl_version TLSv1_3

     suppress_type_name true
     logstash_format true
     request_timeout 10

     time_key time
     <buffer tag, es_index, time>
       @type file
       timekey 1m
       timekey_wait 10s

       path /fluentd/log/elastic-buffer.log
       total_limit_size 32GB
       flush_thread_count 4
     </buffer>

   </store>

 </match>
</label>

Steps to replicate

I don't have reproduction steps. The issue has recurred every 24-48 hours for the past 2 weeks, so it is consistent.

Expected Behavior or What you need to ask

I find it difficult to explain why SSL errors show up only after such a long period (up to 2 days). Is it a config issue? A bug? How can I avoid this problem?

Using Fluentd and ES plugin versions

fluentd chart v0.2.8
fluentd version v1.12.0
ruby="2.6.6"
Runs on EKS v1.20
'fluent-plugin-elasticsearch' version '4.3.3'
nginx/1.16.0
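
A possible mitigation, offered as an untested sketch rather than a confirmed fix for this setup: the plugin exposes reload_connections and reconnect_on_error, and its README notes that the default periodic host-list reload can be a problem when Elasticsearch sits behind a reverse proxy. A "wrong version number" error often means the client received a non-TLS reply where it expected a TLS handshake, so the assumption here is that a reloaded or stale connection ends up talking to the wrong endpoint. The values below are illustrative only; the remaining settings from the config above are unchanged and omitted for brevity.

```
<store>
  @type elasticsearch
  host "#{ENV["FLUENT_ELASTICSEARCH_HOST"]}"
  port "#{ENV["FLUENT_ELASTICSEARCH_PORT"]}"
  scheme https

  # Assumption: not verified against this cluster.
  # Do not periodically reload the host list from the Elasticsearch nodes;
  # behind nginx, fluentd may not be able to reach those nodes directly.
  reload_connections false

  # Reset the connection on any error so the next flush reconnects,
  # instead of only on "host unreachable" errors.
  reconnect_on_error true
</store>
```

If the errors continue with these settings, checking the nginx access and error logs at the timestamps of the failed flushes should show whether the requests reach nginx on port 443 at all.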