tagomoris / fluent-plugin-secure-forward

Other
140 stars 30 forks source link

Problem with dead connection #16

Open doul59 opened 9 years ago

doul59 commented 9 years ago

Hi,

I have a problem with the plugin :

Forwarder :

2015-04-07 17:37:25 +0200 [info]: starting fluentd-0.12.7
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-mixin-config-placeholders' version '0.3.0'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-mongo' version '0.7.8'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-parser' version '0.4.1'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-record-reformer' version '0.5.0'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.1'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.4.1'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-s3' version '0.5.7'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-secure-forward' version '0.2.6'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-td' version '0.10.26'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.0'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-webhdfs' version '0.4.1'
2015-04-07 17:37:26 +0200 [info]: gem 'fluentd' version '0.12.7'
2015-04-07 17:37:26 +0200 [info]: using configuration file: <ROOT>
  <source>
    type tail
    path /var/log/syslog
    pos_file /var/log/td-agent/syslog.pos
    format none
    tag syslog
  </source>
  <match reformered.**>
    type secure_forward
    self_hostname ${hostname}
    shared_key hogeposxxx0
    keepalive 300
    buffer_type memory
    buffer_chunk_limit 256m
    buffer_queue_limit 128
    flush_interval 5s
    disable_retry_limit true
    retry_wait 30s
    <server>
      host aggregator.toto.lan
      port 282525
    </server>
  </match>
  <match **>
    type record_reformer
    hostname ${hostname}
    tag reformered.${tag}
  </match>
</ROOT>
2015-04-07 17:37:26 +0200 [info]: adding match pattern="reformered.**" type="secure_forward"
2015-04-07 17:37:11 +0200 [info]: adding match pattern="**" type="record_reformer"
2015-04-07 17:37:12 +0200 [info]: adding source type="tail"
2015-04-07 17:37:12 +0200 [info]: following tail of /var/log/syslog
2015-04-07 17:37:17 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:37:18 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-07 17:37:50 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3b111da080d4"
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.2.6/lib/fluent/plugin/out_secure_forward.rb:216:in `write_objects'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:463:in `write'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in `write_chunk'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:in `pop'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in `try_flush'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:in `run
2015-04-07 17:37:32 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:37:47 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:37:51 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-07 17:38:53 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3b111da080d4"
  2015-04-07 17:37:51 +0200 [warn]: suppressed same stacktrace
2015-04-07 17:38:02 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:38:17 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:38:32 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:38:47 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:38:54 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-07 17:40:40 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3b111da080d4"
  2015-04-07 17:38:54 +0200 [warn]: suppressed same stacktrace
2015-04-07 17:39:02 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:39:17 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:39:32 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:39:47 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:40:02 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:40:17 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:40:32 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:40:41 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-07 17:44:55 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3b111da080d4"

Aggregator :

2015-04-07 17:33:11 +0200 [info]: starting fluentd-0.12.7
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-mixin-config-placeholders' version '0.3.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-mixin-rewrite-tag-name' version '0.1.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-elasticsearch' version '0.7.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-mongo' version '0.7.8'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-mysqlslowquerylog' version '0.0.2'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-parser' version '0.4.1'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-record-reformer' version '0.5.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.1'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.4.1'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-s3' version '0.5.7'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-secure-forward' version '0.2.6'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-td' version '0.10.26'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-webhdfs' version '0.4.1'
2015-04-07 17:33:11 +0200 [info]: gem 'fluentd' version '0.12.7'
2015-04-07 17:33:11 +0200 [info]: using configuration file: <ROOT>
  <source>
    type secure_forward
    self_hostname ${hostname}
    shared_key hogeposxxx0
    cert_auto_generate yes
    port 282525
  </source>
  <match fluent.**>
    type record_reformer
    hostname ${hostname}
    tag reformered.${tag}
  </match>
  <match reformered.**>
    type record_reformer
    tag ${tag_suffix[1]}
  </match>
  <match **>
    type roundrobin
    <store>
      type elasticsearch
      host x.x.x.x
      port 9200
      logstash_format true
      logstash_prefix ${tag}
      include_tag_key true
      tag_key tag
      flush_interval 1s
    </store>
    <store>
      type elasticsearch
      host x.x.x.x
      port 9200
      logstash_format true
      logstash_prefix ${tag}
      include_tag_key true
      tag_key tag
      flush_interval 1s
    </store>
  </match>
</ROOT>

Can you help me ?

tagomoris commented 9 years ago

@doul59 Which td-agent version are you using? Does this problem occurs always (now and in past)? Or from any point of time?

repeatedly commented 9 years ago

From your log, connection established to xxx is missing. If plugin works correctly, above message should be logged: https://github.com/tagomoris/fluent-plugin-secure-forward/blob/e48684ac5ce801e299289b61652b39191dad8d9b/lib/fluent/plugin/output_node.rb#L204

Can your forwarder connect to aggregator? Does this happen with only secure-forward plugin?

doul59 commented 9 years ago

I have try 2.1.4 and upgrade this afternoon in 2.2.0 if i try without secure it's ok ...

tagomoris commented 9 years ago

@doul59 do you mean that everything works on td-agent 2.1.4?

doul59 commented 9 years ago

no i have same problem with 2.1.4

tagomoris commented 9 years ago

Do you configure port 282525? Under IP protocol, max number of available port is 65534 ...

doul59 commented 9 years ago

yes it's a mistake on my copy/paste :-1: i use 28252 but if i don't specify port in config it's same result

doul59 commented 9 years ago

on aggregator : netstat -anope

tcp 517 0 ip_aggreagator:28252 ip_forwarder:58854 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggreagator:28252 ip_forwarder:58829 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggreagator:28252 ip_forwarder:58831 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggreagator:28252 ip_forwarder:58851 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggreagator:28252 ip_forwarder:58841 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggreagator:28252 ip_forwarder:58855 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggreagator:28252 ip_forwarder:58850 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggreagator:28252 ip_forwarder:58852 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggreagator:28252 ip_forwarder:58835 ESTABLISHED 0 0 - off (0.00/0/0)

repeatedly commented 9 years ago

no i have same problem with 2.1.4

It means 2.1.4 doesn't have this issue? Could you see the differences between 2.1.4 and 2.2.0, like network state or something?

doul59 commented 9 years ago

i had exactly same issue yesterday with 2.1.4, i update in 2.2.0 but nothing change ... maybe version of ruby is a problem ?

dpkg -l | grep ruby ii ruby 1:1.9.3 all Interpreter of object-oriented scripting language Ruby (default version)

2015-04-08 09:07:25 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-08 09:07:52 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:35f3a4dc7ecc" 2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.2.6/lib/fluent/plugin/out_secure_forward.rb:216:in write_objects' 2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:463:inwrite' 2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in write_chunk' 2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:inpop' 2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in try_flush' 2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:inrun'

repeatedly commented 9 years ago

Hmm.. secure-forward works with ruby 2.1.0 or later on some environment so I hope ruby version is not related.

doul59 commented 9 years ago

i try with an other server // same configuration Without secure-forward : it's ok With secure-forward same issue

repeatedly commented 9 years ago

'Without secure-forward' means you use normal forward plugin? You seem to use Ubuntu so could you check secure-forward plugin works or not with td-agent 1?

http://docs.fluentd.org/articles/install-by-deb-v1

doul59 commented 9 years ago

my servers is on debian wheezy

doul59 commented 9 years ago

i have done a new test : Stop td-agent on forwarder + Stop td-agent on aggregator Start agent on aggregator wait 2 min Start agent on forwarder

forwarder send to aggregator few lines of log and after X minutes forwarder is broken

2015-04-08 17:20:29 +0200 [info]: following tail of /var/log/syslog 2015-04-08 17:20:30 +0200 [info]: connection established to aggregator.toto.lan 2015-04-08 17:21:31 +0200 [warn]: disconnected from aggregator.toto.lan 2015-04-08 17:21:34 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting... 2015-04-08 17:21:39 +0200 [warn]: recovered connection to dead node: aggregator.toto.lan 2015-04-08 17:22:32 +0200 [warn]: disconnected from aggregator.toto.lan 2015-04-08 17:22:34 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting... 2015-04-08 17:22:49 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting... 2015-04-08 17:23:04 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting... 2015-04-08 17:23:19 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting... 2015-04-08 17:23:32 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-08 17:23:32 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:35357e008488" 2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.2.6/lib/fluent/plugin/out_secure_forward.rb:216:in write_objects' 2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:463:inwrite' 2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in write_chunk' 2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:inpop' 2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in try_flush' 2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:inrun' 2015-04-08 17:23:33 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-08 17:23:34 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:35357e008488"

on the aggregator :

tcp 517 0 ip_aggregator:28252 ip_forwarder:60614 ESTABLISHED 0 0 - off (0.00/0/0) tcp 38 0 ip_aggregator:28252 ip_forwarder:60613 CLOSE_WAIT 110 28593 6067/ruby off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60622 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60626 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60630 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60628 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60627 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60635 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60617 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60629 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60633 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60620 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60634 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60631 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60632 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60619 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60615 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60625 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60621 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60618 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60616 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60624 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60623 ESTABLISHED 0 0 - off (0.00/0/0) tcp 517 0 ip_aggregator:28252 ip_forwarder:60636 ESTABLISHED 0 0 - off (0.00/0/0)

doul59 commented 9 years ago

in debug mode :

aggregator : 2015-04-08 17:37:09 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="fluent." type="record_reformer" 2015-04-08 17:37:09 +0200 [trace]: fluent/plugin.rb:98:register_impl: registered output plugin 'record_reformer' 2015-04-08 17:37:09 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="reformered." type="record_reformer" 2015-04-08 17:37:09 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="**" type="roundrobin" 2015-04-08 17:37:09 +0200 [debug]: plugin/out_roundrobin.rb:44:block in configure: adding store type="elasticsearch", weight=1 2015-04-08 17:37:09 +0200 [debug]: plugin/out_roundrobin.rb:44:block in configure: adding store type="elasticsearch", weight=1 2015-04-08 17:37:09 +0200 [info]: fluent/root_agent.rb:142:add_source: adding source type="secure_forward" 2015-04-08 17:37:09 +0200 [trace]: fluent/plugin.rb:98:register_impl: registered input plugin 'secure_forward' 2015-04-08 17:37:09 +0200 [trace]: plugin/in_secure_forward.rb:179:run: setup for ssl sessions 2015-04-08 17:37:09 +0200 [trace]: plugin/in_secure_forward.rb:185:run: start to listen bind="0.0.0.0" port=28252 2015-04-08 17:37:09 +0200 [trace]: plugin/in_secure_forward.rb:187:run: starting SSL server bind="0.0.0.0" port=28252 2015-04-08 17:37:09 +0200 [trace]: plugin/in_secure_forward.rb:191:run: accepting sessions 2015-04-08 17:38:14 +0200 [trace]: plugin/in_secure_forward.rb:194:block in run: accept tcp connection (ssl session not established yet) 2015-04-08 17:38:14 +0200 [debug]: plugin/input_session.rb:148:start: starting server 2015-04-08 17:38:14 +0200 [trace]: plugin/in_secure_forward.rb:199:block in run: session instances: all=1 closed=0 2015-04-08 17:38:14 +0200 [trace]: plugin/input_session.rb:150:start: accepting ssl session 2015-04-08 17:38:14 +0200 [debug]: plugin/input_session.rb:63:generate_helo: generating helo 2015-04-08 17:38:15 +0200 [debug]: plugin/input_session.rb:122:on_read: on_read 2015-04-08 17:38:15 +0200 [debug]: plugin/input_session.rb:69:check_ping: checking ping 2015-04-08 17:38:15 +0200 [debug]: plugin/input_session.rb:105:generate_pong: generating pong 2015-04-08 17:38:15 +0200 [debug]: plugin/input_session.rb:137:on_read: connection established 2015-04-08 17:39:15 +0200 [debug]: plugin/input_session.rb:122:on_read: on_read 2015-04-08 17:39:16 +0200 [debug]: plugin/input_session.rb:122:on_read: on_read 2015-04-08 17:39:19 +0200 [trace]: plugin/in_secure_forward.rb:194:block in run: accept tcp connection (ssl session not established yet) 2015-04-08 17:39:19 +0200 [debug]: plugin/input_session.rb:148:start: starting server 2015-04-08 17:39:19 +0200 [trace]: plugin/in_secure_forward.rb:199:block in run: session instances: all=1 closed=0 2015-04-08 17:39:19 +0200 [trace]: plugin/input_session.rb:150:start: accepting ssl session 2015-04-08 17:39:19 +0200 [debug]: plugin/input_session.rb:63:generate_helo: generating helo 2015-04-08 17:39:20 +0200 [debug]: plugin/input_session.rb:122:on_read: on_read 2015-04-08 17:39:20 +0200 [debug]: plugin/input_session.rb:69:check_ping: checking ping 2015-04-08 17:39:20 +0200 [debug]: plugin/input_session.rb:105:generate_pong: generating pong 2015-04-08 17:39:20 +0200 [debug]: plugin/input_session.rb:137:on_read: connection established 2015-04-08 17:39:21 +0200 [info]: plugin/out_elasticsearch.rb:67:client: Connection opened to Elasticsearch cluster => {:host=>"ip-es1", :port=>9200, :scheme=>"http"} 2015-04-08 17:39:21 +0200 [info]: plugin/out_elasticsearch.rb:67:client: Connection opened to Elasticsearch cluster => {:host=>"ip-es2", :port=>9200, :scheme=>"http"}

forwarder : 2015-04-08 17:38:14 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="reformered." type="secure_forward" 2015-04-08 17:38:14 +0200 [trace]: fluent/plugin.rb:98:register_impl: registered output plugin 'secure_forward' 2015-04-08 17:38:14 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="" type="record_reformer" 2015-04-08 17:38:14 +0200 [trace]: fluent/plugin.rb:98:register_impl: registered output plugin 'record_reformer' 2015-04-08 17:38:14 +0200 [info]: fluent/root_agent.rb:142:add_source: adding source type="tail" 2015-04-08 17:38:14 +0200 [debug]: plugin/out_secure_forward.rb:121:start: starting secure-forward 2015-04-08 17:38:14 +0200 [debug]: plugin/out_secure_forward.rb:123:start: start to connect target nodes 2015-04-08 17:38:14 +0200 [debug]: plugin/out_secure_forward.rb:125:block in start: connecting node host="aggregator.toto.lan" port=28252 2015-04-08 17:38:14 +0200 [info]: plugin/in_tail.rb:477:initialize: following tail of /var/log/syslog 2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:212:connect: starting client 2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:215:connect: create tcp socket to node host="aggregator.toto.lan" address="192.168.18.81" port=28252 2015-04-08 17:38:14 +0200 [trace]: plugin/output_node.rb:224:connect: changing socket options 2015-04-08 17:38:14 +0200 [trace]: plugin/output_node.rb:232:connect: initializing SSL contexts 2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:237:connect: trying to connect ssl session host="aggregator.toto.lan" address="192.168.18.81" port=28252 2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:249:connect: ssl session connected host="aggregator.toto.lan" port=28252 2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:269:connect: ssl sessison connected host="aggregator.toto.lan" port=28252 2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:181:on_read: on_read 2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:129:check_helo: checking helo 2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:141:generate_ping: generating ping 2015-04-08 17:38:15 +0200 [debug]: plugin/output_node.rb:181:on_read: on_read 2015-04-08 17:38:15 +0200 [debug]: plugin/output_node.rb:156:check_pong: checking pong 2015-04-08 17:38:15 +0200 [info]: plugin/output_node.rb:204:on_read: connection established to aggregator.toto.lan 2015-04-08 17:38:15 +0200 [debug]: plugin/output_node.rb:207:on_read: connection established host="aggregator.toto.lan" port=28252 expire=nil 2015-04-08 17:38:19 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:38:19 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:38:24 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:38:24 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:38:29 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:38:29 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:38:34 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:38:34 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:38:39 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:38:39 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:38:44 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:38:44 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:38:49 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:38:49 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:38:54 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:38:54 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:38:59 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:38:59 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:39:04 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:39:04 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:39:09 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:39:09 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:39:14 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:39:14 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:39:15 +0200 [trace]: plugin/out_secure_forward.rb:218:write_objects: selected node host="aggregator.toto.lan" port=28252 standby=false 2015-04-08 17:39:16 +0200 [trace]: plugin/out_secure_forward.rb:218:write_objects: selected node host="aggregator.toto.lan" port=28252 standby=false 2015-04-08 17:39:16 +0200 [warn]: plugin/output_node.rb:294:rescue in block in connect: disconnected from aggregator.toto.lan 2015-04-08 17:39:16 +0200 [debug]: plugin/output_node.rb:89:shutdown: shutting down node aggregator.toto.lan 2015-04-08 17:39:17 +0200 [warn]: fluent/output.rb:354:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2015-04-08 17:39:18 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:335bbe846424" 2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.2.6/lib/fluent/plugin/out_secure_forward.rb:216:in write_objects' 2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:463:inwrite' 2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in write_chunk' 2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:inpop' 2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in try_flush' 2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:inrun' 2015-04-08 17:39:19 +0200 [warn]: fluent/output.rb:354:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2015-04-08 17:39:19 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:335bbe846424" 2015-04-08 17:39:19 +0200 [warn]: suppressed same stacktrace 2015-04-08 17:39:19 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher 2015-04-08 17:39:19 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan 2015-04-08 17:39:19 +0200 [warn]: plugin/out_secure_forward.rb:151:block (2 levels) in node_watcher: dead connection found: aggregator.toto.lan, reconnecting... 2015-04-08 17:39:19 +0200 [debug]: plugin/out_secure_forward.rb:156:block (2 levels) in node_watcher: reconnecting to node host="aggregator.toto.lan" port=28252 expire=nil expired=false detached=false 2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:212:connect: starting client 2015-04-08 17:39:19 +0200 [trace]: plugin/out_secure_forward.rb:168:block (2 levels) in node_watcher: checking reconnecting node aggregator.toto.lan 2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:215:connect: create tcp socket to node host="aggregator.toto.lan" address="192.168.18.81" port=28252 2015-04-08 17:39:19 +0200 [trace]: plugin/output_node.rb:224:connect: changing socket options 2015-04-08 17:39:19 +0200 [trace]: plugin/output_node.rb:232:connect: initializing SSL contexts 2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:237:connect: trying to connect ssl session host="aggregator.toto.lan" address="192.168.18.81" port=28252 2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:249:connect: ssl session connected host="aggregator.toto.lan" port=28252 2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:269:connect: ssl sessison connected host="aggregator.toto.lan" port=28252 2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:181:on_read: on_read 2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:129:check_helo: checking helo 2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:141:generate_ping: generating ping 2015-04-08 17:39:20 +0200 [debug]: plugin/output_node.rb:181:on_read: on_read 2015-04-08 17:39:20 +0200 [debug]: plugin/output_node.rb:156:check_pong: checking pong 2015-04-08 17:39:20 +0200 [debug]: plugin/output_node.rb:207:on_read: connection established host="aggregator.toto.lan" port=28252 expire=nil 2015-04-08 17:39:20 +0200 [warn]: fluent/output.rb:354:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2015-04-08 17:39:24 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:335bbe846424"

tagomoris commented 9 years ago

How about to configure forwarder without keepalive 300? (it means keep connection alive forever.)

repeatedly commented 9 years ago

Hmm... I tried Debian 7 with td-agent 2.2.0 but there is no problem in localhost test.

tcp        0      0 127.0.0.1:28252         127.0.0.1:50370         ESTABLISHED 0          27790       -                off (0.00/0/0)
tcp        0      0 127.0.0.1:50370         127.0.0.1:28252         ESTABLISHED 0          27789       -                off (0.00/0/0)

From your netstat, there are lots of 517 Recv-Q and 0 inode connections. This is weird...

tcp 38 0 ip_aggregator:28252 ip_forwarder:60613 CLOSE_WAIT 110 28593 6067/ruby off (0.00/0/0)

What is this program? 6067/ruby is your test program?

doul59 commented 9 years ago

without keepalive : same issue

in this exemple : 6067/ruby is td-agent process

doul59 commented 9 years ago

i had reinstall all plateform // i have same issue without secure-forward :/

tagomoris commented 9 years ago

Same issue without secure-forward? It's weird, and it seems not about secure-forward, but about network quality of your system or so...

doul59 commented 9 years ago

(i used fluent 2.1.4 since 2 months without issue) i had upgrade in 2.2.0 and some issue :/

i try reinstall in 2.1.4 ;)

doul59 commented 9 years ago

same result with 2.1.4 i don't understand ... i have try with new VM and reinstall OS + fluentd and i have always this issue forwarder can't stay connected with aggregator (not a network problem ... same network tcpdump seems ok)

jo-bi commented 9 years ago

I have the same issue. How do we fix this or is there an work around?

tagomoris commented 8 years ago

I released experimental version of this plugin, v0.3.5pre1. It may improve your problems if you're specifying keepalive timeout option. With this version, secure-forward output plugin will disconnect stuck connections forcedly after keepalive timeout (in default, connection_hard_timeout is 1.2x of keepalive).

If you have troubles on stuck connections with keepalive, please try it, with just few nodes at first :P

alexilievski commented 7 years ago

Hi

I get a similar issue to this. A connection fails to be established even though the logs say the retry succeeded nothing makes it to the aggregator. Any insight as to what could be causing this issue?

2017-07-19 11:10:45 +0000 [info]: connection established to logger.contacta.io 2017-07-19 11:11:45 +0000 [warn]: disconnected host="logger.contacta.io" port=24284 2017-07-19 11:11:45 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:11:46 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348" 2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.4.5/lib/fluent/plugin/out_secure_forward.rb:259:in write_objects' 2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/output.rb:490:inwrite' 2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/buffer.rb:354:in write_chunk' 2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/buffer.rb:333:inpop' 2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/output.rb:342:in try_flush' 2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/output.rb:149:inrun' 2017-07-19 11:11:46 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:11:49 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348" 2017-07-19 11:11:46 +0000 [warn]: suppressed same stacktrace 2017-07-19 11:11:49 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:11:52 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348" 2017-07-19 11:11:49 +0000 [warn]: suppressed same stacktrace 2017-07-19 11:11:49 +0000 [warn]: dead connection found: logger.contacta.io, reconnecting... 2017-07-19 11:11:52 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:01 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348" 2017-07-19 11:11:52 +0000 [warn]: suppressed same stacktrace 2017-07-19 11:11:54 +0000 [warn]: recovered connection to dead node: logger.contacta.io 2017-07-19 11:12:01 +0000 [warn]: retry succeeded. plugin_id="object:3faae7565348" 2017-07-19 11:12:02 +0000 [warn]: disconnected host="logger.contacta.io" port=24284 2017-07-19 11:12:02 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:03 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348" 2017-07-19 11:12:02 +0000 [warn]: suppressed same stacktrace 2017-07-19 11:12:03 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:05 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348" 2017-07-19 11:12:03 +0000 [warn]: suppressed same stacktrace 2017-07-19 11:12:04 +0000 [warn]: dead connection found: logger.contacta.io, reconnecting... 2017-07-19 11:12:05 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:09 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348" 2017-07-19 11:12:05 +0000 [warn]: suppressed same stacktrace 2017-07-19 11:12:09 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:17 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348" 2017-07-19 11:12:09 +0000 [warn]: suppressed same stacktrace