y-ken / fluent-plugin-twitter

Fluentd Input/Output plugin to process tweets with Twitter Streaming API.
https://rubygems.org/gems/fluent-plugin-twitter

Support proxy #44

Closed okkez closed 7 years ago

okkez commented 7 years ago

See https://github.com/sferik/twitter/blob/master/examples/Configuration.md#using-a-proxy

Fix #27, #38

y-ken commented 7 years ago

Thank you very much. Could you please update the README as well? Thank you.

y-ken commented 7 years ago

Thank you very much!

jerry924 commented 6 years ago

I am not having luck getting this to work.

I tried the following in my config:

<source>
  @type twitter
  consumer_key           SECRET    #YOUR_CONSUMER_KEY # Required
  consumer_secret       SECRET   #YOUR_CONSUMER_SECRET # Required
  access_token             SECRET     #YOUR_OAUTH_TOKEN # Required
  access_token_secret  SECRET #YOUR_OAUTH_TOKEN_SECRET # Required
  tag                 input.twitter  # Required
  keyword             baseball
  timeline            sampling        #userstream # Required (sampling or userstream)
  <proxy>
     host myproxyhostname
     port 80
  </proxy>
</source>

But I get this warning in my td-agent.log file:

2017-11-20 18:56:31 -0800 [warn]: section <proxy> is not used in <source> of twitter plugin

I also tried with the following four environment variables, both set to the proper value and unset: http_proxy, HTTP_PROXY, https_proxy, HTTPS_PROXY.

With the environment variables cleared and the proxy setting above not taking effect, the request to the Twitter IP/port times out.

Here is a fragment of my log with all http_proxy env variables cleared and the <proxy> section in the configuration file:

2017-11-20 18:56:31 -0800 [info]: #0 starting fluentd worker pid=17726 ppid=14103 worker=0
2017-11-20 18:56:31 -0800 [info]: #0 listening dRuby uri="druby://127.0.0.1:24230" object="Fluent::Engine"
2017-11-20 18:56:31 -0800 [info]: #0 twitter: starting Twitter Streaming API for sampling. tag:input.twitter keyword:baseball
2017-11-20 18:56:31 -0800 [info]: #0 listening port port=24224 bind="0.0.0.0"
2017-11-20 18:56:31 -0800 [info]: #0 fluentd worker is now running worker=0
2017-11-20 18:58:38 -0800 [warn]: #0 thread exited by unexpected error plugin=Fluent::Plugin::TwitterInput title=:in_twitter error_class=Errno::ETIMEDOUT error="Connection timed out - connect(2) for \"199.59.148.229\" port 443"
2017-11-20 18:58:38 -0800 [error]: #0 unexpected error error_class=Errno::ETIMEDOUT error="Connection timed out - connect(2) for \"199.59.148.229\" port 443"
  2017-11-20 18:58:38 -0800 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/twitter-6.1.0/lib/twitter/streaming/connection.rb:16:in `initialize'
  2017-11-20 18:58:38 -0800 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/twitter-6.1.0/lib/twitter/streaming/connection.rb:16:in `new'
  2017-11-20 18:58:38 -0800 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/twitter-6.1.0/lib/twitter/streaming/connection.rb:16:in `stream'
  2017-11-20 18:58:38 -0800 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/twitter-6.1.0/lib/twitter/streaming/client.rb:119:in `request'
  2017-11-20 18:58:38 -0800 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/twitter-6.1.0/lib/twitter/streaming/client.rb:37:in `filter'
  2017-11-20 18:58:38 -0800 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-twitter-0.6.1/lib/fluent/plugin/in_twitter.rb:67:in `run'
  2017-11-20 18:58:38 -0800 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-twitter-0.6.1/lib/fluent/plugin/in_twitter.rb:49:in `block in start'
  2017-11-20 18:58:38 -0800 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.23/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2017-11-20 18:58:38 -0800 [error]: #0 unexpected error error_class=Errno::ETIMEDOUT error="Connection timed out - connect(2) for \"199.59.148.229\" port 443"
  2017-11-20 18:58:38 -0800 [error]: #0 suppressed same stacktrace
2017-11-20 18:58:38 -0800 [info]: Worker 0 finished unexpectedly with status 1
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-kafka' version '0.6.1'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-mongo' version '0.8.1'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.6'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-s3' version '0.8.5'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-td' version '0.10.29'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.3'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-twitter' version '0.6.1'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-twitter' version '0.6.0'
2017-11-20 18:58:39 -0800 [info]: gem 'fluent-plugin-webhdfs' version '0.7.1'
2017-11-20 18:58:39 -0800 [info]: gem 'fluentd' version '0.14.23'
2017-11-20 18:58:39 -0800 [info]: gem 'fluentd' version '0.12.40'
2017-11-20 18:58:39 -0800 [info]: adding match pattern="td.*.*" type="tdlog"
2017-11-20 18:58:39 -0800 [warn]: #0 secondary type should be same with primary one primary="Fluent::TreasureDataLogOutput" secondary="Fluent::Plugin::FileOutput"
2017-11-20 18:58:39 -0800 [info]: adding match pattern="debug.**" type="stdout"
2017-11-20 18:58:39 -0800 [info]: adding match pattern="input.twitter.sampling" type="stdout"
2017-11-20 18:58:39 -0800 [info]: adding source type="forward"
2017-11-20 18:58:39 -0800 [info]: adding source type="http"
2017-11-20 18:58:39 -0800 [info]: adding source type="debug_agent"
2017-11-20 18:58:39 -0800 [info]: adding source type="twitter"
2017-11-20 18:58:39 -0800 [warn]: section <proxy> is not used in <source> of twitter plugin
2017-11-20 18:58:39 -0800 [warn]: section <proxy> is not used in <source> of twitter plugin

y-ken commented 6 years ago

@okkez Could you please check this behavior? jerry924 is using the following environment.

okkez commented 6 years ago

fluent-plugin-twitter 0.6.1 does not support the <proxy> section. A version that includes <proxy> support has not been released yet.

jerry924 commented 6 years ago

I see. Since it was merged in April, I thought it would be in the latest release. I will try to manually patch my files with the necessary changes, then.

Since the docs already mention this feature, you might want to either release a new version or note in the docs that proxy support has not been released yet.

Thank you, and thank you for responding so quickly. I've been banging my head on this one for hours.

jerry924 commented 6 years ago

So the "not used" message went away after I patched in the changed files from this PR, but I still get a timeout connecting to Twitter, both with and without http_proxy/https_proxy set. I have not found any ideas via Google search, so I may need to just abandon this.
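One way to narrow down a timeout like this is to check whether the host running Fluentd can open a TCP connection to the address from the log at all. The following is a minimal sketch using only Ruby's standard library; `tcp_reachable?` is a hypothetical helper name, and the 5-second default timeout is an arbitrary choice.

```ruby
require "socket"

# Hypothetical diagnostic helper (not part of fluent-plugin-twitter).
# Returns true if a TCP connection to host:port can be opened within
# `timeout` seconds, and false on timeout, refusal, or a name-resolution
# or other network error.
def tcp_reachable?(host, port, timeout: 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

If `tcp_reachable?("199.59.148.229", 443)` returns false on the td-agent host, that would suggest direct outbound connections are blocked and the traffic really does have to go through the proxy.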

okkez commented 6 years ago

I've written some sample code for testing twitter gem's proxy support. Twitter::REST::Client proxy support works well. But Twitter::Streaming::Client proxy support does not work properly.

REST API works:

client = Twitter::REST::Client.new do |config|
  config.consumer_key = "xxx"
  config.consumer_secret = "xxx"
  config.access_token = "xxx"
  config.access_token_secret = "xxx"
  config.proxy = {
    uri: Addressable::URI.parse("http://localhost:8080")
  }
end

client.search("Ruby", result_type: "recent").take(3).each do |tweet|
  puts tweet.text
end

Streaming API doesn't work:

client = Twitter::Streaming::Client.new do |config|
  config.consumer_key = "xxx"
  config.consumer_secret = "xxx"
  config.access_token = "xxx"
  config.access_token_secret = "xxx"
  config.proxy = {
    #uri: Addressable::URI.parse("http://localhost:8080")
    # host: "127.0.0.1",
    # port: 8080,
    proxy_address: "127.0.0.1",
    proxy_port: 8080,
  }
end

client.filter(track: "zipper") do |o|
  puts o.text
end

The reason:

The twitter gem uses HTTP::Request incorrectly in its streaming client: https://github.com/sferik/twitter/blob/v6.2.0/lib/twitter/streaming/client.rb#L113

HTTP::Request.new expects keys such as proxy_address, see
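To illustrate the key mismatch described here, the sketch below maps a host/port style proxy hash onto proxy_address/proxy_port style keys. `normalize_proxy_keys` and `PROXY_KEY_MAP` are hypothetical names and are not part of the twitter or http gems; the username and password mappings are an assumption about the naming pattern. As the rest of this thread shows, fixing the keys alone did not make streaming through a proxy work.

```ruby
# Hypothetical mapping from host/port style keys to the
# proxy_address/proxy_port style keys (assumed naming pattern).
PROXY_KEY_MAP = {
  host: :proxy_address,
  port: :proxy_port,
  username: :proxy_username,
  password: :proxy_password,
}.freeze

# Returns a new hash with known keys renamed; keys that are not in
# PROXY_KEY_MAP (including ones already in the target style) pass through.
def normalize_proxy_keys(proxy)
  proxy.each_with_object({}) do |(key, value), result|
    key = key.to_sym
    result[PROXY_KEY_MAP.fetch(key, key)] = value
  end
end
```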

okkez commented 6 years ago

I got the following error:

twitter/streaming/response.rb:24:in `on_headers_complete': Twitter::Error::NotFound
        from lib/twitter/streaming/response.rb:19:in `<<'
        from lib/twitter/streaming/response.rb:19:in `<<'
        from lib/twitter/streaming/connection.rb:25:in `block in stream'
        from lib/twitter/streaming/connection.rb:21:in `loop'
        from lib/twitter/streaming/connection.rb:21:in `stream'
        from lib/twitter/streaming/client.rb:123:in `request'
        from lib/twitter/streaming/client.rb:38:in `filter'
        from t.rb:21:in `<main>'

okkez commented 6 years ago

See also https://github.com/sferik/twitter/issues/639

okkez commented 6 years ago

The Twitter API supports only TLS, but the twitter gem does not support TLS connections through a proxy. In fact, the http gem does not support a proxy with an HTTPS URI. See here.
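For context, an HTTP proxy normally carries HTTPS traffic through a CONNECT tunnel: the client asks the proxy to open a raw TCP connection to the target host, and only then performs the TLS handshake through that tunnel. The sketch below shows the shape of that exchange in plain Ruby. The helper names are hypothetical, and `stream.twitter.com` in the usage note is only an illustrative target.

```ruby
# Builds the request a client sends to an HTTP proxy to open a tunnel
# to host:port. After a 2xx reply, the client starts TLS over the same socket.
def build_connect_request(host, port)
  authority = "#{host}:#{port}"
  "CONNECT #{authority} HTTP/1.1\r\nHost: #{authority}\r\n\r\n"
end

# Returns true if the proxy's status line reports success (any 2xx code).
def tunnel_established?(status_line)
  match = %r{\AHTTP/\d\.\d (\d{3})}.match(status_line)
  !match.nil? && match[1].start_with?("2")
end
```

For example, `build_connect_request("stream.twitter.com", 443)` produces the request that a proxy-aware client would send before starting TLS. A client that skips this step cannot reach a TLS-only endpoint through the proxy.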

y-ken commented 6 years ago

Thank you for investigating the proxy behavior. 🙇 Is there no way to connect to the Twitter Streaming API through a proxy for now?

okkez commented 6 years ago

Is there no way to connect to the Twitter Streaming API through a proxy for now?

No, there isn't, for now. We would need to send a patch to the http gem or the twitter gem. Or we could replace the twitter gem's HTTP backend with another gem... (I'm not sure...)
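As a rough illustration of what a different HTTP backend could offer, Ruby's standard Net::HTTP accepts a proxy address and port and tunnels HTTPS through it. This is only a sketch under assumptions: `https_via_proxy` is a hypothetical helper, the 10-second timeout is arbitrary, and nothing here is wired into the twitter gem or fluent-plugin-twitter.

```ruby
require "net/http"

# Hypothetical helper: builds (but does not open) a Net::HTTP connection
# that reaches an HTTPS host through an HTTP proxy. When the connection is
# started, Net::HTTP issues a CONNECT to the proxy before the TLS handshake.
def https_via_proxy(host, proxy_host, proxy_port, port: 443)
  http = Net::HTTP.new(host, port, proxy_host, proxy_port)
  http.use_ssl = true      # TLS to the target host, through the tunnel
  http.open_timeout = 10   # arbitrary; fail fast if the proxy is unreachable
  http
end
```

Calling `https_via_proxy("stream.twitter.com", "127.0.0.1", 8080).start { |http| ... }` would then open the tunnel. The host name and proxy address are placeholders, and OAuth signing plus streaming response parsing would still have to be handled separately.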