tweetstream / em-twitter

Twitter Streaming API client for EventMachine
http://rubygems.org/gems/em-twitter
MIT License

Events happening twice with Site Streams #12

Closed: philgyford closed this issue 11 years ago

philgyford commented 11 years ago

When using Twitter Site Streams, I seem to be getting all events twice. I connect like this:

require 'em-twitter'

options = {
  :path      => "/1.1/site.json",
  :params    => {:follow => "2030131"},
  :host      => "sitestream.twitter.com",
  :method    => "POST",
  :on_inited => nil,
  :proxy     => nil,
  # mykey, mysecret, mytoken and mytokensecret stand in for real OAuth credentials
  :oauth     => {
    :consumer_key    => mykey,
    :consumer_secret => mysecret,
    :token           => mytoken,
    :token_secret    => mytokensecret
  }
}

EM.run do
  client = EM::Twitter::Client.connect(options)

  client.each do |result|
    puts result
  end
end

And get results like this:

{"control":{"control_uri":"/1.1/site/c/1_277198_9f2c58a4474597af75cdfaeaaeea2331"}}
{"control":{"control_uri":"/1.1/site/c/1_277198_9f2c58a4474597af75cdfaeaaeea2331"}}
{"for_user":2030131,"message":{"friends":[807095,2097571,6594742,428333,759251,742143,19290942,814760,612473,7589572,23937508,89168924,69329527,12552,7111412,1652541,178311661]}}
{"for_user":2030131,"message":{"friends":[807095,2097571,6594742,428333,759251,742143,19290942,814760,612473,7589572,23937508,89168924,69329527,12552,7111412,1652541,178311661]}}

And the same for receiving messages etc.

Out of curiosity I tried replacing the path and params with these from the em-twitter example:

:path   => '/1/statuses/filter.json',
:params => { :track => 'yankees' },

and the same thing happened... until I removed this line:

:host => "sitestream.twitter.com",

Then I only received one event for each occurrence of 'yankees'.

It's possible this is some weird Twitter Site Streams thing, rather than em-twitter. But I'm not sure how to test any closer to Twitter, and am struggling to narrow it down any further. Any suggestions for isolating this more?

philgyford commented 11 years ago

I've got it... EventMachine::Twitter::Connection.on_body() starts like this:

def on_body(data)
  begin
    @buffer.extract(data).each do |line|
      handle_stream(line)
    end
    ...

And data looks something like this:

"{\"control\":{\"control_uri\":\"/1.1/site/c/1_224090_e1c4219ef18ffbe610b8267ce33ab3f1\"}}\r\n\r\n"

Extra linebreaks on the end. So handle_stream(line) gets called twice, and the second time line is "\n". So I guess handle_stream() is outputting the same response on both occasions. But it should only be called once.

I'm not sure how best to fix this. Is doing data.strip! in on_body() a stupid idea? It seems too simple and easy :)
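
To see how the trailing "\r\n\r\n" could produce a second, blank line, here is a minimal sketch in plain Ruby. It assumes the buffer splits on "\r" and holds back the unfinished tail. That is an assumption chosen because it reproduces the "\n" line seen above, not something confirmed from the em-twitter source. extract_lines and the sample control_uri value are hypothetical.

```ruby
# Hypothetical stand-in for @buffer.extract: split on "\r" (an assumption) and
# hold back the trailing fragment, which may be an incomplete line.
def extract_lines(data, delimiter = "\r")
  *complete, _pending = data.split(delimiter, -1)
  complete
end

# Sample chunk shaped like the one above; the control_uri value is made up.
data = "{\"control\":{\"control_uri\":\"/1.1/site/c/example\"}}\r\n\r\n"

lines = extract_lines(data)
lines.each { |line| puts line.inspect }
```

Under that assumption the chunk yields two lines, the JSON payload and a bare "\n", which matches handle_stream(line) being called twice for one message.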

stve commented 11 years ago

Thanks for tracking this down!

I haven't had reports of this so it appears to be sitestream specific. That makes some sense as few people have sitestream credentials so it probably hasn't tripped many people up.

This should be an easy fix: we just need to strip off the newlines before handing the response off to handle_stream. I'll hopefully have a fix out soon.
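
A minimal sketch of that idea, not the actual patch: strip each extracted line and skip the ones that end up empty before they reach the handler. each_payload is a hypothetical helper. Stripping per line, rather than calling data.strip! on the raw chunk, seems safer because one chunk may carry several messages and strip! only touches the two ends of the string.

```ruby
# Hypothetical helper: yields only non-blank lines, with surrounding
# whitespace (including "\r" and "\n") removed.
def each_payload(lines)
  return enum_for(:each_payload, lines) unless block_given?

  lines.each do |line|
    payload = line.strip
    next if payload.empty?

    yield payload
  end
end

# Simulate one message followed by the stray blank line from the report.
handled = []
each_payload(["{\"control\":{}}", "\n"]) { |payload| handled << payload }
puts handled.length
```

Here the blank "\n" line is dropped, so the handler runs once per message.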

philgyford commented 11 years ago

Great! Once again, let me know if there's any way I can help try things, given we have Site Stream access.

stve commented 11 years ago

I just pushed a fix. If possible, could you verify it works by pointing to master? If all is good, I'll get a release out ASAP.

philgyford commented 11 years ago

Yes, that seems to work, thanks!

One issue for me - tweetstream currently specifies em-twitter '~> 0.2'. Will that change to 0.3 soon or are there reasons not to?

stve commented 11 years ago

I'm going to be releasing a new tweetstream gem in tandem with em-twitter so you won't be caught in dependency hell. :smile:

philgyford commented 11 years ago

That'll be great, thanks :) Any idea of the timeframe? Days or weeks? Not trying to hassle you at all, just so that I have a vague idea.

sferik commented 11 years ago

@philgyford If you want to use the latest code in your project, you don’t need to wait for a gem release. You can just add the following line to your Gemfile:

gem 'em-twitter', :git => 'https://github.com/tweetstream/em-twitter.git'

philgyford commented 11 years ago

But tweetstream will still use em-twitter 0.2.x won't it? (I may have misunderstood how this works.)

sferik commented 11 years ago

The latest gem release of tweetstream (currently version 2.5.0) specifies the following dependency on em-twitter: ~> 0.2.

This is the same as saying any version greater than or equal to 0.2 but less than 1. This includes 0.3, 0.3.1, 0.4, 0.10, and 0.99.999. As long as the first number is zero and the second number is greater than or equal to 2, the constraint will be met.

When you see the pessimistic version constraint (~>), replace the last number specified with an x, where x is greater than or equal to that number. So, for example, ~> 0.2 becomes 0.x, where x is greater than or equal to 2.
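
One way to check a constraint without installing anything is to use RubyGems' own Gem::Requirement and Gem::Version classes, which ship with Ruby. The sketch below is only illustrative: satisfies? is a hypothetical helper, and the list of version numbers is made up for the example.

```ruby
require 'rubygems'

# Hypothetical helper wrapping RubyGems' requirement matching.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# Print whether each sample version meets the '~> 0.2' constraint.
%w[0.1.9 0.2 0.3 0.3.1 0.10 0.99.999 1.0].each do |version|
  puts "#{version}: #{satisfies?('~> 0.2', version)}"
end
```

Note that adding a third number makes the constraint stricter: ~> 0.2.0 allows only 0.2.x releases, so 0.3 would not satisfy it.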

Give it a try and let me know if it’s not working the way you expected. You can confirm that it’s working as expected by inspecting your Gemfile.lock.

philgyford commented 11 years ago

Ah, I think I misunderstood the syntax. Thanks; I think it's working for me now!