ptwobrussell / Mining-the-Social-Web

The official online compendium for Mining the Social Web (O'Reilly, 2011)
http://bit.ly/135dHfs

Unable to fetch more than 75,000 ids #56

Closed: mh-github closed this issue 11 years ago

mh-github commented 11 years ago

Hi,

Using friends_followers__friend_follower_symmetry.py, I am unable to fetch more than 75,000 follower ids.

The error message is twitter.api.TwitterHTTPError: Twitter sent status 404 for URL: 1.1/account/rate_limit_status.json using parameters: (oauth_consumer_key=XXXX&oauth_nonce=XXXX&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1369338285&oauth_token=XXXX&oauth_version=1.0&oauth_signature=XXXX) details: {"errors":[{"message":"Sorry, that page does not exist","code":34}]}

It looks like the retry code in handleTwitterHTTPError goes straight to the final "else: raise e" branch.

I don't think I have reached the 350 requests/hour limit.

How do I fix this error?

Thanks

ptwobrussell commented 11 years ago

I apologize for the delay in getting back to you on this. I will definitely find a way to put a little bit of time on it over this weekend and hopefully have a resolution for you. I think what is happening is an artifact of the recent refactor related to the Twitter v1.1 API changing and this particular path through the code just hasn't been executed. Should have something for you soon.

ptwobrussell commented 11 years ago

I haven't forgotten about you. I'm just a bit backed up. This is 2nd on my TODO list, and I hope to have a resolution for you by late tonight or early tomorrow. So sorry for the delay.

mh-github commented 11 years ago

No probs, will wait. Am trying to impress a popular actress in the Indian film industry, so it will be worth the wait :-)

mh-github commented 11 years ago

Hi Matthew, did you get a chance to look at the issue?

ptwobrussell commented 11 years ago

I again apologize for the delay. There's some good news and some slightly less than good news. The good news is that I have had an opportunity to dig into the problem, I now clearly understand what is going on, and I have a solution sketched out that should work very nicely. The slightly less than good news is that I won't get to it until tomorrow morning, which probably means one more day of waiting for you, since I think you are on the other side of the world. In any event, I will alert you by updating this ticket once I have the code updated and checked in.

In short, the v1.1 API fundamentally changes how rate limits work. When I revised the code to be v1.1 compatible, I didn't hit any rate limits, so this part of the code wasn't revised as it should have been. That will get fixed very soon.
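A minimal sketch of what handling the v1.1 behaviour could look like, assuming a hypothetical TwitterStatusError exception that carries the HTTP status and the epoch value of the x-rate-limit-reset response header. This is not the repository's actual handleTwitterHTTPError; it only illustrates the idea that a 429 ("Too Many Requests") should wait for the current 15-minute window to reset and then retry, rather than falling through to a final else: raise. The sleep and now parameters are injectable so the logic can be exercised without a network or a real clock.

```python
import time


class TwitterStatusError(Exception):
    """Hypothetical error type: HTTP status plus the x-rate-limit-reset epoch, if sent."""

    def __init__(self, status, reset_epoch=None):
        super().__init__("HTTP %d" % status)
        self.status = status
        self.reset_epoch = reset_epoch


WINDOW_SECONDS = 15 * 60  # API v1.1 rate limits are tracked per 15-minute window


def call_with_retry(fn, max_retries=3, sleep=time.sleep, now=time.time):
    """Call fn(); on HTTP 429, sleep until the window resets and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TwitterStatusError as e:
            if e.status != 429 or attempt == max_retries:
                raise  # non-rate-limit errors (e.g. 404) are not retried here
            if e.reset_epoch is not None:
                # Wait until the advertised reset time, plus a small safety margin.
                wait = max(0, e.reset_epoch - now()) + 5
            else:
                wait = WINDOW_SECONDS  # no reset header: assume a full window
            sleep(wait)
```

Keeping the 429 branch separate from other status codes means a genuine error such as a 404 still surfaces immediately instead of being retried.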

Just so you know what to expect with this fix: per https://dev.twitter.com/docs/rate-limiting/1.1/limits, Twitter now tracks rate limits in 15-minute windows. In your situation, that means you'll be able to collect 75,000 friend ids and 75,000 follower ids per 15-minute window, if that's what you're trying to do. You mentioned that you are working on a script for a celebrity, so if you know this celebrity's number of friends/followers from their profile page, you should be able to do the math and calculate how long it will take to accumulate all of the results, if you are trying to get the totality of the data.
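The math can be sketched as follows. The 75,000-per-window figure corresponds to 15 requests per window at 5,000 ids per request for GET followers/ids; those two constants are the v1.1 limits as documented at the time and are assumptions here, so check the current documentation before relying on them. The function returns only the minimum time spent waiting for windows to reset and ignores the time the requests themselves take.

```python
import math

# Assumed API v1.1 figures for GET followers/ids (as documented in 2013):
IDS_PER_REQUEST = 5000     # maximum ids returned per page
REQUESTS_PER_WINDOW = 15   # requests allowed per 15-minute window
WINDOW_MINUTES = 15


def minutes_of_waiting(total_ids):
    """Minimum wait, in minutes, to page through total_ids under the limits above."""
    requests_needed = math.ceil(total_ids / IDS_PER_REQUEST)
    windows_needed = math.ceil(requests_needed / REQUESTS_PER_WINDOW)
    # The first window's requests go out immediately; every later window
    # begins only after the previous 15-minute window has expired.
    return max(0, windows_needed - 1) * WINDOW_MINUTES


print(minutes_of_waiting(244000))  # 45
```

For 244,000 followers this works out to 49 requests spread over 4 windows, i.e. three full 15-minute waits, which is why the total comes to a little more than 45 minutes once request time is added.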

As a heads up, I will have a draft of Part II (Twitter Recipes) for the 2nd Edition of Mining the Social Web done by the end of next week, so you should also keep an eye out for it. I think you'll find it very useful.

Out of pure curiosity, are you using a console script or working from one of the IPython Notebooks?

More in the morning, and thank you again for your patience. Work on the 2nd Edition has been dominating my time as of late, and your attention to this detail is helpful and important for me as I prepare to work on the Twitter Recipes...

mh-github commented 11 years ago

The celebrity has 244,000 followers, so it should take me a little more than 45 minutes. That's not a problem; I'll wait for the updated script(s) from you.

I use the shell on an iMac running Mountain Lion. I don't know what IPython is; should I try it?

ptwobrussell commented 11 years ago

I am just finishing some other work and will check in your scripts after dinner, before I stop for the evening. Some other things took priority earlier, but I'm looking forward to wrapping this up for you.

I would highly recommend that you try IPython as an overall superior Python interpreter and then transition into IPython Notebook.

You will basically just need to "pip install ipython" and "easy_install readline" (don't try to use pip for this one) in order to get a great IPython interpreter experience that I think you'll find much more enjoyable. If you don't have pip on your system yet, you can install it with easy_install first via "easy_install pip".

http://ipython.org

ptwobrussell commented 11 years ago

@mh-github - Give it a try now. I included some details in the commit message for you. In short, you should be able to run the twitter__util.py script to confirm that the HTTP 429 error you were seeing has been addressed, and then turn your attention back to collecting those ids.
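For anyone collecting ids themselves: followers/ids results are cursored. The first request uses cursor -1, each response carries a next_cursor, and a next_cursor of 0 means there are no more pages. A minimal sketch of that loop, assuming a hypothetical fetch_page(cursor) callable that returns a dict with "ids" and "next_cursor" keys (it is not a function from this repository):

```python
def collect_ids(fetch_page, limit=None):
    """Follow next_cursor links from cursor -1 until the API returns 0."""
    ids, cursor = [], -1
    while cursor != 0:
        # Hypothetical callable: returns {"ids": [...], "next_cursor": int}
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
        if limit is not None and len(ids) >= limit:
            return ids[:limit]  # stop early once enough ids are collected
    return ids


# Usage with canned pages standing in for real API responses:
pages = {-1: {"ids": [1, 2], "next_cursor": 10},
         10: {"ids": [3], "next_cursor": 0}}
print(collect_ids(pages.__getitem__))  # [1, 2, 3]
```

A real fetch_page would request followers/ids with the given cursor (and count=5000), ideally wrapped in 429 handling so the loop simply pauses when a rate-limit window is exhausted.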

I'll give you the pleasure of closing this issue if you find that this now works to your satisfaction, but if it doesn't please let me know, and we'll give it another look. Feel free to open other tickets if you find other issues, and thanks again for your patience.

mh-github commented 11 years ago

Hi Matthew -- Yes, the modified script ran successfully.

The only minor issue I'd like to point out: I checked the celebrity's Twitter timeline in the browser, and it showed 245177 followers. I kicked off the script immediately. When it finished its run, the message said that the celebrity has 245169 followers. I wonder why it reported and downloaded 8 fewer ids.

But since the main issue is resolved, I will close this ticket. Thanks for the solution.

ptwobrussell commented 11 years ago

I would probably attribute the small delta in the number of followers to twitter.com using a stale cache. For celebrity types, I highly doubt that they're keeping an index up to date in real time.