niemanlab / openfuego

Watching Twitter all day—so you don’t have to.
http://www.niemanlab.org/fuego
MIT License

Open Fuego exceeds Twitter's rate limits #3

Open hashdo opened 10 years ago

hashdo commented 10 years ago

Hi. First, thanks for releasing Open Fuego. Really enjoyed jumping into the code, which is lean and clever.

Ran into a problem early on: Twitter has blocked my app for going over their rate limits.

Looking at the Collector class, there is no mechanism for throttling the number of requests made per 15-minute window. It doesn't even check whether you're getting close to the limit. I let it run for 3 hours while testing, and now my account can't make any requests at all.
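For anyone who wants to add throttling, here is a minimal sketch of a fixed-window request counter. It is not part of Open Fuego. The class name `WindowThrottle` and its `limit` parameter are hypothetical, and the actual per-window limit should come from Twitter's documentation for the endpoint being called.

```python
import time


class WindowThrottle:
    """Hypothetical fixed-window throttle; not part of Open Fuego.

    Permits at most `limit` requests in each `window`-second window.
    """

    def __init__(self, limit, window=15 * 60, clock=time.monotonic):
        self.limit = limit          # per-window budget, e.g. from Twitter's docs
        self.window = window        # 15 minutes by default
        self.clock = clock          # injectable so the logic can be tested
        self.window_start = clock()
        self.count = 0

    def try_acquire(self):
        """Return 0.0 and count the request if allowed, else seconds to wait."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # The previous window has expired, so start a fresh one.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return 0.0
        return self.window - (now - self.window_start)


if __name__ == "__main__":
    fake_now = [0.0]
    throttle = WindowThrottle(limit=2, window=900, clock=lambda: fake_now[0])
    print([throttle.try_acquire() for _ in range(3)])  # [0.0, 0.0, 900.0]
```

A caller would sleep for the returned number of seconds before sending the request. A fixed window is the simplest choice, but it only counts requests made by this process, so Twitter's own responses remain the final word on how much budget is left.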

Which leads me to my current problem: whenever I try to run Open Fuego, the Collector dies with a 429 response while the Consumer (pointlessly) keeps going.

I'll try to patch it if I can. This isn't really a question so much as an issue report and a warning to anyone who wants to try out the code on a local dev machine.

If you run Open Fuego "as is," be very, very careful to stay within Twitter's posted rate limits. Newbs (like me) will get burned, and badly.

phelps commented 10 years ago

Hi, hashdo:

Apologies for the trouble. Because the Collector uses Twitter's streaming API, it should only ever need one connection (barring occasional disconnects, which the code handles automatically). That makes me wonder whether there is an error in your Twitter credentials, so the Collector fails to connect and then tries again and again until it hits the rate limit.
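A reconnect loop that retries immediately after every failure is exactly the kind of loop that can run into a rate limit. Below is a minimal sketch of reconnecting with exponential backoff and a bounded number of attempts. The function names are hypothetical, `connect` stands in for whatever opens the stream, and the starting delay, growth factor, and cap are illustrative assumptions; check Twitter's streaming documentation for the recommended values.

```python
import time


def backoff_delays(start=5.0, factor=2.0, cap=320.0, attempts=8):
    """Yield the wait, in seconds, before each reconnect attempt."""
    delay = start
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def connect_with_backoff(connect, sleep=time.sleep, **schedule):
    """Call `connect()` until it returns True, waiting longer after each failure.

    Returns False once the schedule is exhausted, so a persistent failure
    (such as bad credentials) ends instead of looping forever.
    """
    if connect():
        return True
    for delay in backoff_delays(**schedule):
        sleep(delay)
        if connect():
            return True
    return False
```

With the defaults, the waits run 5, 10, 20, 40, 80, 160, 320, 320 seconds before giving up. Passing `sleep` as a parameter keeps the loop testable without real delays.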

My code used to have a spot check that tested the user's Twitter credentials before trying to connect, and now I'm not seeing it. I think it might have gotten lost in a merge somewhere. Let me push that and see if it helps you.

phelps commented 10 years ago

Committed a Twitter credentials check: https://github.com/niemanlab/openfuego/commit/11198cec57a8f7d53d0bf5b034e0a7cf9fab987b

hashdo commented 10 years ago

Phelps! Thanks for the quick reply. For someone who claimed not to be a programmer in the release announcement, you're something of a wizard.

Two things up front: (1) It's not an authorization problem and (2) your patch works just fine.

I didn't think it was an issue with my credentials, because if it were, I'd expect Twitter to come back with a 401 ("Unauthorized") instead of a 429 ("Too Many Requests"). To test this, I changed my username in config.php to a junk string. Sure enough, the patch you added kicked in and told me my credentials were bad. Switching it back just gave me the 429 again.
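The 401-versus-429 distinction suggests different reactions: a 401 means retrying can never help, while a 429 means waiting until the rate-limit window resets. A minimal sketch of that decision follows. The function `next_action` is hypothetical, and it assumes the response headers are a dict with lowercase keys that may include Twitter's `x-rate-limit-reset` header (the window reset time as a Unix timestamp).

```python
import time

FALLBACK_WAIT = 15 * 60  # assumption: wait one full 15-minute window


def next_action(status, headers, now=None):
    """Map an HTTP status to ("ok" | "stop" | "wait" | "retry", seconds)."""
    if status == 200:
        return ("ok", 0)
    if status == 401:
        # Unauthorized: the credentials are wrong, and retrying cannot fix that.
        return ("stop", 0)
    if status == 429:
        # Too Many Requests: wait until the window resets, if we know when.
        now = time.time() if now is None else now
        reset = headers.get("x-rate-limit-reset")
        if reset is not None:
            return ("wait", max(0, int(reset) - int(now)))
        return ("wait", FALLBACK_WAIT)
    # Anything else, such as a 5xx server error, is treated as transient.
    return ("retry", 0)
```

For example, `next_action(429, {"x-rate-limit-reset": "1000"}, now=940)` returns `("wait", 60)`.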

Looking at the Universe class, I noticed the error ("Dying ...") originated in the populate method, which means Twitter choked when Open Fuego fetched all the authorities to seed the openfuego_citizens table.

That's when I remembered that, at some point during my late night of bleary-eyed testing, I had changed my list of 15 authorities. Right after that change, the app stopped working and Twitter started blocking me with 429 errors.

Most of my authorities follow fewer than 1,000 users. One of them follows 2,800, and as you might guess, he's the culprit. If I leave him off the list, the script works fine; adding him makes Twitter respond with a 429.

When this latest block wears off, I'm going to poke around a bit and figure out what it is about this user that makes Twitter choke and the Collector die. (I'm also curious what happens when you add too many names to the authorities list. Before it tries to fetch their info, the app doesn't check whether the names are valid, whether there are more than 15 of them, or how many people each one follows.)
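Two of those checks can run locally before any request is made: the number of authorities and whether each name is a well-formed Twitter screen name (1 to 15 letters, digits, or underscores). A minimal sketch follows; `check_authorities` is a hypothetical helper, and a format check cannot confirm that an account exists or tell how many people it follows, which still takes an API call.

```python
import re

MAX_AUTHORITIES = 15  # the maximum stated in Open Fuego's code comments
SCREEN_NAME = re.compile(r"[A-Za-z0-9_]{1,15}")


def check_authorities(names):
    """Return a list of problems with an authorities list (empty if none)."""
    problems = []
    cleaned = [n.strip().lstrip("@") for n in names]
    if len(cleaned) > MAX_AUTHORITIES:
        problems.append(
            f"{len(cleaned)} authorities listed; the maximum is {MAX_AUTHORITIES}"
        )
    seen = set()
    for name in cleaned:
        if not SCREEN_NAME.fullmatch(name):
            problems.append(f"not a valid screen name: {name!r}")
        key = name.lower()  # screen names are case-insensitive
        if key in seen:
            problems.append(f"duplicate authority: {name!r}")
        seen.add(key)
    return problems
```

Running this at startup and refusing to continue when the list is non-empty would stop a bad list before it costs any requests.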

In the meantime, I apologize for the title of this thread. It's not really Open Fuego's fault. This is a weird, weird bug that couldn't be anticipated.

Please feel free to close this out, although I'd like the thread to remain so other people who run into a similar issue can check their authority lists as the first stop in their debugging.

Thanks for your help!

phelps commented 10 years ago

I am still scratching my head about this. Are you positive you never had more than 15 authorities in your list? While I say in the code comments that 15 is the max, you're right, the app is not enforcing that limit. (I will patch it.) I know that if you specify more than 15 authorities, Fuego hits Twitter's (abnormally strict) rate limit and dies.

As for a particular user following a lot of people, the friends/ids endpoint should be able to return up to 5,000 user IDs per request.
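With up to 5,000 IDs per friends/ids request, an authority following 2,800 accounts should cost the same single call as one following 900; only accounts past 5,000 need extra pages. A back-of-the-envelope sketch of the call count follows. The helper names are hypothetical, and the 15-call budget per window is an assumption chosen to match the 15-authority ceiling described above.

```python
import math

IDS_PER_REQUEST = 5000  # friends/ids page size, per the comment above
BUDGET_PER_WINDOW = 15  # assumption: friends/ids calls allowed per 15 minutes


def friends_ids_calls(friend_counts):
    """Requests needed to page through every authority's friends list.

    Each authority costs at least one call, plus one more per extra 5,000 IDs.
    """
    return sum(max(1, math.ceil(n / IDS_PER_REQUEST)) for n in friend_counts)


def fits_in_one_window(friend_counts):
    return friends_ids_calls(friend_counts) <= BUDGET_PER_WINDOW


if __name__ == "__main__":
    # Hypothetical list: 14 authorities following 900 accounts, one following 2,800.
    counts = [900] * 14 + [2800]
    print(friends_ids_calls(counts), fits_in_one_window(counts))  # 15 True
```

Under these assumptions the friends/ids count alone would not explain the 429, which matches the puzzle here; populate may make other calls that this sketch does not count.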

hashdo commented 10 years ago

Yep, I'm absolutely sure. I always counted to make sure I was at 15 or under while I was testing the app.

When I suspected what was happening, I dumped the database and started over with just one user at a time. No other name caused issues. I can message you the list if you like. I'd be curious to know if you get the same results.

I didn't test it as much as I would have liked, because I ran into another issue: the Collector freaks out mid-run, eats 100% of the CPU, and dies. I'm still trying to get a handle on that one.