ptwobrussell / Mining-the-Social-Web

The official online compendium for Mining the Social Web (O'Reilly, 2011)
http://bit.ly/135dHfs

search twitter and collect search results from 'mining the social web' examples #71

Closed tk0485 closed 10 years ago

tk0485 commented 10 years ago

I'm reading the code for 'Mining the Social Web, 2nd Edition' on here and I'm trying to understand how Example 6 works. I'm printing the length of statuses and getting different results. Below are two code snippets and the output for each one; I hope somebody can explain why I'm getting different results. Thanks in advance.

1st code snippet:
q = '#python'
count = 100

# See https://dev.twitter.com/docs/api/1.1/get/search/tweets
search_results = twitter_api.search.tweets(q=q, count=count)
statuses = search_results['statuses']

# Iterate through 5 more batches of results by following the cursor
for _ in range(5):
    print "Length of statuses", len(statuses)
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError, e: # No more results when next_results doesn't exist
        break

the output is:

Length of statuses 100
Length of statuses 100
Length of statuses 100
Length of statuses 100
Length of statuses 100

which is exactly what I'm expecting. But if I add this to the above code:

q = '#python'
count = 100

# See https://dev.twitter.com/docs/api/1.1/get/search/tweets
search_results = twitter_api.search.tweets(q=q, count=count)
statuses = search_results['statuses']

# Iterate through 5 more batches of results by following the cursor
for _ in range(5):
    print "Length of statuses", len(statuses)
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError, e: # No more results when next_results doesn't exist
        break

    # Create a dictionary from next_results, which has the following form:
    # ?max_id=313519052523986943&q=NCAA&include_entities=1
    kwargs = dict([ kv.split('=') for kv in next_results[1:].split("&") ])

    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']

the output will be:

Length of statuses 100
Length of statuses 200
Length of statuses 200

My question is: why does the second version print only three lines and not five, given that the for loop is set to run five times? And why isn't each batch 100 statuses?

ptwobrussell commented 10 years ago

Hi @tk0485 -

I think there are a couple of things going on. Notice in your first listing (the modified version of Example 5 where you've omitted some of the code in the loop) that the only time you actually make an API call to Twitter is the line twitter_api.search.tweets(q=q, count=count). Inside the loop, len(statuses) prints the same value five times because you're just looping over the same data five times and printing it. (Although you say this is exactly what you'd expect, I think what you'd actually expect is to see the length of statuses increase by ~100 each time through.)

In the second listing (Example 5 as it is in the text), you make an additional call to Twitter's API each time you execute the line twitter_api.search.tweets(**kwargs). What you see here is that the first time through the loop you get an additional 100 items (increasing the length of statuses from 100 to 200), you get nothing back the second time through, and that is why the loop terminates early. In other words, whatever you are searching for only has ~200 search results available for whatever reason.
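That control flow can be exercised offline with a stub standing in for twitter_api (the canned PAGES responses and fake_search below are assumptions for illustration only, written in Python 3). The third "page" has no next_results cursor and no statuses, which reproduces the 100, 200, 200 output and the early exit:

```python
# Canned Search API responses: 100 statuses per page; the last page
# returns nothing and has no 'next_results' cursor.
CURSOR = '?max_id=313519052523986943&q=%23python&include_entities=1'
PAGES = [
    {'statuses': [{}] * 100, 'search_metadata': {'next_results': CURSOR}},
    {'statuses': [{}] * 100, 'search_metadata': {'next_results': CURSOR}},
    {'statuses': [],         'search_metadata': {}},
]

_pages = iter(PAGES)

def fake_search(**kwargs):
    """Return the next canned page, like twitter_api.search.tweets would."""
    return next(_pages)

search_results = fake_search()
statuses = search_results['statuses']
lengths = []

for _ in range(5):
    lengths.append(len(statuses))
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:  # no more results: the loop ends before 5 iterations
        break
    kwargs = dict(kv.split('=') for kv in next_results[1:].split('&'))
    search_results = fake_search(**kwargs)
    statuses += search_results['statuses']

print(lengths)  # [100, 200, 200]
```

The third printed value is 200 again because the last fetch contributed zero statuses, and the loop then breaks on the missing cursor.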

Does this help clarify what is going on? Is there anything I could have done better to make it clearer that this is happening in the text?
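As a side note, the manual split('=')/split('&') in that loop can be replaced by the standard library's query-string parser, which also decodes percent-escaped values such as %23python (the manual split leaves them encoded). This is just an alternative sketch, using Python 3's urllib.parse (urlparse in Python 2):

```python
from urllib.parse import parse_qsl  # urlparse.parse_qsl in Python 2

# The cursor string as returned in search_metadata['next_results']
next_results = '?max_id=313519052523986943&q=%23python&include_entities=1'

# parse_qsl yields (key, value) pairs with percent-escapes decoded,
# so q comes back as '#python' rather than '%23python'
kwargs = dict(parse_qsl(next_results.lstrip('?')))
print(kwargs['q'])  # #python
```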

ptwobrussell commented 10 years ago

Oh, and by the way, can you log all future tickets at the 2nd Edition's GitHub repository: https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition (Thanks!)

tk0485 commented 10 years ago

Hi @ptwobrussell ,

Thanks a lot for the quick reply; I really appreciate it! The answer was really helpful and clarified why I was getting different output. I want to ask if you have any beginner resources (hopefully in Python), in addition to 'Mining the Social Web', that I can use to learn how to search for and collect tweets over a one-day period for specific keywords (entities)?

tk0485 commented 10 years ago

@ptwobrussell sorry I replied here; in the future I will log my tickets at the 2nd Edition GitHub.

ptwobrussell commented 10 years ago

No worries, and I'm always glad to help. Have you taken a look at the Chapter 9 code from the 2nd Edition? It contains a recipe that should illustrate how to use the Streaming API to collect tweets in real time, which may be what you're after. Once you wrap your head around it, it's not difficult at all to either build up a structure in memory and periodically dump it out to disk as JSON, or to use one of the other recipes (also in Chapter 9) to save the results to a database like MongoDB.
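The buffer-and-dump idea can be sketched independently of the Streaming API itself; any iterable of tweet dicts will do. The collect_to_json helper, the fake_stream, and the batch size below are all assumptions made for illustration, not code from the book:

```python
import json
import os
import tempfile

def collect_to_json(stream, path, batch_size=2):
    """Accumulate items from any iterable and rewrite `path` as JSON
    every `batch_size` items, so an interruption loses at most one batch."""
    buffered = []
    for tweet in stream:
        buffered.append(tweet)
        if len(buffered) % batch_size == 0:
            with open(path, 'w') as f:
                json.dump(buffered, f)
    with open(path, 'w') as f:  # final flush for any trailing partial batch
        json.dump(buffered, f)
    return buffered

# Stand-in for a Streaming API iterator (an assumption for illustration)
fake_stream = [{'text': 'tweet %d' % i} for i in range(5)]
path = os.path.join(tempfile.gettempdir(), 'tweets.json')
saved = collect_to_json(fake_stream, path)
```

With a real stream you would pass the streaming iterator in place of fake_stream and pick a batch size large enough that disk writes are infrequent.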

A rendered version of the Chapter 9 notebook that you can peruse online is here: http://nbviewer.ipython.org/urls/raw.github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/master/ipynb/Chapter%209%20-%20Twitter%20Cookbook.ipynb

Of course, when you are ready to develop, be sure to work with the actual notebook itself, preferably within the virtual machine environment since it gives you MongoDB straight out of the box.

Can you take a look and let me know how it goes? (I'm actually doing everything you describe right now for a query of interest.)

ptwobrussell commented 9 years ago

Are you using the IPython Notebooks and VM as provided? In the past, this error has occurred because readers mistakenly installed the wrong "twitter" module when running the examples on their own machines.
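One quick way to confirm which module actually got imported is to check whether it exposes the attribute the code needs. The helper below is a generic diagnostic sketch, not part of the book's code, and it is demonstrated on the standard library's json module since "twitter" may not be installed where you run it:

```python
import importlib

def module_has_attr(module_name, attr):
    """Import a module by name and report whether it exposes `attr`.
    For the `twitter` package the book uses, module_has_attr('twitter',
    'oauth') should be True; False suggests a different package is
    shadowing it (check twitter.__file__ to see which one loaded)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

print(module_has_attr('json', 'loads'))  # True
print(module_has_attr('json', 'oauth'))  # False
```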

On Mon, Mar 2, 2015 at 12:20 PM, vaibhavtripathi notifications@github.com wrote:

Example 1. auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET, CONSUMER_KEY, CONSUMER_SECRET)

is giving the following error: AttributeError: 'module' object has no attribute 'oauth'

What might be the reason? Kindly help. I'm stuck.


vaibhavtripathi commented 9 years ago

Hey Russell, Thanks a lot for your reply. Your guess was right indeed. The problem was with the API. Sorry for the trouble. By the way, yours is an awesome book. I am loving it. Congratulations!

Regards, Vaibhav Tripathi
