trifle / twitterresearch

A starter kit with code for data collection, preparation, and analysis of digital trace data collected on Twitter

Rest API keyword search and extraction #6

HaTonemann closed this issue 8 years ago

HaTonemann commented 8 years ago

Dear developers, it would be very helpful if you could add an option to search the REST API for certain keywords, and to let users of your script package extract and save the results.

trifle commented 8 years ago

Hi @HaTonemann, thanks for the suggestion! Twitter's search is definitely an important feature. From a methodological point of view, however, it's also very problematic in terms of sampling bias, representativeness and replicability.

(a) Twitter filters search results by "quality"; what exactly that means is (and will remain) unclear, so you don't know what kind of content you miss.
(b) The search API endpoints only deliver results from the past several days. That makes it mostly impossible for others to replicate your search results.
(c) Because of (a) and (b), it's very hard to reliably capture a continuous stream of tweets over time: you would need to re-run the search over the complete timeframe every time to make sure you catch any new tweets that weren't there before.
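To illustrate point (c), here is a minimal sketch of what repeated searching over the same time window implies: every run has to be merged into an archive and de-duplicated by tweet ID. It assumes tweets are dicts in Twitter's v1.1 format with an `id_str` field; `merge_snapshots` is a hypothetical helper, not part of this repository, and the sample data is made up.

```python
from typing import Dict, Iterable, List


def merge_snapshots(archive: Dict[str, dict], snapshot: Iterable[dict]) -> List[str]:
    """Add the tweets from one search run to the archive, keyed by tweet ID.

    Returns the IDs that were not in the archive yet, i.e. tweets that
    only showed up in this repeated search.
    """
    new_ids = []
    for tweet in snapshot:
        tweet_id = tweet["id_str"]  # v1.1 tweet objects carry the ID as a string
        if tweet_id not in archive:
            archive[tweet_id] = tweet
            new_ids.append(tweet_id)
    return new_ids


# Two hypothetical searches over the same window, run a few hours apart.
archive: Dict[str, dict] = {}
first_run = [{"id_str": "101", "text": "a"}, {"id_str": "102", "text": "b"}]
second_run = [{"id_str": "102", "text": "b"}, {"id_str": "103", "text": "c"}]

print(merge_snapshots(archive, first_run))   # ['101', '102']
print(merge_snapshots(archive, second_run))  # ['103']
print(len(archive))                          # 3
```

Note that this only recovers tweets that are still inside the window search covers; anything that has aged out (or was filtered by "quality") is never seen, which is exactly the replicability problem described in (a) and (b).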

In short, the omission of search is a conscious decision: We do NOT recommend using search to collect and archive data.

That being said, I could be persuaded that having search tools available as functions in the code is a good idea. We should then probably issue some stern warnings and leave the saving/exporting part to users.

TL;DR: What would you use search for?

HaTonemann commented 8 years ago

Hi @trifle, thank you for the quick response and for the reminder of the restrictions imposed by Twitter itself. I want to use keyword search on the REST API to identify users who discussed a certain topic on Twitter during the last seven days, so that I can build a sample of users to track from now on via the streaming API.
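The use case above boils down to turning search results into a list of distinct authors. A minimal sketch, assuming tweets are v1.1-style dicts where the author is in `tweet["user"]["id_str"]`; `sample_user_ids` is a hypothetical helper. The streaming `statuses/filter` endpoint takes user IDs as a comma-separated `follow` parameter; the default cap of 5,000 mirrors the limit Twitter documented for `follow`, but treat that number as an assumption and check it against your access level.

```python
from typing import Iterable, List


def sample_user_ids(tweets: Iterable[dict], max_users: int = 5000) -> List[str]:
    """Collect the distinct authors of a set of tweets, in order of first appearance."""
    seen = set()
    user_ids = []
    for tweet in tweets:
        uid = tweet["user"]["id_str"]  # author ID as a string (v1.1 tweet format)
        if uid not in seen:
            seen.add(uid)
            user_ids.append(uid)
        if len(user_ids) >= max_users:
            break  # stop once the sample is full
    return user_ids


# Made-up search results: user 42 tweeted twice.
tweets = [
    {"id_str": "1", "user": {"id_str": "42"}},
    {"id_str": "2", "user": {"id_str": "7"}},
    {"id_str": "3", "user": {"id_str": "42"}},
]
ids = sample_user_ids(tweets)
follow = ",".join(ids)  # comma-separated list for the streaming `follow` parameter
print(follow)  # 42,7
```

Keeping the order of first appearance (rather than a plain `set`) makes the sample reproducible when it has to be truncated.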

trifle commented 8 years ago

@HaTonemann fair enough, that's a good use case. I'll see if I find the time to add some code.

trifle commented 8 years ago

Very rudimentary function added in 286a32d. This only returns the first page of results but can be extended easily.
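One way to extend this beyond the first page is the `max_id` cursor that Twitter's v1.1 search documentation describes: request the next page with `max_id` set to one less than the lowest tweet ID received so far. Whether `rest.search_tweets` accepts such a parameter is not shown in this thread, so the sketch below takes a hypothetical `fetch_page(max_id)` callable you would wrap around whatever search call you use; the fake backend exists only to make it runnable.

```python
from typing import Callable, List, Optional


def search_all_pages(
    fetch_page: Callable[[Optional[int]], List[dict]],
    max_pages: int = 10,
) -> List[dict]:
    """Page backwards through search results using the max_id cursor.

    fetch_page(max_id) is a hypothetical callable returning one page of
    tweet dicts with IDs <= max_id (the newest page when max_id is None).
    """
    collected: List[dict] = []
    max_id: Optional[int] = None
    for _ in range(max_pages):
        page = fetch_page(max_id)
        if not page:
            break  # no older results left
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen so far.
        max_id = min(int(t["id_str"]) for t in page) - 1
    return collected


# Fake backend for demonstration: IDs 7..1, newest first, 3 per page.
ALL_IDS = list(range(7, 0, -1))


def fake_fetch(max_id: Optional[int]) -> List[dict]:
    eligible = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return [{"id_str": str(i)} for i in eligible[:3]]


tweets = search_all_pages(fake_fetch)
print([t["id_str"] for t in tweets])  # ['7', '6', '5', '4', '3', '2', '1']
```

`max_pages` is a safety limit so a misbehaving cursor cannot loop forever; parsing `id_str` with `int()` is safe in Python because its integers are unbounded.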

trifle commented 8 years ago

@HaTonemann it would be great to hear some feedback in case you decide to use this, btw.

HaTonemann commented 8 years ago

Thanks @trifle for the fast and properly working solution. The only thing I noticed was that the function's output cannot be stored directly in a JSON file. But this may also be due to my rudimentary knowledge of Python and data formats.

trifle commented 8 years ago

@HaTonemann that works just like it does for the other functions, for example see:

https://github.com/trifle/twitterresearch/blob/master/examples.py#L118

So you could write:

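import json
import rest  # assumption: the repository's rest.py, run from the repo root

# search_tweets returns a tuple; only `tweets` (a list of dicts) is written out
result, tweets, metadata = rest.search_tweets(query="berlin")
with open("tweets.json", "w") as f:
    for tweet in tweets:
        f.write(json.dumps(tweet) + "\n")  # one JSON object per line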
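The file written this way holds one JSON object per line (often called JSON Lines), not a single JSON document, so calling `json.load()` on the whole file fails with an "Extra data" error as soon as it contains more than one tweet. A minimal sketch of writing and reading such a file line by line; `write_jsonl` and `read_jsonl` are hypothetical helper names, not functions from the repository.

```python
import json
import os
import tempfile
from typing import List


def write_jsonl(path: str, tweets: List[dict]) -> None:
    """Write one JSON object per line, as in the snippet above."""
    with open(path, "w", encoding="utf-8") as f:
        for tweet in tweets:
            f.write(json.dumps(tweet) + "\n")


def read_jsonl(path: str) -> List[dict]:
    """Parse the file line by line; blank lines are skipped."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round trip with made-up tweets in a temporary directory.
sample = [{"id_str": "1", "text": "Hallo Berlin"}, {"id_str": "2", "text": "Grüße"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tweets.json")
    write_jsonl(path, sample)
    print(read_jsonl(path) == sample)  # True
```

Writing line by line also means a crash midway leaves every completed line readable, which a single JSON array would not.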