ropensci / rtweet

🐦 R client for interacting with Twitter's [stream and REST] APIs
https://docs.ropensci.org/rtweet

question on open data of tweets #291

Closed jwijffels closed 5 years ago

jwijffels commented 6 years ago

Hi @mkearney, my apologies if this is not the right place to ask this question. I'm developing an R wrapper around StarSpace here: https://github.com/bnosac/ruimtehol To build an example of a classification model, I was thinking of using tweets and predicting the hashtag of each tweet. Do you happen to have a .RData file containing tweets that could be included in that package, or do you, by any chance, know of a place where I can find such data?

mkearney commented 5 years ago

I could provide you with a vector of status_ids that you could use as a data set. Twitter's TOS says not to share the data beyond the ID. But with the IDs, users could look up the data via rtweet::lookup_tweets(). Would tweets from verified vs. non-verified users work? Or do you need more response options or a more continuous outcome (like fav/retweet counts)?
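A minimal sketch of the ID-sharing approach described above. The two IDs are hypothetical placeholders, not real tweets, and the file is written to a temporary location purely for illustration. The call to rtweet::lookup_tweets() is left commented out because it needs network access and valid Twitter API credentials.

```r
# Hypothetical status IDs (placeholders, not real tweets).
# Keep them as character strings: tweet IDs are 64-bit integers and
# lose precision if R reads them as doubles.
status_ids <- c("1000000000000000001", "1000000000000000002")

# Share only the IDs, e.g. as a small CSV file.
ids_file <- tempfile(fileext = ".csv")
write.csv(data.frame(status_id = status_ids), ids_file, row.names = FALSE)

# A recipient reads the IDs back, forcing the column to character.
ids <- read.csv(ids_file, colClasses = "character")$status_id

# With rtweet installed and a token configured, the full tweets could
# then be retrieved from the API:
# tweets <- rtweet::lookup_tweets(ids)
```

Because only the IDs are distributed, each user retrieves the tweet content themselves, which is what keeps the shared file within the terms mentioned above.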

jwijffels commented 5 years ago

Thanks for the feedback. I wasn't aware of the Twitter TOS. That will be a blocking factor. I was basically looking for a ready-made dataset containing tweets, so that the R package doesn't need to depend on another package. I'll look for another example then. Thank you for your input either way!

mkearney commented 5 years ago

Perhaps anonymize the data? Or maybe use tweets from public figures, e.g., predicting whether a Trump tweet was sent from an iPhone, Android, and/or other device; predicting the partisanship of the account holder; etc.?

jwijffels commented 5 years ago

Yes, that's exactly what is possible with that package, but I would need a CC0 dataset in .RData format so that I can easily include it in the R package.
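For illustration, a sketch of how a small dataset could be stored as an .RData file for bundling in a package. The data frame and its name, example_tweets, are hypothetical stand-ins for a freely licensed dataset. In a real package the file would sit in the package's data/ directory; a temporary directory is used here so the snippet runs anywhere.

```r
# Toy stand-in for a freely licensed dataset (hypothetical rows, not real tweets).
example_tweets <- data.frame(
  text = c("learning #rstats today", "training a model in #python"),
  hashtag = c("rstats", "python"),
  stringsAsFactors = FALSE
)

# In a package this would be "data/example_tweets.RData".
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rdata_file <- file.path(data_dir, "example_tweets.RData")
save(example_tweets, file = rdata_file, compress = "xz")

# Load into a fresh environment to check the round trip.
env <- new.env()
loaded <- load(rdata_file, envir = env)
```

Shipping the data this way avoids a dependency on another package, but the licence of the underlying data still has to be documented separately.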