sarikayamehmet / GetOldTweets-R

A project written in R to get old tweets; it bypasses some limitations of the official Twitter API.
MIT License
18 stars 3 forks

Need only user tweets - Help? #5

Open wahshibinharb opened 4 years ago

wahshibinharb commented 4 years ago

Hello, and thanks for sharing!

The script works fine. The problem is that when I run it, it collects all of the user's tweets, retweets and comments. When I tried to run the script for a year of data, it returned only 1000 tweets. So far so good. The problem is that those 1000 cover only about 2 months, because the script fetches everything (tweets, retweets and comments).

Is it possible to get only the user's own tweets? How can I do that?

Thanks again

sarikayamehmet commented 4 years ago

For a user search, use this:

searchTerm <- "(from%3ArealDonaldTrump)" # for user search

Also, increase the parameter below to get more data:

ntweets = 1000
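
The search term above is the Twitter search operator `from:realDonaldTrump` with the colon percent-encoded as `%3A` and wrapped in parentheses. A minimal sketch of building such a term for any account, assuming base R's `URLencode()`; the helper name `user_search_term` is hypothetical and not part of GetOldTweets-R:

```r
# Hypothetical helper (not part of GetOldTweets-R): build the
# "(from%3A<name>)" search term for a given account.
user_search_term <- function(screen_name) {
  # Accept "@name" as well as "name"
  screen_name <- sub("^@", "", screen_name)
  # reserved = TRUE makes URLencode() escape the colon as %3A
  encoded <- URLencode(paste0("from:", screen_name), reserved = TRUE)
  # The parentheses are added afterwards so they stay unescaped
  paste0("(", encoded, ")")
}

searchTerm <- user_search_term("realDonaldTrump")
print(searchTerm)
```

The result can be assigned to `searchTerm` in the script in place of a hand-encoded string.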

wahshibinharb commented 4 years ago

Thank you!

Is there a way to see how many times a certain phrase or group of words has been tweeted?

Example: I want to know how many times the word "ayasofya" was tweeted between two given dates.

sarikayamehmet commented 4 years ago

You can use the code below to get the number of tweets:

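# Packages this snippet appears to rely on (not shown in the original post;
# assumed from the function names used below)
library(jsonlite) # fromJSON()
library(rvest)    # read_html(), html_nodes(), html_attr(), %>%

# Input parameters
startdate = "2020-07-23"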
enddate = "2020-07-24"
language = "tr"
searchTerm <- "ayasofya"
searchbox <- URLencode(searchTerm)
# convert to url
temp_url <- paste0("https://twitter.com/i/search/timeline?f=tweets&q=",searchbox,"%20since%3A",startdate,"%20until%3A",enddate,"&l=",language,"&src=typd&max_position=")

webpage <- fromJSON(temp_url)
if(webpage$new_latent_count>0){
  tweet_ids <- read_html(webpage$items_html) %>% html_nodes('.js-stream-tweet') %>% html_attr('data-tweet-id')
  breakFlag <- F
  while (webpage$has_more_items == T) {
    tryCatch({
      min_position <- webpage$min_position
      next_url <- paste0(temp_url, min_position)
      webpage <- fromJSON(next_url)
      next_tweet_ids <- read_html(webpage$items_html) %>% html_nodes('.js-stream-tweet') %>% html_attr('data-tweet-id')
      next_tweet_ids <- next_tweet_ids[!is.na(next_tweet_ids)]
      tweet_ids <- unique(c(tweet_ids,next_tweet_ids))
    },
    error=function(cond) {
      message(paste("URL does not seem to exist:", next_url))
      message("Here's the original error message:")
      message(cond)
      breakFlag <<- T
    })

    if(breakFlag == T){
      break
    }
  }
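} else {
  # Nothing matched: say so and keep an empty result so the count below is 0
  message("There is no tweet about this search term!")
  tweet_ids <- character(0)
}
print(length(tweet_ids))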
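
The URL assembly in the snippet above can be sketched as a small function, which makes it easier to reuse for other terms and date ranges. This is an illustration only: the helper name `build_search_url` is hypothetical, it uses `URLencode(..., reserved = TRUE)` instead of the default so that spaces and characters such as `#` or `&` in a term are escaped, and a later comment in this thread reports that the endpoint itself returns HTTP error 404.

```r
# Hypothetical helper (not part of GetOldTweets-R): assemble the search URL
# from the four input parameters used in the snippet above.
build_search_url <- function(term, startdate, enddate, language) {
  # Parse the dates so that unparseable input fails before any request is made
  start <- as.Date(startdate)
  end <- as.Date(enddate)
  stopifnot(start < end)
  # URLencode() leaves a term alone if it already contains %XX escapes
  query <- paste0(URLencode(term, reserved = TRUE),
                  "%20since%3A", format(start),
                  "%20until%3A", format(end))
  paste0("https://twitter.com/i/search/timeline?f=tweets&q=", query,
         "&l=", language, "&src=typd&max_position=")
}

temp_url <- build_search_url("ayasofya", "2020-07-23", "2020-07-24", "tr")
print(temp_url)
```

As in the snippet, the `min_position` value of each response is appended to the end of this URL to request the next page.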

wahshibinharb commented 4 years ago

Result:

URL does not seem to exist: https://twitter.com/i/search/timeline?f=tweets&q=ayasofya%20since%3A2020-07-23%20until%3A2020-07-24&l=tr&src=typd&max_position=thGAVUV0VFVBaEgLvxj5WO2iMWgIC7meGVstojEjUAFQAlAFUAFQAVARUAFQAA
Here's the original error message:
HTTP error 429.

It makes me wait for about 10 minutes and then fails.

wahshibinharb commented 4 years ago

I am also getting this:

URL does not seem to exist: https://twitter.com/i/search/timeline?f=tweets&q=ayasofya%20since%3A2020-07-23%20until%3A2020-07-24&l=tr&src=typd&max_position=thGAVUV0VFVBaAwL2BhYGY2iMWgIC7meGVstojEjUAFQAlAFUAFQAVARUAFQAA
Here's the original error message:
HTTP error 429.

print(length(tweet_ids))
[1] 9400
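
HTTP error 429 is the "Too Many Requests" status, meaning the server is rate limiting the client. Because the error handler in the script sets `breakFlag` and leaves the loop, the printed count covers only the pages fetched before the error. One common way to cope with rate limiting is to retry a failed request after a growing pause. A minimal sketch, assuming nothing about Twitter's actual limits; `fetch_with_backoff` and its parameters are hypothetical and not part of GetOldTweets-R:

```r
# Hypothetical helper (not part of GetOldTweets-R): call fetch(url) and, if it
# raises an error (for example HTTP error 429), wait and try again. The pause
# doubles after every failed attempt: base_wait, 2 * base_wait, 4 * base_wait...
fetch_with_backoff <- function(url, fetch, max_tries = 5, base_wait = 1) {
  for (attempt in seq_len(max_tries)) {
    # Turn an error into a return value so it can be inspected here
    result <- tryCatch(fetch(url), error = function(cond) cond)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      wait <- base_wait * 2^(attempt - 1)
      message("Attempt ", attempt, " failed (", conditionMessage(result),
              "); waiting ", wait, " s before retrying")
      Sys.sleep(wait)
    }
  }
  stop("Giving up after ", max_tries, " attempts: ", conditionMessage(result))
}
```

Inside the loop of the script this could replace the direct call, for example `webpage <- fetch_with_backoff(next_url, fromJSON, base_wait = 30)`. The wait of 30 seconds is an arbitrary illustration, not a documented limit.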

M-1993-fsu commented 3 years ago

Hi, I tried to use your script today, but after the line "webpage <- fromJSON(temp_url)" I get the following error: "Error in open.connection(con, "rb") : HTTP error 404.". If I open the URL in my browser, it turns out the page doesn't exist anymore. Is this the actual issue? How should I correct this? Maybe it's trivial, but I just started using R and the Twitter API last week.