ropensci / rtweet

🐦 R client for interacting with Twitter's [stream and REST] APIs
https://docs.ropensci.org/rtweet

get_favorites(...,max_id) batch iterating backwards does not work as expected #200

Closed mikejacktzen closed 6 years ago

mikejacktzen commented 6 years ago

For my own Twitter profile, I want to get the full list of tweets my profile has favorited. Because of the rate limit, I want to batch the calls to rtweet::get_favorites(), working backwards from the most recent favorite. I use the code below, but the results do not behave as expected.

This might be related to https://github.com/mkearney/rtweet/issues/147

Below, I'll use your profile as an example user='kearneymw'

library(rtweet)
library(dplyr)

# want a specific user's favorite tweets (all of them)
# batch request sequentially, eg sliding interval

user='kearneymw'
num_fav = lookup_users(users=user)$favourites_count
num_fav

# get id of single most recent tweet (as baseline to work backwards)

batch_0 =  rtweet::get_favorites(user=user,n = 1)
dim(batch_0)

batch_0$status_id %>% min()
batch_0$status_id %>% max()
id_0 = batch_0$status_id

# tried diff combos of since_id/max_id = min/max, none get desired behavior
# from r package doc and twitter api doc, 
# 'max_id' should be what we want to use as interval slider

# from baseline, get older 1000 tweets
batch_1 = rtweet::get_favorites(user=user,
                      n = 1000,
                      max_id=id_0)

dim(batch_1)

id_0
min_1=batch_1$status_id %>% min()
max_1=batch_1$status_id %>% max()
min_1
max_1

# start from 'most recent'
# batch iterate backwards 
# to full set ending at 'oldest'

# expect previous 'min' to be current 'max', then look backwards
# want next 1000 favorited tweets

# max_id    
# Returns results with status_id less (older) than or equal to (if hit limit) the specified status_id

batch_2 = rtweet::get_favorites(user=user,
                                n = 1000,
                                max_id=min_1)

min_2 = batch_2$status_id %>% min()
max_2 = batch_2$status_id %>% max()

id_0

min_1
max_1

min_2
max_2

# not expected behavior

# is this because 'status_id' does not index the target user's favorites list?
# does it instead reference the status_id of the original (source) tweet that the target user favorited?

# dplyr is already attached above
fave_user = bind_rows(batch_0,
                      batch_1,
                      batch_2) %>%
  distinct(status_id, user_id, .keep_all = TRUE)

dim(fave_user)

# should expect close to 2000 distinct rows

The 'second batch' is pretty much the same as the 'first batch', therefore no 'backwards progress' is made. The goal was to iteratively work backwards to recover the full list of tweets a target profile has favorited.

The root problem seems to be that setting the argument 'max_id' to the status_id value of the previous batch does nothing.
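
One thing worth ruling out in the code above: if status_id is a character column (an assumption here; class(batch_1$status_id) will confirm), then min() and max() compare the IDs as strings rather than as numbers, and the IDs are too large to convert to doubles without losing precision. Twitter's timeline guidance also describes max_id as inclusive, so the usual advice is to request the lowest ID seen minus one. Below is a minimal base-R sketch of both steps; pad_ids(), oldest_id(), and decrement_id() are hypothetical helper names, not part of rtweet.

```r
# Hypothetical helpers (not part of rtweet) for status IDs stored as
# digit strings, e.g. "981979235521884162".

# Left-pad with zeros so every ID has the same width; after padding,
# string order and numeric order agree.
pad_ids <- function(ids) {
  width <- max(nchar(ids))
  paste0(strrep("0", width - nchar(ids)), ids)
}

# Numerically smallest (oldest) ID in a character vector.
oldest_id <- function(ids) {
  ids[order(pad_ids(ids), method = "radix")[1]]
}

# Subtract 1 from a single ID without converting it to a double.
decrement_id <- function(id) {
  digits <- as.integer(strsplit(id, "")[[1]])
  i <- length(digits)
  # Borrow through trailing zeros: ...1000 - 1 = ...0999
  while (i > 0 && digits[i] == 0L) {
    digits[i] <- 9L
    i <- i - 1L
  }
  if (i == 0) stop("cannot decrement an ID of zero")
  digits[i] <- digits[i] - 1L
  # Drop leading zeros, but keep a lone "0"
  sub("^0+(?=.)", "", paste(digits, collapse = ""), perl = TRUE)
}

ids <- c("999", "1000", "981")
oldest_id(ids)                      # "981"
decrement_id("981979235521884162")  # "981979235521884161"
```

With helpers like these, the next request would use max_id = decrement_id(oldest_id(batch_1$status_id)). Whether that changes the behavior reported here is untested.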

mikejacktzen commented 6 years ago

Here's a clearer demonstration that max_id does not act as a slider as expected, this time using only 2 batches of 10 tweets. With the code below, identical(batch_1[,'status_id'],batch_2[,'status_id']) returns TRUE.

# get id of single most recent tweet (as baseline to work backwards)

batch_0 =  rtweet::get_favorites(user=user,n = 1)
dim(batch_0)

id_0 = batch_0$status_id

# from baseline, get 10 older tweets
batch_1 = rtweet::get_favorites(user=user,
                                n = 10,
                                max_id=id_0)

dim(batch_1)

id_to_start_next = batch_1[nrow(batch_1),'status_id']
id_to_start_next

batch_2 = rtweet::get_favorites(user=user,
                                n = 10,
                                max_id=id_to_start_next)

dim(batch_2)

identical(batch_1[,'status_id'],batch_2[,'status_id'])
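
For comparison, here is a minimal sketch of the backwards walk this code is aiming for, written against an offline stand-in so it can be run without a token. collect_backwards(), mock_fetch(), and broken_fetch() are hypothetical names, and the sketch assumes each batch arrives newest first and that max_id is inclusive. Note also that if batch_1 is a tibble, batch_1[nrow(batch_1),'status_id'] is a one-cell tibble rather than a bare string, so the sketch extracts the ID with `$` instead.

```r
# Hypothetical helper: walk backwards through a newest-first listing.
# `fetch_page(max_id)` is any function returning a data frame with a
# character `status_id` column; max_id = NULL means "start at the newest".
collect_backwards <- function(fetch_page, max_batches = 10) {
  seen <- character(0)
  pages <- list()
  max_id <- NULL
  for (i in seq_len(max_batches)) {
    page <- fetch_page(max_id)
    new_rows <- page[!page$status_id %in% seen, , drop = FALSE]
    # No unseen IDs means no backwards progress (the symptom reported
    # in this issue) or the end of the listing, so stop.
    if (nrow(new_rows) == 0) break
    pages[[length(pages) + 1]] <- new_rows
    seen <- c(seen, new_rows$status_id)
    # Assumes rows are newest first, so the last row is the oldest.
    max_id <- page$status_id[nrow(page)]
  }
  do.call(rbind, pages)
}

# Offline stand-in for the API: 25 fake IDs, newest first, 10 per call,
# with max_id treated as inclusive.
all_ids <- as.character(125:101)
mock_fetch <- function(max_id = NULL) {
  start <- if (is.null(max_id)) 1 else match(max_id, all_ids)
  end <- min(start + 9, length(all_ids))
  data.frame(status_id = all_ids[start:end], stringsAsFactors = FALSE)
}

# A fetcher that ignores max_id, mimicking the behavior described above.
broken_fetch <- function(max_id = NULL) mock_fetch(NULL)

nrow(collect_backwards(mock_fetch))    # 25: every fake ID collected once
nrow(collect_backwards(broken_fetch))  # 10: stops after the repeated batch
```

Against the real API, fetch_page could be something like function(max_id) rtweet::get_favorites(user, n = 10, max_id = max_id), which is untested here.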

mkearney commented 6 years ago

@mikejacktzen I'll try to give you some feedback that's specific to your code (when I get back from vacation in about a week), but I believe Twitter's REST API limits you to the most recent 3,000 'likes'. It's possible that you can get more if it's the authorizing user [context], though I think probably not and can't say for sure.

mikejacktzen commented 6 years ago

Yeah, no rush.

I can confirm the 3000ish limit. I've tried 3 methods: rtweet, twitteR, and Python's TwitterAPI. Unfortunately, the limit is enforced on the Twitter side of things.

I think the rtweet issue flagged here is still a problem. Using the example above, batch 1 and batch 2 each have 10 tweets, and both batches contain the same 10 tweets.

jas1 commented 6 years ago

thanks for the amazing package :D

Context: I have the same issue. I was trying to make the Shiny app suggested here: https://jsta.rbind.io/blog/making-a-twitter-dashboard-with-r/

I wanted to show "all the likes" instead of 100.

Two problems arose:

A) The Twitter limit on one fetch is 199 (I tried 500, then 40 ... up to 100, then from 100 through 150 up to 199).

B) When I tried to "paginate" the likes by batching 199, +199, etc., I always got the same 199. The max_id parameter offset does not work.
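
The batching in B can be planned up front. Below is a minimal sketch that splits a target number of likes into per-request sizes; plan_batches() is a hypothetical helper, the default of 199 per call follows the observation in A, and the cap of 3000 is the approximate overall limit reported earlier in this thread rather than a documented constant.

```r
# Hypothetical helper: split a target number of likes into request sizes.
# per_call = 199 and cap = 3000 are assumptions taken from this thread.
plan_batches <- function(total, per_call = 199, cap = 3000) {
  total <- min(total, cap)
  if (total <= 0) return(integer(0))
  full <- total %/% per_call   # number of full-size requests
  rest <- total %% per_call    # size of the final, smaller request
  as.integer(c(rep(per_call, full), if (rest > 0) rest))
}

plan_batches(500)           # 199 199 102
length(plan_batches(5000))  # 16 requests once capped at 3000
```

Each planned size would be passed as n to one get_favorites() call, with max_id moved back between calls, which is the part that currently fails.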

Code invoked:

user_name <- "me"
current_api_max <- 199
offset <- "981979235521884162"
my_likes <- get_favorites(user_name, n = current_api_max, max_id = offset) %>%
  select("status_id", "created_at", "screen_name", "text", "urls_expanded_url") %>%
  arrange(desc(created_at))

I think if B can be solved, we'll get this going :D

Thanks in advance, and keep up the awesome work :D