ropensci-archive / rtweet

🐦 R client for interacting with Twitter's [stream and REST] APIs
https://docs.ropensci.org/rtweet

search_tweets pulls way more tweets than n specified #449

Closed send-me-dogs closed 3 years ago

send-me-dogs commented 4 years ago

Problem

I run a line like the one below, asking for 10 tweets containing the word "funny", but it usually pulls over 12,000 tweets. I've tried changing n to other low numbers, but the result is always the same.

data <- search_tweets("funny", n = 10, include_rts = FALSE)

rtweet version

0.7.0

Token

request: https://api.twitter.com/oauth/request_token
authorize: https://api.twitter.com/oauth/authenticate
access: https://api.twitter.com/oauth/access_token
rstats2twitter
key: 6j7Ig4xzHlBr8uUJ5A4Ym0NTf
secret:
oauth_token, oauth_token_secret, user_id, screen_name
--

Arf9999 commented 4 years ago

I can't reproduce this.

data <- rtweet::search_tweets("funny", n = 10, include_rts = FALSE)
> glimpse(data)
Rows: 10
Columns: 90
$ user_id                 <chr> "72434931", "1560525734", "1250740123076923392", "1264512468782477312", "1315992408904077…
$ status_id               <chr> "1318134034136993797", "1318134033096822784", "1318134033067302912", "1318134028441022464…
$ created_at              <dttm> 2020-10-19 10:16:58, 2020-10-19 10:16:57, 2020-10-19 10:16:57, 2020-10-19 10:16:56, 2020…
$ screen_name             <chr> "breezy_brynn", "Blu4evrlvsJDC", "renhyunn", "WillOfTheKing", "luciflary", "unboundings",…
$ text                    <chr> "This is funny only if those waters aren’t dangerous which they probably aren’t lol \U000…
$ source                  <chr> "Twitter for iPhone", "Twitter Web App", "Twitter for Android", "Twitter for Android", "N…
$ display_text_width      <dbl> 106, 237, 28, 32, 33, 187, 59, 121, 115, 57
$ reply_to_status_id      <chr> NA, "1317980807915540481", NA, NA, NA, NA, NA, NA, NA, NA
$ reply_to_user_id        <chr> NA, "520481378", NA, NA, NA, NA, NA, NA, NA, NA
$ reply_to_screen_name    <chr> NA, "brnxsheri", NA, NA, NA, NA, NA, NA, NA, NA
$ is_quote                <lgl> TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE
$ is_retweet              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
$ favorite_count          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
$ retweet_count           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
$ quote_count             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ reply_count             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ hashtags                <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
$ symbols                 <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
$ urls_url                <list> ["twitter.com/nochlllwlll/st…", NA, "twitter.com/yutafirst/stat…", "twitter.com/OniNoSan…
$ urls_t.co               <list> ["https://t.co/BC7nmrXZvt", NA, "https://t.co/WBcPpQqMDJ", "https://t.co/oGJVkL10OH", NA…
$ urls_expanded_url       <list> ["https://twitter.com/nochlllwlll/status/1317671130744979463", NA, "https://twitter.com/…
$ media_url               <list> [NA, NA, NA, NA, "http://pbs.twimg.com/media/Ekrz6UNUcAExTUd.jpg", NA, NA, NA, NA, NA]
$ media_t.co              <list> [NA, NA, NA, NA, "https://t.co/2Qh0Qd8wqV", NA, NA, NA, NA, NA]
$ media_expanded_url      <list> [NA, NA, NA, NA, "https://twitter.com/luciflary/status/1318134025769283584/photo/1", NA,…
$ media_type              <list> [NA, NA, NA, NA, "photo", NA, NA, NA, NA, NA]
$ ext_media_url           <list> [NA, NA, NA, NA, "http://pbs.twimg.com/media/Ekrz6UNUcAExTUd.jpg", NA, NA, NA, NA, NA]
$ ext_media_t.co          <list> [NA, NA, NA, NA, "https://t.co/2Qh0Qd8wqV", NA, NA, NA, NA, NA]
$ ext_media_expanded_url  <list> [NA, NA, NA, NA, "https://twitter.com/luciflary/status/1318134025769283584/photo/1", NA,…
$ ext_media_type          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ mentions_user_id        <list> [NA, <"520481378", "14480351", "1267210398266204162">, NA, NA, NA, NA, NA, NA, NA, NA]
$ mentions_screen_name    <list> [NA, <"brnxsheri", "adamblust", "Jennife79527579">, NA, NA, NA, NA, NA, NA, NA, NA]
$ lang                    <chr> "en", "en", "en", "en", "en", "en", "en", "en", "en", "en"
$ quoted_status_id        <chr> "1317671130744979463", NA, "1317817105538953216", "1318133850061623296", NA, NA, NA, NA, …
$ quoted_text             <chr> "she left her cheating bf at sea LMAOO https://t.co/pO6NS0U23H", NA, "yuta: walking \n\nj…
$ quoted_created_at       <dttm> 2020-10-18 03:37:33, NA, 2020-10-18 13:17:36, 2020-10-19 10:16:14, NA, NA, NA, NA, NA, 2…
$ quoted_source           <chr> "Twitter for iPhone", NA, "Twitter for Android", "Twitter for Android", NA, NA, NA, NA, N…
$ quoted_favorite_count   <int> 181462, NA, 3527, 0, NA, NA, NA, NA, NA, 25
$ quoted_retweet_count    <int> 33946, NA, 1115, 0, NA, NA, NA, NA, NA, 12
$ quoted_user_id          <chr> "883455697", NA, "1139883418492256256", "1316026461082451972", NA, NA, NA, NA, NA, "74952…
$ quoted_screen_name      <chr> "NOCHlLLWlLL", NA, "yutafirst", "OniNoSantoryu", NA, NA, NA, NA, NA, "vuyiswamb"
$ quoted_name             <chr> "lil uzi vers", NA, "sab`", "‍ ‍ ‍ ‍ ‍ ‍𝘡𝘖𝘙𝘖", NA, NA, NA, NA, NA, "Vuvu Acting King Yama…
$ quoted_followers_count  <int> 4262, NA, 4819, 97, NA, NA, NA, NA, NA, 23908
$ quoted_friends_count    <int> 798, NA, 3662, 97, NA, NA, NA, NA, NA, 19639
$ quoted_statuses_count   <int> 152158, NA, 28270, 123, NA, NA, NA, NA, NA, 47252
$ quoted_location         <chr> "dmv", NA, "", "", NA, NA, NA, NA, NA, "Pretoria , Mamelodi"
$ quoted_description      <chr> "Don’t care, didn’t ask, plus you’re not black #BLACKLIVESMATTER", NA, "♡特殊な: #YUTA ؛ ✒⠁t…
$ quoted_verified         <lgl> FALSE, NA, FALSE, FALSE, NA, NA, NA, NA, NA, FALSE
$ retweet_status_id       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_text            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_created_at      <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_source          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_favorite_count  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_retweet_count   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_user_id         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_screen_name     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_name            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_followers_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_friends_count   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_statuses_count  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_location        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_description     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ retweet_verified        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ place_url               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ place_name              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ place_full_name         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ place_type              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ country                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ country_code            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ geo_coords              <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA…
$ coords_coords           <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA…
$ bbox_coords             <list> [<NA, NA, NA, NA, NA, NA, NA, NA>, <NA, NA, NA, NA, NA, NA, NA, NA>, <NA, NA, NA, NA, NA…
$ status_url              <chr> "https://twitter.com/breezy_brynn/status/1318134034136993797", "https://twitter.com/Blu4e…
$ name                    <chr> "breezy \U0001f618", "LC", "yan | semi-ia", "𝙼𝚘𝚗𝚔𝚎𝚢 𝕯▪︎𝙻𝚞𝚏𝚏𝚢", "luci ☆ playing nicola’s r…
$ location                <chr> "", "", "+65 she/her ☁️", "{ Shinsekai }", "", "any prns", "East London, South Africa", "…
$ description             <chr> "", "Happily Married. No DM's #resist #fbr #blm #voteblue #Biden2020", "ɴ\u1d04\u1d1b 𝒏𝒐𝒊…
$ url                     <chr> NA, NA, NA, NA, NA, "https://t.co/gt6d4CZ5Yu", NA, NA, "https://t.co/8m88cQsZl1", "https:…
$ protected               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
$ followers_count         <int> 145, 1188, 470, 106, 89, 156, 5592, 323, 992, 3154
$ friends_count           <int> 321, 1205, 626, 81, 91, 112, 694, 387, 774, 698
$ listed_count            <int> 1, 1, 4, 0, 0, 3, 0, 2, 16, 7
$ statuses_count          <int> 11145, 5343, 554, 199, 87, 1496, 10963, 1551, 26426, 45143
$ favourites_count        <int> 32523, 4909, 2329, 137, 130, 725, 205, 1622, 55139, 15865
$ account_created_at      <dttm> 2009-09-08 01:11:50, 2013-07-01 13:15:26, 2020-04-16 10:57:51, 2020-05-24 11:04:35, 2020…
$ verified                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
$ profile_url             <chr> NA, NA, NA, NA, NA, "https://t.co/gt6d4CZ5Yu", NA, NA, "https://t.co/8m88cQsZl1", "https:…
$ profile_expanded_url    <chr> NA, NA, NA, NA, NA, "https://kalosprofessor.carrd.co/", NA, NA, "http://tadatonin.carrd.c…
$ account_lang            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ profile_banner_url      <chr> "https://pbs.twimg.com/profile_banners/72434931/1585276761", "https://pbs.twimg.com/profi…
$ profile_background_url  <chr> "http://abs.twimg.com/images/themes/theme10/bg.gif", "http://abs.twimg.com/images/themes/…
$ profile_image_url       <chr> "http://pbs.twimg.com/profile_images/1243367201995993088/nBcam0q1_normal.jpg", "http://pb…

AlexB51 commented 4 years ago

I've had the same issue with search_tweets as well as with get_friends/get_followers. For me, the result counts appear to be rounded up to multiples of the rate limit. For example, search_tweets(query, n = 10000) returned 18,000 tweets, and get_followers(acct_name, n = 100000) returned 140,000 users (70k rate limit before timeout). I'm not sure whether you have seen the same pattern. I use rtweet 0.7.0 as well.
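
The counts reported above are consistent with results being rounded up to whole rate-limit windows instead of being cut off at n. A minimal base R sketch of that arithmetic follows. It assumes a search window of 18,000 tweets (180 requests of up to 100 tweets each, which matches the figure reported above) and takes the 70k follower window from the comment as given. `rounded_up_to_window` is a hypothetical helper for illustration, not part of rtweet.

```r
# Hypothetical helper: how many results come back if a request for `n`
# items is always rounded up to a whole number of rate-limit windows.
rounded_up_to_window <- function(n, window_size) {
  ceiling(n / window_size) * window_size
}

# Assumed window size for search: 180 requests x 100 tweets per request.
search_window <- 180 * 100

# n = 10000 fits in one search window, so a full window is returned.
rounded_up_to_window(10000, search_window)

# n = 100000 needs two follower windows of 70k (figure taken from the comment).
rounded_up_to_window(100000, 70000)
```

Under these assumptions the two calls give 18000 and 140000, the same counts as reported in this comment.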

llrs commented 3 years ago

There seems to be a bug somewhere when retryonratelimit = TRUE, as I get 179000 results with:

data <- search_tweets("funny", n = 10000, include_rts = FALSE, retryonratelimit = TRUE)
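
While the bug is present, one possible workaround is to trim the returned data frame to the requested size after the call. A minimal base R sketch follows. The data frame is a small stand-in for what search_tweets returns, and `trim_to_n` is a hypothetical helper, not an rtweet function.

```r
# Hypothetical helper: keep at most `n` rows of a result data frame.
trim_to_n <- function(df, n) {
  utils::head(df, n)
}

# Stand-in for an oversized result (a real one would come from search_tweets).
oversized <- data.frame(
  status_id = as.character(seq_len(25)),
  text = paste("tweet", seq_len(25)),
  stringsAsFactors = FALSE
)

# Keep only the first 10 rows, matching the requested n.
trimmed <- trim_to_n(oversized, 10)
nrow(trimmed)
```

This only hides the symptom: any extra tweets would still be downloaded before trimming, so the requests would still count against the rate limit.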

hadley commented 3 years ago

Closing in favour of #510.