ropensci / software-review

rOpenSci Software Peer Review.

Submission: rtweet #302

Closed mkearney closed 5 years ago

mkearney commented 5 years ago

Submitting Author: Michael W. Kearney (@mkearney)
Repository: https://github.com/mkearney/rtweet
Version submitted: v0.6.9
Editor: @sckott
Reviewer 1: @andrewheiss
Reviewer 2: @briatte
Archive: TBD
Version accepted: TBD


Package: rtweet
Type: Package
Version: 0.6.9
Title: Collecting Twitter Data
Authors@R: c(
    person("Michael W.", "Kearney", ,
    email = "kearneymw@missouri.edu", role = c("aut", "cre"),
    comment = c(ORCID = "0000-0002-0730-4694"))
    ## add contributor template (middle name/initial optional)
    #person("First Middle", "Last", ,
    #email = "email@address.com", role = c("ctb"))
    )
Description: An implementation of calls designed to collect and
    organize Twitter data via Twitter's REST and stream Application
    Program Interfaces (API), which can be found at the following URL:
    <https://developer.twitter.com/en/docs>.
Depends:
    R (>= 3.1.0)
Imports:
    httr (>= 1.3.0),
    jsonlite (>= 0.9.22),
    magrittr (>= 1.5.0),
    tibble (>= 1.3.4),
    utils,
    progress,
    Rcpp,
    httpuv
License: MIT + file LICENSE
URL: https://CRAN.R-project.org/package=rtweet
BugReports: https://github.com/mkearney/rtweet/issues
Encoding: UTF-8
Suggests:
    ggplot2,
    knitr,
    magick,
    openssl,
    readr,
    rmarkdown,
    testthat (>= 2.1.0),
    webshot,
    covr,
    igraph
VignetteBuilder: knitr
LazyData: yes
RoxygenNote: 6.1.1
LinkingTo: 
    Rcpp

Scope

Data retrieval because the package allows users to easily request and import data from Twitter's REST and stream APIs.

Data munging because significant work is done to convert JSON objects returned by Twitter's APIs into tabular data frames.

The target audience is researchers. The scientific applications span a range of topics. Here is the relevant excerpt from paper.md:

Although rtweet provides some coverage to user context-behaviors (e.g., posting statuses, liking tweets, following users, etc.), the primary audience for the package to date has been researchers. Accordingly, rtweet has been featured in numerous popular press [e.g., @bajak2019democrats; @machlis2019r; @riley2019twitter] and academic publications [e.g., @bossetta2018simulated; @bradley2019major; @buscema2018media; @erlandsen2018twitter; @gitto2019brand; @kearney2019analyzing; @kearney2018analyzing; @li2018sentiment; @lutkenhaus2019tailoring; @lutkenhaus2019mapping; @molyneux2018media; @tsoi2018can; @unsihuay2018topic; @valls2017urban; @wu2018finding].

To date, no other package interfaces with both the REST and stream APIs. The twitteR package is most similar, but it has entered a stage of deprecation (I've agreed to carry the torch, so to speak). So, not only does twitteR not reflect some recent changes to Twitter's API (most notably the introduction of 'extended tweet' mode, i.e., the new 280-character limit), but it also lacks active maintenance, thanks in part to rtweet filling the void.

https://github.com/ropensci/software-review/issues/193

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

JOSS Options
- [x] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [x] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [x] The package is deposited in a long-term repository with the DOI:
- (*Do not submit your package separately to JOSS*)

MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

briatte commented 5 years ago

As I said in https://github.com/openjournals/joss-reviews/issues/1454 -- I'll happily review the package here. I've reviewed for JOSS (alongside @andrewheiss, in fact), and would gladly review for @ropensci :-)

mkearney commented 5 years ago

I tentatively listed @andrewheiss and @briatte as reviewers because they initially expressed their willingness to review on a [now closed] JOSS thread. I don't want to presume they are/will be the official reviewers–I'm not completely familiar with the ropensci process, but I'd imagine the selection of reviewers isn't entirely up to me–but I figured reviewer suggestions/volunteers were worth knowing!

sckott commented 5 years ago

thanks for the reviewer suggestions, much appreciated.

sckott commented 5 years ago

Editor checks:


Editor comments

Thanks for your submission @mkearney !

Here's the output from goodpractice. If you haven't used goodpractice it's an R package that checks a number of things with another package - most of which we agree with and want authors to follow. You don't need to address these now, it's info for reviewers to use to get started.

── GP rtweet ─────

It is good practice to

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines
    shorter than 80 characters

    R/bearer_token.R:56:1
    R/bearer_token.R:80:1
    R/coords.R:119:1
    R/coords.R:131:1
    R/coords.R:132:1
    ... and 37 more lines

  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.

    R/favorites.R:75:45
    R/post.R:326:12
    R/post.R:336:12
    R/stream.R:338:13
    R/utils.R:327:40
    ... and 1 more lines

  ✖ avoid the library() and require() functions, they change the global search path. If you need to use other packages, import them. If you need to
    load them explicitly, then consider loadNamespace() instead, or as a last resort, declare them as 'Depends' dependencies.

    R/utils.R:485:5

  ✖ fix this R CMD check NOTE: Note: found 113868 marked UTF-8 strings
  ✖ checking tests ...  SEE QUESTION ABOVE
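
Two of the flags above (sapply() and library()/require()) can be illustrated with a small base-R sketch. This is not taken from rtweet's source: `count_chars()` and `require_pkg()` are hypothetical helper names, used here only to show the general pattern that goodpractice is asking for.

```r
# sapply() changes its return type with the input: on an empty input it
# returns an empty list rather than an empty integer vector.
empty <- character(0)
class(sapply(empty, nchar))

# vapply() declares the expected type and length of each result up front,
# so the return type stays the same even for empty input.
# count_chars() is a hypothetical helper, not an rtweet function.
count_chars <- function(x) {
  vapply(x, nchar, FUN.VALUE = integer(1), USE.NAMES = FALSE)
}
class(count_chars(empty))
count_chars(c("rtweet", "twitteR"))

# Instead of calling library()/require() inside package code, check that an
# optional package is installed and then use the pkg::fun() form.
# require_pkg() is likewise a hypothetical helper.
require_pkg <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is required for this feature.", call. = FALSE)
  }
  invisible(TRUE)
}
require_pkg("utils")
utils::head(letters, 3)
```

The vapply() form fails loudly if a result has the wrong type, and the requireNamespace() check leaves the user's search path untouched.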

Seeking reviewers now 🕐


Reviewers:

briatte commented 5 years ago

Here's version 2 of my review, with comments shown as list sub-items. I'll add the functionality tests soon.

Package Review (version 2, final)

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: with DOIs for all those that have one (e.g. papers, datasets, software).
    • The .bib file is clean enough, but could be cleaned further (some DOI fields come as URLs, other as just DOIs).
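
One possible way to make the DOI fields consistent is to strip the resolver prefix so that every entry holds a bare DOI. The sketch below is only an illustration in base R; `normalize_doi()` is a hypothetical helper and is not part of rtweet or of any bibliography tool.

```r
# Strip a leading doi.org resolver URL so every DOI is stored as a bare
# identifier. normalize_doi() is a hypothetical helper name.
normalize_doi <- function(x) {
  x <- trimws(x)
  sub("^https?://(dx\\.)?doi\\.org/", "", x, ignore.case = TRUE)
}

# The same DOI written once as a URL and once as a bare identifier.
dois <- c(
  "https://dx.doi.org/10.1038/s41746-018-0017-5",
  "10.1038/s41746-018-0017-5"
)
normalize_doi(dois)
```

Both inputs should come back as the same bare identifier, which is the form most BibTeX styles expect in a `doi` field.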

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 0.5 (draft version 1)


Review Comments

Reviewed using R version 3.6.0 on x86_64-apple-darwin15.6.0 (64-bit).

Installed with

devtools::install_github("mkearney/rtweet", dependencies = TRUE)

Tested with both the "out of the box" access to the Twitter API and personal OAuth credentials via create_token -- both worked as expected, although I ran into exactly the same list of issues as @andrewheiss when testing the package with testthat::auto_test_package().

Typo at https://rtweet.info/articles/auth.html -- "autheticate via web browser".

I have only one request beyond what Andrew is suggesting in terms of better documenting how to fix the tests (excellent table suggestion, by the way, Andrew!):

Could the thread about how rtweet compares to other packages also document how the package compares to RTwitterAPI?

I have used both twitteR and RTwitterAPI in the past: in this project, we collected a large number of Twitter followers, and RTwitterAPI was, back in the day and in our experience, quicker and less error-prone than twitteR. I hope rtweet compares favourably with RTwitterAPI in that respect.

Last, something not related to the package itself: since Twitter has sunset apps.twitter.com, old tokens created there do not seem to work properly -- for instance, get_followers("famous_person", n = 5000, retryonratelimit = TRUE) will not go beyond 5000 followers. Users need to go through the new vetting process at developer.twitter.com.

andrewheiss commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: ≈5 hours


Review Comments

Documentation and vignettes

All vignettes worked great and were a perfect introduction to the package. It might be helpful to move up the vignette listing near the beginning of the README so that users can find them more easily.

Each function is generally very well documented. There are some sparsely documented functions like bearer_token() and invalidate_bearer() that seem to only be used internally—it might be useful to either add @keywords internal to the roxygen block or add additional documentation explaining what a bearer token is and including some examples.

It's really helpful that the US and the world are baked into lookup_coords() (e.g. lookup_coords("usa")) so users don't need an API key for that. This should probably be documented, though. It caught me by surprise that "usa" was working just fine and all other locations triggered a message—I only figured out why because I looked at the source code for lookup_coords(), and that has a huge all caps note about needing a Google Maps API key. (A future expansion could maybe include hardcoded boundaries for a bunch of other countries (perhaps make an internal lookup table of lat/long coordinates?) so the package is less US-centric and less dependent on another API.)

The pkgdown site is great. It might be helpful (and recommended by rOpenSci) to add @family tags to roxygen documentation so that there are automatic groups of functions in the documentation and in the reference website.
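
To make the two roxygen suggestions above concrete, here is a minimal sketch of what the tags could look like. Both functions are hypothetical and are not rtweet's actual code: the first shows a `@family` tag that groups related help pages, the second shows `@keywords internal`, which keeps the help page but removes it from the documentation index.

```r
#' Count the rows of a data frame of tweets
#'
#' A hypothetical exported helper. Functions that share the same family
#' name are cross-linked in each other's "See also" section.
#'
#' @param x A data frame with one row per tweet.
#' @return An integer row count.
#' @family tweets
#' @export
n_tweets <- function(x) {
  stopifnot(is.data.frame(x))
  nrow(x)
}

#' Build an authorization header value from a bearer token
#'
#' A hypothetical internal helper. It stays documented for developers but
#' is left out of the package index.
#'
#' @param token A single character string.
#' @return A character string of the form "Bearer <token>".
#' @keywords internal
bearer_header <- function(token) {
  stopifnot(is.character(token), length(token) == 1L)
  paste("Bearer", token)
}

n_tweets(data.frame(status_id = c("1", "2", "3")))
bearer_header("abc")
```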

Installation and testing

Installation worked fine with devtools::install_github("mkearney/rtweet", dependencies = TRUE)

Testing doesn't work out of the box and it took me a while to figure out how to generate the /tests/testthat/twitter_tokens RDS file. When logged in through the embedded rstats2twitter app, running this created an auth token that the test suite can use:

saveRDS(get_token(), "tests/testthat/twitter_tokens")

It might be useful to explain that process somewhere in the documentation—not in the README though, since it's not something a typical user would do.

190 tests passed (super impressive coverage!!), and 5 tests failed:

Token creation and authentication

It's so great that people can use this without creating their own Twitter apps, since that process has become long and arduous.

It might be helpful to more clearly explain the benefits of creating an app with tokens vs. just using the interactive browser-based token. Right now, the README says "You may still choose to do this [create an app] (gives you more stability and permissions)", and the auth vignette explains how to do it, but there isn't any explanation of why users might want to do it. Later in the README, under "Post actions," it explains that users would need their own app to post tweets and access DMs, but it could be helpful to have that information up above when describing authentication in general.

It might even be helpful to have a table of sorts showing pros/cons/features available when using a private app vs. the embedded rstats2twitter app, like this maybe?

Task rstats2twitter application Personal application
Work interactively
Create reproducible research
Search tweets
Stream tweets
Get friends
Get timelines
Post tweets
Run package tests

The page for creating apps looks slightly different from the screenshots, and the direction to go to apps.twitter.com is out of date—according to Twitter, that aspect of developer accounts is getting sunset. Instead, the URL is now https://developer.twitter.com/en/apps .

Community

It would be helpful to have some community guidelines in the package too, such as a CONTRIBUTING.md file (perhaps modeled after something like this) and a CONDUCT.md file (like this)

Creating an RStudio project .Rproj file could be helpful for encouraging package contributions, since it would enable RStudio's package building interface.

rOpenSci-related issues

JOSS-related issues

There are a few issues with paper.bib:

Kicking the tires with some use cases

Look at tweets

library(tidyverse)
library(rtweet)

# Check out my tweets
my_tweets <- get_timeline("andrewheiss", n = 500)

head(my_tweets)
#> # A tibble: 6 x 90
#>   user_id status_id created_at          screen_name text  source display_text_wi… reply_to_status…
#>   <chr>   <chr>     <dttm>              <chr>       <chr> <chr>             <dbl> <chr>           
#> 1 167410… 11360038… 2019-06-04 20:16:30 andrewheiss @dva… Twitt…                0 113600361493217…
#> 2 167410… 11359881… 2019-06-04 19:14:09 andrewheiss I ME… Twitt…                6 113598739853218…
#> 3 167410… 11359873… 2019-06-04 19:11:16 andrewheiss Revi… Twitt…               94 NA              
#> 4 167410… 11359848… 2019-06-04 19:01:06 andrewheiss "boo… Twitt…              171 113424196604718…
#> 5 167410… 11359781… 2019-06-04 18:34:42 andrewheiss oh m… Twitt…              140 NA              
#> 6 167410… 11359750… 2019-06-04 18:22:13 andrewheiss As a… Twitt…              140 NA              

ts_plot(my_tweets)

image

Geocoding with sf

library(sf)
library(rnaturalearth)

searched_tweets <- search_tweets(
  q = "lang:en",
  geocode = lookup_coords("usa"), n = 10000
) %>% 
  lat_lng() %>% 
  drop_na(lng) %>% 
  sf::st_as_sf(coords = c("lng", "lat"), crs = 4326) 

us <- ne_states(country = "united states of america", returnclass = "sf") %>% 
  filter(!(postal %in% c("HI", "AK", "PR")))

ggplot() +
  geom_sf(data = us) +
  geom_sf(data = searched_tweets) + 
  coord_sf(crs = 102003, datum = NA) +  # Albers
  theme_void()

image


Magical. This is a great package—fantastic work!


Tested and reviewed using R 3.6.0 on x86_64-apple-darwin15.6.0 (64-bit).

sckott commented 5 years ago

thanks for your review @andrewheiss - @briatte let me know when you're done reviewing

briatte commented 5 years ago

Hi @sckott

I have updated my draft review, which can be considered final. My new comments are at the bottom of the review. In a nutshell, I hit the same issues as @andrewheiss did with the tests, and am asking for additional details to be included about how the package compares to RTwitterAPI.

Generally speaking, the package is awesome, thanks a lot for your work @mkearney!

sckott commented 5 years ago

thanks @briatte

@mkearney follow up here with responses to reviewers

briatte commented 5 years ago

FYI, I've just posted an issue to rtweet that has to do with getting followers for accounts with over 5,000 followers: https://github.com/mkearney/rtweet/issues/340 (update: issue now closed, thanks!)

Perhaps this issue could lead to better documentation re: how to collect large amounts of followers/friends? As the issue says, I am unsure that I have understood how get_followers(retryonratelimit = TRUE) works.
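
As a rough illustration of what such documentation might explain, the sketch below shows cursor-based paging in general terms. It is not rtweet's implementation: `fetch_page()` is a hypothetical stand-in for a single API request, and the assumed page size of 5,000 ids reflects the per-request limit discussed in this thread. A real client would also have to wait for the rate limit to reset between batches of requests.

```r
# Stand-in for one API request (hypothetical): returns up to `page_size` ids
# plus the cursor for the next page. A cursor of "0" means no more pages.
fetch_page <- function(all_ids, cursor, page_size = 5000L) {
  if (length(all_ids) == 0L) {
    return(list(ids = character(0), next_cursor = "0"))
  }
  start <- if (identical(cursor, "-1")) 1L else as.integer(cursor)
  end <- min(start + page_size - 1L, length(all_ids))
  next_cursor <- if (end < length(all_ids)) as.character(end + 1L) else "0"
  list(ids = all_ids[start:end], next_cursor = next_cursor)
}

# Keep requesting pages until `n` ids are collected or the cursor runs out.
collect_ids <- function(all_ids, n) {
  out <- character(0)
  cursor <- "-1"
  while (length(out) < n && !identical(cursor, "0")) {
    page <- fetch_page(all_ids, cursor)
    out <- c(out, page$ids)
    cursor <- page$next_cursor
  }
  utils::head(out, n)
}

# Simulated account with 12,000 followers; ask for 7,500 of them.
fake_followers <- as.character(seq_len(12000))
length(collect_ids(fake_followers, n = 7500))
```

The point of the sketch is that asking for more ids than one page holds requires following the cursor across several requests, which is where rate limits come into play.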

Last and still related to followers, I subscribe to the suggestion made here to harmonize the output of get_followers and get_friends: https://github.com/mkearney/rtweet/issues/308

mkearney commented 5 years ago

@sckott:

First, thanks to @briatte and @andrewheiss for taking the time to review rtweet and providing such great feedback!

Second, please see my detailed reply to each reviewer below. This revision process has so far resulted in a lot of changes to rtweet–many of which were quite difficult😅–but I am quite certain the package is much better for it.

Thanks,

-Mike

@briatte

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Tested with both the "out of the box" access to the Twitter API and personal OAuth credentials via create_token -- both worked as expected, although I ran into exactly the same list of issues as @andrewheiss when testing the package with testthat::auto_test_package().

Typo at https://rtweet.info/articles/auth.html -- "autheticate via web browser".

Could the thread about how rtweet compares to other packages also document how the package compares to RTwitterAPI?

Task rtweet twitteR twitteR RTwitterAPI
Available on CRAN
Updated since 2016
Non-'developer' access
Extended tweets (280 chars)
Parses JSON data
Converts to data frames
Automated pagination
Search tweets
Search users
Stream sample
Stream keywords
Stream users
Get friends
Get timelines
Get mentions
Get favorites
Get trends
Get list members
Get list memberships
Get list statuses
Get list subscribers
Get list subscriptions
Get list users
Lookup collections
Lookup friendships
Lookup statuses
Lookup users
Get retweeters
Get retweets
Post tweets
Post favorite
Post follow
Post message
Post mute
Premium 30 day
Premium full archive
Premium 30 day
Run package tests

@andrewheiss

Documentation and vignettes

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

All vignettes worked great and were a perfect introduction to the package. It might be helpful to move up the vignette listing near the beginning of the README so that users can find them more easily.

Each function is generally very well documented. There are some sparsely documented functions like bearer_token() and invalidate_bearer() that seem to only be used internally—it might be useful to either add @keywords internal to the roxygen block or add additional documentation explaining what a bearer token is and including some examples.

It's really helpful that the US and the world are baked into lookup_coords() (e.g. lookup_coords("usa")) so users don't need an API key for that. This should probably be documented, though. It caught me by surprise that "usa" was working just fine and all other locations triggered a message—I only figured out why because I looked at the source code for lookup_coords(), and that has a huge all caps note about needing a Google Maps API key. (A future expansion could maybe include hardcoded boundaries for a bunch of other countries (perhaps make an internal lookup table of lat/long coordinates?) so the package is less US-centric and less dependent on another API.)

The pkgdown site is great. It might be helpful (and recommended by rOpenSci) to add @family tags to roxygen documentation so that there are automatic groups of functions in the documentation and in the reference website.

Installation and testing

Testing doesn't work out of the box and it took me a while to figure out how to generate the /tests/testthat/twitter_tokens RDS file. When logged in through the embedded rstats2twitter app, running this created an auth token that the test suite can use...It might be useful to explain that process somewhere in the documentation—not in the README though, since it's not something a typical user would do.

(FAILED) direct_messages: This is because the rstats2twitter app doesn't provide read-write DM access. It worked when I used my own auth token

(FAILED) get_my_timeline: One test failed because rtweet:::home_user() not equal to "kearneymw". Mike's account is hardcoded in the tests and this test can't pass if someone is using a different account. I don't know what else could be done to test it though.

(FAILED) lookup_coords: This returns an error that seems unrelated to token issues and I don't know why it's failing

(FAILED) tweet_shot: This failed because I didn't have PhantomJS installed. The test worked after installing it. Running tweet_shot() interactively gives a note that it should be installed with webshot::install_phantomjs(), but this message is invisible when testing on a fresh system. PhantomJS isn't installed as a dependency when the package is installed, but that's because it's not an R package. idk what the best way to handle this is, though, since it's 16+ MB and not handled automatically with R ¯\_(ツ)_/¯

(FAILED) test-test_username-r: Like get_my_timeline, this failed because Mike's account is hardcoded in. Again, I don't know what the best way to handle this is.
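
One option for the two account-specific failures is to read the expected screen name from an environment variable and skip the expectation when it is not set. The sketch below is an assumption about how that could look, not what rtweet does: `RTWEET_TEST_USER` and `expected_test_user()` are hypothetical names.

```r
# Read the expected screen name from an environment variable instead of
# hard-coding it. RTWEET_TEST_USER is a hypothetical variable name.
expected_test_user <- function() {
  user <- Sys.getenv("RTWEET_TEST_USER", unset = NA_character_)
  if (is.na(user) || !nzchar(user)) NA_character_ else user
}

# In a testthat file this could gate the account-specific expectation:
#   test_that("home user matches the configured account", {
#     user <- expected_test_user()
#     testthat::skip_if(is.na(user), "RTWEET_TEST_USER is not set")
#     expect_equal(rtweet:::home_user(), user)
#   })

Sys.setenv(RTWEET_TEST_USER = "kearneymw")
expected_test_user()
```

With this pattern the maintainer's own checks still compare against his account, while other contributors see a skipped test instead of a failure.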

Token creation and authentication

It might be helpful to more clearly explain the benefits of creating an app with tokens vs. just using the interactive browser-based token. Right now, the README says "You may still choose to do this [create an app] (gives you more stability and permissions)", and the auth vignette explains how to do it, but there isn't any explanation of why users might want to do it. Later in the README, under "Post actions," it explains that users would need their own app to post tweets and access DMs, but it could be helpful to have that information up above when describing authentication in general. It might even be helpful to have a table of sorts showing pros/cons/features available when using a private app vs. the embedded rstats2twitter app, like this maybe?

Task rstats2twitter user-app
Work interactively
Search/lookup tweets/users
Get friends/followers
Get timelines/favorites
Get lists/collections
Post tweets
Run package tests
Use Bearer token
Read/Write Direct Messages

The page for creating apps looks slightly different from the screenshots, and the direction to go to apps.twitter.com is out of date—according to Twitter, that aspect of developer accounts is getting sunset. Instead, the URL is now https://developer.twitter.com/en/apps .

Community

It would be helpful to have some community guidelines in the package too, such as a CONTRIBUTING.md file (perhaps modeled after something like this) and a CONDUCT.md file (like this)

Creating an RStudio project .Rproj file could be helpful for encouraging package contributions, since it would enable RStudio's package building interface.

rOpenSci-related issues

* [Following rOpenSci's package guidelines](https://ropensci.github.io/dev_guide/building.html#creating-metadata-for-your-package), it might be helpful to use the **codemetar** package to generate a `codemeta.json` file.
* Consider adding a repostatus.org badge too ([following rOpenSci's guidelines](https://ropensci.github.io/dev_guide/building.html#readme))
* Also add the rOpenSci peer review badge

JOSS-related issues

There are a few issues with paper.bib:

* "Twitter as a tool of para-disploomacy" is misspelled
* Several of the cited articles that have actual DOIs are missing DOIs (like "How can we better use Twitter to find a person who got lost due to dementia?" is https://dx.doi.org/10.1038/s41746-018-0017-5)

sckott commented 5 years ago

thx for your responses to reviews @mkearney

any feedback/thoughts on the comments, reviewers? (@briatte @andrewheiss)

andrewheiss commented 5 years ago

Other than a duplicate "Premium 30 day" row in the big comparison table, this addresses all my concerns and I think it's good to go ✅

stefaniebutland commented 5 years ago

@mkearney since you're keen on contributing a blog post once the review process is complete 🙂 ... This tag gets you (diverse) examples of blog posts by authors of peer-reviewed packages: https://ropensci.org/tags/software-peer-review/.

Here are some technical and editorial guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post. Publication date is flexible. I like to get a draft via pull request a week before the planned publication date so I can review.

Happy to answer any questions. I'm thrilled that you've put rtweet through review.

sckott commented 5 years ago

@briatte Are you happy with the changes made?

mkearney commented 5 years ago

Any update on this?

sckott commented 5 years ago

@mkearney let's move on. I'll take a final look and get back to you ASAP

sckott commented 5 years ago

looks good, just a single comment: ^codemeta\.json$ should be in .Rbuildignore
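
For readers unfamiliar with .Rbuildignore: each line is a regular expression matched, case-insensitively, against file paths relative to the package root. The following base-R sketch only checks what the suggested pattern would match; the example file names other than codemeta.json are made up for illustration.

```r
# The pattern suggested above, written as an R string (backslash doubled).
pattern <- "^codemeta\\.json$"

# Candidate paths relative to the package root (illustrative only).
files <- c("codemeta.json", "inst/codemeta.json", "codemetaXjson", "DESCRIPTION")

# Only the top-level codemeta.json matches: the anchors exclude the copy
# under inst/, and the escaped dot excludes "codemetaXjson".
grepl(pattern, files, perl = TRUE, ignore.case = TRUE)
```

If the usethis package is available, `usethis::use_build_ignore("codemeta.json")` should add an equivalent escaped entry without writing the regular expression by hand.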

Approved! Thanks @mkearney for submitting and @briatte @andrewheiss for your reviews!

To-dos:

For JOSS:

We've put together an online book with our best practices and tips; this chapter starts the third section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here.

mkearney commented 5 years ago

@sckott I’m not currently seeing an invite (and I wasn’t authorized to transfer to ropensci). Where would I find this?

briatte commented 5 years ago

@briatte Are you happy with the changes made?

Yes! With apologies for not signalling it earlier.

@mkearney Thanks again for a brilliant package. I can report it's currently in use in various research projects in Paris and Lille, France :-)

sckott commented 5 years ago

@mkearney sorry, now you should have gotten an invite ... let me know if you don't see it soon

mkearney commented 5 years ago

Got it! Thanks!

mkearney commented 5 years ago

TO DO for rOpenSci:

TODO for JOSS:

We've put together an online book with our best practices and tips; this chapter starts the third section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here.

stefaniebutland commented 5 years ago

@mkearney When would you like to publish the post? Tues Nov 12 is earliest date available, and that would mean submitting a draft via pull request by Tues Nov 5. Later dates are also open.

Details https://github.com/ropensci/software-review/issues/302#issuecomment-529040066

jeroen commented 5 years ago

Docs are now live on https://docs.ropensci.org/rtweet/

mkearney commented 5 years ago

@stefaniebutland Nov 5th/12th works for me!

stefaniebutland commented 5 years ago

@mkearney Will you submit a draft post soon? Technical and editorial guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post