ropensci / software-review

rOpenSci Software Peer Review.

tidytags: Simple Collection and Powerful Analysis of Twitter Data #382

Closed bretsw closed 2 years ago

bretsw commented 4 years ago

Date accepted: 2022-01-31
Submitting Author Name: Bret Staudt Willet
Submitting Author Github Handle: @bretsw
Other Package Authors Github Handles: @jrosen48
Repository: https://github.com/bretsw/tidytags
Version submitted: 0.1.0
Submission type: Standard
Editor: @maelle
Reviewers: @llrs, @marionlouveaux

Due date for @llrs: 2021-04-19 Due date for @marionlouveaux: 2021-04-27

Archive: TBD
Version accepted: TBD


Package: tidytags
Version: 0.1.0
Title: Simple Collection and Powerful Analysis of Twitter Data
Authors@R: c(
    person("K. Bret", "Staudt Willet", , 
      email = "bret@bretsw.com", role = c("aut", "cre"),
      comment = c(ORCID = "0000-0002-6984-416X")
    ),
    person("Joshua M.", "Rosenberg", ,
      role = c("aut"),
      comment = c(ORCID = "0000-0003-2170-0447")
    )
  )
Description: {tidytags} coordinates the simplicity of collecting tweets over time 
    with a [Twitter Archiving Google Sheet](https://tags.hawksey.info/) (TAGS) and the utility of the 
    [{rtweet} package](https://rtweet.info/) for processing and preparing additional Twitter metadata. 
    {tidytags} also introduces functions developed to facilitate systematic yet 
    flexible analyses of data from Twitter.
License: GPL-3
URL: https://bretsw.github.io/tidytags/, https://github.com/bretsw/tidytags
Depends: 
    R (>= 4.0)
Imports:
    dplyr (>= 0.8),
    googlesheets4 (>= 0.2),
    purrr (>= 0.3),
    readr (>= 1.3),
    rlang (>= 0.4),
    rtweet (>= 0.7),
    stringr (>= 1.4),
    tibble (>= 3.0), 
    tidyr (>= 1.0),
    tidyselect (>= 1.0)
Suggests:
    beepr,
    covr,
    ggplot2,
    knitr,
    longurl,
    mapsapi,
    mapview,
    rmarkdown,
    testthat,
    tidyverse,
    urltools,
    usethis
Encoding: UTF-8
VignetteBuilder: knitr
LazyData: TRUE
RoxygenNote: 7.1.0

Scope

{tidytags} allows for both simple data collection and thorough data analysis. In short, {tidytags} first uses a Twitter Archiving Google Sheet (TAGS) to easily collect tweet ID numbers and then uses the R package {rtweet} to re-query the Twitter API to collect additional metadata. {tidytags} also introduces new functions developed to facilitate systematic yet flexible analyses of data from Twitter.

The target users for {tidytags} are social scientists (e.g., educational researchers) who have an interest in studying Twitter data but are relatively new to R, data science, or social network analysis. {tidytags} scaffolds tweet collection and analysis through a simple workflow that still allows for robust analyses.

{tidytags} wraps together functionality from several useful R packages, including {googlesheets4} to bring data from the TAGS tracker into R and {rtweet} for retrieving additional tweet metadata. The contribution of {tidytags} is to bring together the affordance of TAGS to easily collect tweets over time (which is not straightforward with {rtweet}) and the utility of {rtweet} for collecting additional data (which are missing from TAGS). Finally, {tidytags} reshapes data in preparation for geolocation and social network analyses that should be accessible to relatively new R users.
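To illustrate, a minimal sketch of this workflow (using the exported functions named later in this thread: read_tags(), pull_tweet_data(), and create_edgelist(); the sheet ID is a placeholder):

    library(tidytags)

    # 1. Collect: read the tweet IDs a TAGS tracker has been archiving
    tags_content <- read_tags("<your TAGS Google Sheet ID>")

    # 2. Enrich: re-query the Twitter API via {rtweet} for full tweet metadata
    tweets <- pull_tweet_data(tags_content)

    # 3. Analyze: build an edgelist for social network analysis
    edgelist <- create_edgelist(tweets)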

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

JOSS Options

- [x] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [x] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:
- (*Do not submit your package separately to JOSS*)
MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

bretsw commented 3 years ago

Thank you, @maelle, as ever for your insight and guidance here. I wrestled with this through the rest of the day yesterday and ended up in this same spot: there's a key needed for Google Sheets, which I missed because this gets saved to the local environment in a way that seems persistent. I've started the process of figuring out how to best save and call the Google API key. I'll need to rewrite a tidytags function or two, document the process of getting and using the key in the setup vignette, re-record vcr cassettes, etc. I have a plan at least. I'll keep you posted!

sckott commented 3 years ago

That's pretty awful that Google makes you put an api key in a query param. I'd expect better from them. Anyway, hopefully the new filter query params option will work.

bretsw commented 3 years ago

Thanks Scott! Certainly caught me off guard yesterday. I am excited to implement the new vcr feature at least. I'll keep you posted with how it's going!

bretsw commented 3 years ago

I thought I had this Google API key issue figured out, but no such luck. @maelle, how familiar are you with googlesheets4?

Something is happening with the stored Google API key that I'm not understanding. Somehow my old Google API key is stored somewhere in a way that I can't seem to change or access, until I record vcr cassettes and see that it is exposed in the request URL. Which should be fine, because I've revoked that token and now have a new one. However, my new API key won't work when I run googlesheets4::gs4_auth_configure(api_key = Sys.getenv("GOOGLE_API_KEY")), but running googlesheets4::gs4_deauth() (which sets the API token to NULL) and googlesheets4::gs4_auth_configure(api_key = NULL) (which sets the API key to NULL) somehow lets me query the sheets API. That is, with both a NULL key and a NULL token, I can successfully run googlesheets4::range_read(googlesheets4::gs4_examples("deaths")) or perform my tidytags package tests (locally).
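For reference, here's a condensed sketch of the behavior (my new key is stored in the GOOGLE_API_KEY environment variable; gs4_example() fetches a public example sheet):

    # Fails with my new API key:
    googlesheets4::gs4_auth_configure(api_key = Sys.getenv("GOOGLE_API_KEY"))
    googlesheets4::range_read(googlesheets4::gs4_example("deaths"))

    # Works with both the token and the API key set to NULL:
    googlesheets4::gs4_deauth()
    googlesheets4::gs4_auth_configure(api_key = NULL)
    googlesheets4::range_read(googlesheets4::gs4_example("deaths"))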

In sum, there's an old, deactivated API key stored somewhere I can't locate and being accessed in a way I can't decipher. The old API key is still currently exposed in several vcr cassettes (in the "fixtures" directory), but I'm ok with this for now because the key is actually decommissioned.

Any ideas?

maelle commented 3 years ago

Not familiar at all!

If I follow correctly there are two problems

The next step would be to ask for help on RStudio community forum (since googlesheets4 is an RStudio package, I'd expect more users there than on rOpenSci forum).

bretsw commented 3 years ago

@maelle, I figured it out! I was getting ready to post to the RStudio community forum, and first I looked everything over one more time. I altered the restrictions to the Google API key in the Cloud Console setup, and this did the trick! The issue wasn't with my code but the restrictions. I've updated the tidytags setup vignette to make this clearer.

maelle commented 3 years ago

:tada: :clap: so only tests with real requests are "needed" before we proceed IIRC.

bretsw commented 3 years ago

Yes! I'm on it today or early next week. So close.

bretsw commented 3 years ago

Hi @maelle, I've set up tests with real requests (https://github.com/bretsw/tidytags/blob/master/.github/workflows/weekly-check.yaml), but they don't seem to run at the scheduled time:

on:
  schedule:
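    # i.e., 06:00 UTC on Mondays, Wednesdays, and Fridays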
    - cron:  '0 6 * * MON,WED,FRI'

Do you see anything obviously wrong? I've tried to reference the rladies example (https://github.com/rladies/meetupr/blob/master/.github/workflows/with-auth.yaml) for inspiration and searched elsewhere, but it's not clear to me why the scheduler isn't doing anything. I previously scheduled it for Sunday midnight, but nothing happened over the weekend either.

I'll ask in the RStudio Community forum (https://community.rstudio.com/t/testthat-motivation/27251/4) if there's nothing readily apparent to you.

maelle commented 3 years ago

Hello! It seems the problem is not the cron syntax but that you reference matrix.config.os without defining it (it can't be shared between workflow files).

bretsw commented 3 years ago

Isn't the matrix defined on lines 28-30? This reflects lines 26-28 in the rladies' meetupr yaml.

maelle commented 3 years ago

Right but it seems it isn't parsed? The meetupr YAML doesn't work either :sweat_smile: https://github.com/rladies/meetupr/actions/runs/673097825

dpprdan commented 3 years ago

Maybe I'm missing something, but it seems to me that it did run: https://github.com/bretsw/tidytags/actions/runs/675062243

bretsw commented 3 years ago

Thanks @dpprdan for catching that this actually did run (yay!) and @maelle for demonstrating the solution for parsing in meetupr. I'm testing with tidytags now and will report back. I really appreciate the community support!

maelle commented 3 years ago

I am still not sure I understand my own meetupr YAML file so I updated it. Therefore only @dpprdan deserves thanks. :joy:

bretsw commented 3 years ago

Hi @maelle, looks like everything is working with tidytags:

I think(?) I've checked everything off the list!

maelle commented 3 years ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 3 years ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/382_status.svg)](https://github.com/ropensci/software-review/issues/382)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

maelle commented 3 years ago

For info I made a call on Twitter https://twitter.com/ma_salmon/status/1374349523900899328 (if Twitter isn't appropriate for this, then for what is it useful :grin: ) hoping to find someone using TAGS in particular. I'll contact potential reviewers (using TAGS or not) within the next few days.

bretsw commented 3 years ago

Sounds good! I've retweeted your call. Thank you!

maelle commented 3 years ago

@ropensci-review-bot add @llrs to reviewers

ropensci-review-bot commented 3 years ago

That can't be done if there is no editor assigned

maelle commented 3 years ago

@ropensci-review-bot assign @maelle as editor

ropensci-review-bot commented 3 years ago

Assigned! @maelle is now the editor

maelle commented 3 years ago

@ropensci-review-bot add @llrs to reviewers

ropensci-review-bot commented 3 years ago

@llrs added to the reviewers list. Review due date is 2021-04-19. Thanks @llrs for accepting to review! Please refer to our reviewer guide.

maelle commented 3 years ago

@bretsw please don't forget to add the badge mentioned in https://github.com/ropensci/software-review/issues/382#issuecomment-804893607 :slightly_smiling_face:

bretsw commented 3 years ago

Thanks for the reminder, I totally missed that prompt, probably from skimming past messages from the review bot. Sorry bot! I'll add NEWS.md next.

maelle commented 3 years ago

@ropensci-review-bot add @marionlouveaux to reviewers

ropensci-review-bot commented 3 years ago

@marionlouveaux added to the reviewers list. Review due date is 2021-04-27. Thanks @marionlouveaux for accepting to review! Please refer to our reviewer guide.

maelle commented 3 years ago

As discussed with @marionlouveaux, amending the due date for review to 2021-04-27 to accommodate @marionlouveaux's schedule.

maelle commented 3 years ago

Thanks @llrs and @marionlouveaux for accepting to review! :pray:

llrs commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

It's not clear from the setup vignette which steps are necessary and which are not; I would suggest adding titles and an index.

The vignettes have a "Pain point #4" which I couldn't find referenced anywhere. Also, perhaps using a more descriptive title would make it easier for users to know what the section is about (removing the "Pain point" reference entirely from the section title). Not that they aren't pain points, but just redirect users to the solutions/documentation as they go. However, most of the code chunks of the vignette are not run (as reported by BiocCheck):

    * WARNING: Evaluate more vignette chunks.
        # of code chunks: 8
        # of eval=FALSE: 5
        # of nonexecutable code chunks by syntax: 0
        # total unevaluated 5 (62%)

And those that do run only add documentation or set up the vignette. Perhaps some kind of setup specific to the vignettes could be used; otherwise they defeat their purpose and turn into plain READMEs. (I know it is not easy with CRAN, so maybe set them up as articles that live only on the website, outside CRAN?)

The step to create the Google API key is not clear enough (perhaps a redesign of the API configuration interface?). An indication to use the Google Sheets API in answer to the question "Find out what kind of credentials you need?" would be helpful.

It should be pointed out that the OpenCage Geocoding API key is not needed to use the package. Also, the discussion about pricing and API limits might be good for an issue but doesn't fit well in the vignette (I've seen that @maelle asked for this, but now that the package has settled on OpenCage, maybe it is no longer needed or it can be reduced).

On the chunk with dplyr::glimpse(example_after_rtweet), I get a different result: 2,204 rows compared to the 2,215 reported in the vignette.

When I run the following code chunk I get an error (as I don't have the longurl package yet):

example_domains <- get_url_domain(example_urls)

Before using a package listed in Suggests, it should be tested whether the package can be loaded (you can use rlang::is_installed("longurl") or requireNamespace("longurl", quietly = TRUE)).
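For instance, a minimal sketch of such a guard (the function body here is just a placeholder, not the package's actual implementation):

    get_url_domain <- function(urls) {
      # Fail informatively if the suggested package is missing
      if (!requireNamespace("longurl", quietly = TRUE)) {
        stop("Package 'longurl' is needed for get_url_domain(); ",
             "please install it with install.packages(\"longurl\").")
      }
      # ... expand shortened URLs with longurl, then extract the domains ...
    }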

Lastly, I don't know how to push data back to the Google Sheet that TAGS created in the first vignette.

Examples cannot be run without the authentication setup, and there is no mention of this on the help pages. Perhaps a brief note would remind users.

There isn't a BugReports field, though the vignettes include information about how to get help. I would suggest adding the issues link to the DESCRIPTION too. The contributing file is extensive and well organized.
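For example, one line in DESCRIPTION would do (the URL is the repository linked at the top of this issue):

    BugReports: https://github.com/bretsw/tidytags/issues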

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

There's an extra " in the YAML header of paper.md that prevented viewing the paper.

Functionality

I got 2 tests that failed and 3 warnings (besides 2 tests that were skipped). The test at test-get_url_domain.R:14:5 reported domain4 not equal to "npr.org": in the browser I get asked for cookie consent, and when run locally, outside testthat or vcr, I get the URL choice.npr.org. The other failing tests are weird (I don't get them when I run them in the R console, only in the RStudio build/check panel).

I have a development version of vcr installed, and one of the warnings is related to it. The new version warns when cassettes are empty, which in my experience means the test is not conclusive, but this could also be related to not having the geocoding API enabled. The other warnings are in test-get_url_domain.R, lines 3 and 32: Invalid URL. I'm not sure why, because when I paste the URL into my browser I get redirected to https://www.aect.org/about_us.php. (BTW, perhaps the link could be changed to https instead of http.)

Estimated hours spent reviewing: 4


Review Comments

The rtweet package is undergoing drastic changes (I'm involved in rtweet maintenance) and there will be a major release with breaking changes. It will probably break this package (the recommendation about the token will change, for instance), so be ready to update it accordingly.

The package contains relatively few, simple functions that provide more data from Twitter or make it easier to analyze. I have not analyzed Twitter data (except for an overview of a user account), so I don't know how useful the data is. I am not a user of TAGS, but I'm a bit puzzled about how to add information to the Google Sheet: if I'm a new user, how should I do it? I mean, I can get the template, but how do I fill it? I think this package would be easier for non-technical people if it included a function to add the information gathered via rtweet, or processed with the package, back to the original Google Sheet.

I haven't fully read the paper.md for JOSS, but I think it is short enough and covers the package comprehensively.

From a more technical point of view, I have some comments about the code and the package:

There are 75 lines longer than 80 characters; try to reduce them. It is probably just a matter of style, perhaps creating new, shorter variable names.

Also, there are namespaces in the Imports field that are not imported from: 'gargle', 'readr'. All declared Imports should be used.

The get_char_tweet_ids() function could be improved to take only one argument: if it is a data.frame, then extract the status ID via id_str; if it is a URL, you can just extract the trailing numbers with gsub("https?\\://twitter.com\\/.+/statuses/", "", df$status_url), with no need to modify the data.frame and then extract the vector again. See the sketch below.
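A sketch of that suggestion (hypothetical rewrite; id_str and status_url are the column names used by {rtweet}):

    get_char_tweet_ids2 <- function(x) {
      if (is.data.frame(x)) {
        # rtweet data frames already carry the ID as a string in id_str
        x$id_str
      } else {
        # otherwise assume a vector of status URLs and keep the trailing ID digits
        gsub("https?://twitter.com/.+/statuses/", "", x)
      }
    }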

In process_tweets(), you can simplify is_self_reply to ifelse(.data$is_reply & .data$user_id == .data$reply_to_user_id, TRUE, FALSE).

In get_upstream_replies(), the examples are not informative, as there are no replies to get data from in the example dataset. You make multiple calls to pull_tweet_data(), some of which might be unnecessary. process_tweets() could be called just once at the end instead of multiple times, on each loop run; this should speed up the process. Also, if at most 90,000 tweets are taken in each run, then you can estimate the number of iterations needed and inform the user, which might make the wait easier (see the sketch below). Perhaps it would be better to use lookup_many_tweets(), as it does a similar process. However, users might hit the rate limit, and I don't see any information being passed to the user regarding this.
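A rough sketch of what informing the user could look like (status_ids is a hypothetical vector of IDs still to fetch; 90,000 is the per-run cap mentioned above):

    n_statuses <- length(status_ids)
    n_runs <- ceiling(n_statuses / 90000)
    message("Looking up ", n_statuses, " statuses in ", n_runs,
            " run(s); you may hit the Twitter rate limit and need to wait.")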

Looking at create_edgelist(), it calls process_tweets() and also get_replies(), get_retweets(), get_quotes(), and get_mentions(), which call process_tweets() too. Perhaps some internal functions could be created to avoid calling process_tweets() multiple times on the same data, along the lines of the sketch below.
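Something like this (the extract_*() helpers are hypothetical names for internal functions, not existing ones):

    create_edgelist2 <- function(df) {
      processed <- process_tweets(df)  # process the data only once
      rbind(
        extract_replies(processed),    # hypothetical internal extractors that
        extract_retweets(processed),   # skip the repeated process_tweets() calls
        extract_quotes(processed),
        extract_mentions(processed)
      )
    }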

maelle commented 3 years ago

Thanks a lot for your review @llrs! :rocket:

Note that regarding JOSS, we've just changed the process as JOSS will be the ones determining whether the software fits in their scope.

maelle commented 3 years ago

@llrs which rtweet version did you use for your review, by the way?

@bretsw do you use rtweet CRAN version or the GitHub version with the newer changes?

Thank you :slightly_smiling_face:

llrs commented 3 years ago

@maelle I used the CRAN version

bretsw commented 3 years ago

@maelle I use the CRAN version as well

marionlouveaux commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 10


Review Comments

The {tidytags} package makes it possible to read a TAGS tracker, a Google app that continuously collects tweets from Twitter based on predefined search criteria and a collection frequency. It provides wrappers around {rtweet} and {opencage} functions to simplify the retrieval of metadata that is either not fetched by TAGS or does not exist on Twitter (in the case of geocoding). In addition, it provides functionality to compute additional descriptive variables about the collected tweets and to visualise relationships between tweets. The {tidytags} package interacts with 3 APIs (Google Sheets, Twitter, and OpenCage) and one Google app (TAGS). For this reason, the setup is a bit long and tedious when done from scratch. The package itself contains a small number of functions that are well documented.

I used the {pkgreviewr} package from rOpenSci to conduct my review (a big thanks to the authors of this package). I configured TAGS and created a Google API key. I already had the configuration for {rtweet} and {opencage}. I could run {tidytags} on my own TAGS tracker (and it worked!).

My main comments concern:

I didn't read the paper submitted to JOSS. I am pasting the details of my review in a second comment.

marionlouveaux commented 3 years ago
Session Info ``` > sessionInfo() R version 4.0.4 (2021-02-15) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.2 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 locale: [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 LC_MONETARY=fr_FR.UTF-8 [6] LC_MESSAGES=fr_FR.UTF-8 LC_PAPER=fr_FR.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] tidytags_0.1.2 devtools_2.3.2 usethis_2.0.1 magrittr_2.0.1 loaded via a namespace (and not attached): [1] uuid_0.1-4 systemfonts_1.0.1 igraph_1.2.6 lazyeval_0.2.2 sp_1.4-5 crosstalk_1.1.1 [7] leaflet_2.0.4.1 ggplot2_3.3.3 urltools_1.7.3 digest_0.6.27 leafpop_0.0.6 htmltools_0.5.1.1 [13] viridis_0.5.1 leaflet.providers_1.9.0 fansi_0.4.2 memoise_2.0.0 covr_3.5.1 googlesheets4_0.3.0 [19] remotes_2.2.0 readr_1.4.0 graphlayouts_0.7.1 svglite_2.0.0 askpass_1.1 prettyunits_1.1.1 [25] colorspace_2.0-0 ggrepel_0.9.1 rtweet_0.7.0 xfun_0.22 dplyr_1.0.5 leafem_0.1.3 [31] callr_3.6.0 crayon_1.4.1 jsonlite_1.7.2 roxygen2_7.1.1 attachment_0.2.1 brew_1.0-6 [37] glue_1.4.2 xmlparsedata_1.0.5 polyclip_1.10-0 gtable_0.3.0 gargle_1.1.0 webshot_0.5.2 [43] pkgbuild_1.2.0 scales_1.1.1 DBI_1.1.1 opencage_0.2.2 Rcpp_1.0.6 viridisLite_0.3.0 [49] units_0.7-1 proxy_0.4-25 praise_1.0.0 clisymbols_1.2.0 stats4_4.0.4 xopen_1.0.0 [55] htmlwidgets_1.5.3 rex_1.2.0 httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.1 pkgconfig_2.0.3 [61] farver_2.1.0 sass_0.3.1 utf8_1.2.1 crul_1.1.0 labeling_0.4.2 tidyselect_1.1.0 [67] rlang_0.4.10 munsell_0.5.0 cellranger_1.1.0 tools_4.0.4 cachem_1.0.4 cli_2.4.0 [73] generics_0.1.0 evaluate_0.14 stringr_1.4.0 fastmap_1.1.0 yaml_2.2.1 processx_3.5.0 [79] knitr_1.31 fs_1.5.0 tidygraph_1.2.0 purrr_0.3.4 satellite_1.0.2 ggraph_2.0.5 [85] xml2_1.3.2 compiler_4.0.4 rstudioapi_0.13 curl_4.3 png_0.1-7 e1071_1.7-6 [91] testthat_3.0.2 tibble_3.1.0 tweenr_1.0.2 bslib_0.2.4 stringi_1.5.3 pkgreviewr_0.2.0 [97] cyclocomp_1.1.0 ps_1.6.0 desc_1.3.0 lattice_0.20-41 whoami_1.3.0 classInt_0.4-3 [103] goodpractice_1.0.2 vctrs_0.3.7 pillar_1.6.0 lifecycle_1.0.0 triebeard_0.3.0 jquerylib_0.1.3 [109] rcmdcheck_1.3.3 raster_3.4-5 mapview_2.9.0 R6_2.5.0 KernSmooth_2.23-18 gridExtra_2.3 [115] sessioninfo_1.1.1 codetools_0.2-18 MASS_7.3-53.1 assertthat_0.2.1 pkgload_1.2.0 openssl_1.4.3 [121] rprojroot_2.0.2 withr_2.4.1 httpcode_0.3.0 hms_1.0.0 lintr_2.0.1 grid_4.0.4 [127] tidyr_1.1.3 class_7.3-18 rmarkdown_2.7 googledrive_1.0.1 sf_0.9-8 ggforce_0.3.3 [133] base64enc_0.1-3 ratelimitr_0.4.1 ```

Test installation

Local installation took several minutes (approx. 3 to 5 minutes) because there are many dependencies. On my machine, it had to install 35 packages.

Installation details

    Installing 35 packages: fauxpas, selectr, broom, forcats, uuid, ids, googledrive, gargle, data.table, blob, polyclip, tweenr, webmockr, rvest, modelr, haven, googlesheets4, dtplyr, dbplyr, ratelimitr, satellite, leafpop, graphlayouts, tidygraph, ggrepel, ggforce, audio, vcr, tidyverse, opencage, mapview, longurl, ggraph, beepr, rtweet
    Installing packages into ‘/home/marion/R/x86_64-pc-linux-gnu-library/4.0’ (as ‘lib’ is unspecified)
    also installing the dependencies ‘cli’, ‘lubridate’, ‘pillar’

Check package integrity

run checks on tidytags source

Recommendation: In Contributing.md, remind potential contributors to follow the Getting started with tidytags guide before running checks on the package. Without the API keys, it doesn't work.

7 failed tests, all related to vcr. These tests pass if I delete the fixtures folder. NB: the error messages contain my secret tokens for Twitter, so I removed most of the URLs and replaced them with “………..”.

Error message for failed tests ══ Failed tests ════════════════════════════════════════════════════════════════ ── Error (test-add_users_data.R:14:3): user data is added properly ───────────── Error: ================================================================================ An HTTP request has been made that vcr does not know how to handle: GET https://api.twitter.com/1.1/users/lookup.json?screen_name=gsa_aect%2CAECT........... vcr is currently using the following cassette: - ../fixtures/users_info.yml - record_mode: once - match_requests_on: method, uri Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues & see https://books.ropensci.org/http-testing ================================================================================ Backtrace: █ 1. ├─vcr::use_cassette(...) test-add_users_data.R:14:2 2. │ └─cassette$call_block(...) 3. └─tidytags::add_users_data(el) test-add_users_data.R:16:4 4. └─rtweet::lookup_users(all_users) 5. ├─base::do.call("lookup_users_", args) 6. └─rtweet:::lookup_users_(...) 7. └─rtweet:::.user_lookup(users, token) 8. └─rtweet:::TWIT(get = get, url, token) 9. └─httr::GET(url, ...) 10. └─httr:::request_perform(req, hu$handle$handle) 11. └─httr:::perform_callback("request", req = req) 12. └─webmockr:::callback(...) 13. └─webmockr::HttrAdapter$new()$handle_request(req) 14. └─private$request_handler(req)$handle() 15. └─eval(parse(text = req_type_fun))(self$request) 16. └─err$run() 17. └─self$construct_message() ── Error (test-get_upstream_replies.R:2:3): get_upstream_replies() finds additional replies ── Error: ================================================================================ An HTTP request has been made that vcr does not know how to handle: POST https://api.twitter.com/1.1/statuses/lookup.json?id=............... vcr is currently using the following cassette: - ../fixtures/upstream_replies.yml - record_mode: once - match_requests_on: method, uri Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues & see https://books.ropensci.org/http-testing ================================================================================ Backtrace: █ 1. ├─vcr::use_cassette(...) test-get_upstream_replies.R:2:2 2. │ └─cassette$call_block(...) 3. └─tidytags::pull_tweet_data(tags_data) test-get_upstream_replies.R:4:4 4. ├─base::ifelse(...) 5. ├─base::ifelse(...) 6. └─rtweet::lookup_statuses(get_char_tweet_ids(df[1:n, ])) 7. ├─base::do.call("lookup_statuses_", args) 8. └─rtweet:::lookup_statuses_(...) 9. └─rtweet:::.status_lookup(statuses[from:to], token = token) 10. └─rtweet:::TWIT(get = get, url, token) 11. └─httr::POST(url, ...) 12. └─httr:::request_perform(req, hu$handle$handle) 13. └─httr:::perform_callback("request", req = req) 14. └─webmockr:::callback(...) 15. └─webmockr::HttrAdapter$new()$handle_request(req) 16. └─private$request_handler(req)$handle() 17. └─eval(parse(text = req_type_fun))(self$request) 18. └─err$run() 19. └─self$construct_message() ── Error (test-get_upstream_replies.R:34:3): get_upstream_replies() works with no new replies found ── Error: ================================================================================ An HTTP request has been made that vcr does not know how to handle: GET https://api.twitter.com/1.1/statuses/lookup.json?id=NA&tweet_mode=extended.............. 
vcr is currently using the following cassette: - ../fixtures/upstream_replies_empty.yml - record_mode: once - match_requests_on: method, uri Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues & see https://books.ropensci.org/http-testing ================================================================================ Backtrace: █ 1. ├─vcr::use_cassette(...) test-get_upstream_replies.R:34:2 2. │ └─cassette$call_block(...) 3. └─tidytags::get_upstream_replies(sample_data) test-get_upstream_replies.R:35:4 4. ├─base::nrow(pull_tweet_data(id_vector = unknown_replies$reply_to_status_id)) 5. └─tidytags::pull_tweet_data(id_vector = unknown_replies$reply_to_status_id) 6. ├─base::ifelse(...) 7. ├─base::ifelse(...) 8. └─rtweet::lookup_statuses(id_vector[1:n]) 9. ├─base::do.call("lookup_statuses_", args) 10. └─rtweet:::lookup_statuses_(...) 11. └─rtweet:::.status_lookup(statuses[from:to], token = token) 12. └─rtweet:::TWIT(get = get, url, token) 13. └─httr::GET(url, ...) 14. └─httr:::request_perform(req, hu$handle$handle) 15. └─httr:::perform_callback("request", req = req) 16. └─webmockr:::callback(...) 17. └─webmockr::HttrAdapter$new()$handle_request(req) 18. └─private$request_handler(req)$handle() 19. └─eval(parse(text = req_type_fun))(self$request) 20. └─err$run() 21. └─self$construct_message() ── Error (test-lookup_many_tweets.R:3:3): lookup_many_tweets() retrieves additional metadata like pull_tweet_data() ── Error: ================================================================================ An HTTP request has been made that vcr does not know how to handle: GET https://api.twitter.com/1.1/statuses/lookup.json?id=................ vcr is currently using the following cassette: - ../fixtures/lookup_many.yml - record_mode: once - match_requests_on: method, uri Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues & see https://books.ropensci.org/http-testing ================================================================================ Backtrace: █ 1. ├─vcr::use_cassette(...) test-lookup_many_tweets.R:3:2 2. │ └─cassette$call_block(...) 3. └─tidytags::pull_tweet_data(sample_tags, n = 10) test-lookup_many_tweets.R:5:4 4. ├─base::ifelse(...) 5. ├─base::ifelse(...) 6. └─rtweet::lookup_statuses(get_char_tweet_ids(df[1:n, ])) 7. ├─base::do.call("lookup_statuses_", args) 8. └─rtweet:::lookup_statuses_(...) 9. └─rtweet:::.status_lookup(statuses[from:to], token = token) 10. └─rtweet:::TWIT(get = get, url, token) 11. └─httr::GET(url, ...) 12. └─httr:::request_perform(req, hu$handle$handle) 13. └─httr:::perform_callback("request", req = req) 14. └─webmockr:::callback(...) 15. └─webmockr::HttrAdapter$new()$handle_request(req) 16. └─private$request_handler(req)$handle() 17. └─eval(parse(text = req_type_fun))(self$request) 18. └─err$run() 19. └─self$construct_message() ── Error (test-pull_tweet_data.R:7:3): pull_tweet_data() is able to retrieve additional metadata starting with dataframe ── Error: ================================================================================ An HTTP request has been made that vcr does not know how to handle: GET https://api.twitter.com/1.1/statuses/lookup.json?id=................. 
vcr is currently using the following cassette: - ../fixtures/metadata_from_df.yml - record_mode: once - match_requests_on: method, uri Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues & see https://books.ropensci.org/http-testing ================================================================================ Backtrace: █ 1. ├─vcr::use_cassette(...) test-pull_tweet_data.R:7:2 2. │ └─cassette$call_block(...) 3. └─tidytags::pull_tweet_data(sample_tags, n = 10) test-pull_tweet_data.R:8:4 4. ├─base::ifelse(...) 5. ├─base::ifelse(...) 6. └─rtweet::lookup_statuses(get_char_tweet_ids(df[1:n, ])) 7. ├─base::do.call("lookup_statuses_", args) 8. └─rtweet:::lookup_statuses_(...) 9. └─rtweet:::.status_lookup(statuses[from:to], token = token) 10. └─rtweet:::TWIT(get = get, url, token) 11. └─httr::GET(url, ...) 12. └─httr:::request_perform(req, hu$handle$handle) 13. └─httr:::perform_callback("request", req = req) 14. └─webmockr:::callback(...) 15. └─webmockr::HttrAdapter$new()$handle_request(req) 16. └─private$request_handler(req)$handle() 17. └─eval(parse(text = req_type_fun))(self$request) 18. └─err$run() 19. └─self$construct_message() ── Error (test-pull_tweet_data.R:28:3): pull_tweet_data() is able to retrieve additional metadata starting with tweet IDs ── Error: ================================================================================ An HTTP request has been made that vcr does not know how to handle: GET https://api.twitter.com/1.1/statuses/lookup.json?id=................... vcr is currently using the following cassette: - ../fixtures/metadata_from_ids.yml - record_mode: once - match_requests_on: method, uri Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues & see https://books.ropensci.org/http-testing ================================================================================ Backtrace: █ 1. ├─vcr::use_cassette(...) test-pull_tweet_data.R:28:2 2. │ └─cassette$call_block(...) 3. └─tidytags::pull_tweet_data(id_vector = sample_tags$id_str, n = 10) test-pull_tweet_data.R:29:4 4. ├─base::ifelse(...) 5. ├─base::ifelse(...) 6. └─rtweet::lookup_statuses(id_vector[1:n]) 7. ├─base::do.call("lookup_statuses_", args) 8. └─rtweet:::lookup_statuses_(...) 9. └─rtweet:::.status_lookup(statuses[from:to], token = token) 10. └─rtweet:::TWIT(get = get, url, token) 11. └─httr::GET(url, ...) 12. └─httr:::request_perform(req, hu$handle$handle) 13. └─httr:::perform_callback("request", req = req) 14. └─webmockr:::callback(...) 15. └─webmockr::HttrAdapter$new()$handle_request(req) 16. └─private$request_handler(req)$handle() 17. └─eval(parse(text = req_type_fun))(self$request) 18. └─err$run() 19. └─self$construct_message() ── Error (test-pull_tweet_data.R:49:3): pull_tweet_data() is able to retrieve additional metadata starting with tweet URLs ── Error: ================================================================================ An HTTP request has been made that vcr does not know how to handle: GET https://api.twitter.com/1.1/statuses/lookup.json............... 
vcr is currently using the following cassette: - ../fixtures/metadata_from_urls.yml - record_mode: once - match_requests_on: method, uri Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues & see https://books.ropensci.org/http-testing ================================================================================ Backtrace: █ 1. ├─vcr::use_cassette(...) test-pull_tweet_data.R:49:2 2. │ └─cassette$call_block(...) 3. └─tidytags::pull_tweet_data(...) test-pull_tweet_data.R:50:4 4. ├─base::ifelse(...) 5. └─rtweet::lookup_statuses(get_char_tweet_ids(url_vector[1:n], url_vector = url_vector[1:n])) 6. ├─base::do.call("lookup_statuses_", args) 7. └─rtweet:::lookup_statuses_(...) 8. └─rtweet:::.status_lookup(statuses[from:to], token = token) 9. └─rtweet:::TWIT(get = get, url, token) 10. └─httr::GET(url, ...) 11. └─httr:::request_perform(req, hu$handle$handle) 12. └─httr:::perform_callback("request", req = req) 13. └─webmockr:::callback(...) 14. └─webmockr::HttrAdapter$new()$handle_request(req) 15. └─private$request_handler(req)$handle() 16. └─eval(parse(text = req_type_fun))(self$request) 17. └─err$run() 18. └─self$construct_message() [ FAIL 7 | WARN 0 | SKIP 2 | PASS 62 ] Error: Test failures

check tidytags for goodpractice:

In all your functions (.R files) and in the tests listed below, the package {goodpractice} detected long code lines (above 80 characters).

Test files with long code lines:

  tidytags/tests/testthat/test-geocode_tags.R
  tidytags/tests/testthat/test-get_char_tweet_ids.R
  tidytags/tests/testthat/test-get_url_domain.R
  tidytags/tests/testthat/test-lookup_many_tweets.R
  tidytags/tests/testthat/test-pull_tweet_data.R

Check package metadata files

README

Contributing

DESCRIPTION

I think that you could remove: gargle, covr, roxygen2, tidyverse, usethis, webmockr.

Check documentation

test tidytags function help files:

  1. Documentation of create_edgelist, get_quotes, and get_replies: Typo in “See Also Compare to other tidtags functions such as get_replies(), get_retweets(), get_quotes(), and get_mentions().”
  2. Documentation of get_mentions: same name as a function from {rtweet}. The rOpenSci documentation guide says “If there is potential overlap or confusion with other packages providing similar functionality or having a similar name, add a note in the README, main vignette and potentially the Description field of DESCRIPTION. Example in rtweet README, rebird README.” If possible, I would even change the name of these functions (for instance, tt_get_mentions). Typo in “See Also Compare to other tidtags functions such as get_replies(), get_retweets(), get_quotes(), and create_edgelist().”
  3. Documentation of get_quotes and get_replies: The example returns an empty tibble (0 lines).
  4. Documentation of get_retweets: as for get_mentions, the function has the same name as a function from {rtweet}
  5. Documentation of get_upstream_replies: This function does more than just add replies; it also computes new variables (“word_count”, “character_count”...). This is because get_upstream_replies() calls process_tweets(). This is not clearly stated in the documentation.
  6. Documentation of lookup_many_tweets: Missing example.
  7. Documentation of pull_tweet_data: I would add an intermediate line to avoid repetition of code in the examples like the example below:
    example_url <- "18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8"          
    tags_content <- read_tags(example_url)          
    pull_tweet_data(tags_content[1:10, ])        

And I would add some comments to explain the different examples. I don’t understand the definition of id_vector and n, nor why pull_tweet_data(tags_content[1:10, ]) returns only 7 lines, although there are 10 different tweet IDs in id_str according to unique(tags_content[1:10, ]$id_str). As id_vector is the statuses parameter in rtweet::lookup_statuses(), it would maybe be better to inherit the parameter. At the least, I would use the same vocabulary, and notably talk about “statuses” (a Twitter status is a tweet, a retweet, a quote, or a reply).


test tidytags vignettes:

For both vignettes, I would put more information in bold, because there is quite a lot of text.

Comments on the Vignette Getting started with tidytags

For each step, I would add an example with tidytags functions to test that the setup is correct (test API keys and test access to TAGS).
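For example, a quick smoke test along these lines (a sketch; the sheet ID is the example one from the package documentation):

    # Checks the Google API key by reading the example TAGS tracker
    example_url <- "18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8"
    tags_content <- read_tags(example_url)

    # Checks the Twitter credentials by pulling metadata for a few statuses
    head(pull_tweet_data(tags_content[1:10, ]))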

Comments on the vignette Using tidytags with a conference hashtag


Inspect code:

bretsw commented 3 years ago

Thank you, @llrs and @marionlouveaux, for your careful and thorough reviews of tidytags. @jrosen48 and I will start working through your comments and suggestions. Bear with us, it seems like there's a good bit to tackle. Thank you though—we know this is going to make the package better.

maelle commented 3 years ago

Thank you @marionlouveaux for your in-depth review!

maelle commented 3 years ago

:wave: @bretsw @jrosen48! Any update? :smile_cat:

bretsw commented 3 years ago

Hi @maelle, no update yet. I've turned the two reviews into a long checklist of items, but @jrosen48 and I have been trying to wrap up our semester responsibilities. We're meeting on Monday to take the next steps.

maelle commented 3 years ago

Great, thanks for the update!

maelle commented 3 years ago

:wave: @bretsw @jrosen48! Any update after your meeting?

bretsw commented 3 years ago

Hi @maelle! We talked through all the comments, and we're aiming to have our revisions done by the end of next week (June 11).

maelle commented 3 years ago

For info I've applied a holding label at the authors' request. :slightly_smiling_face:

maelle commented 2 years ago

:wave: @bretsw @jrosen48! Any update? :smile_cat:

bretsw commented 2 years ago

Hi @maelle! Thank you for checking in again. I think @jrosen48 and I are (finally) getting settled. We set a meeting on October 6 to start tackling the requested changes. Excited to get back to this.

maelle commented 2 years ago

Great to read, thank you for the update!