bretsw closed this issue 2 years ago.
Thank you, @maelle, as ever for your insight and guidance here. I wrestled with this through the rest of the day yesterday and ended up in this same spot: there's a key needed for Google Sheets, which I missed because this gets saved to the local environment in a way that seems persistent. I've started the process of figuring out how to best save and call the Google API key. I'll need to rewrite a tidytags function or two, document the process of getting and using the key in the setup vignette, re-record vcr cassettes, etc. I have a plan at least. I'll keep you posted!
That's pretty awful that Google makes you put an API key in a query param. I'd expect better from them. Anyway, hopefully the new filter query params option will work.
Thanks Scott! Certainly caught me off guard yesterday. I am excited to implement the new vcr feature at least. I'll keep you posted with how it's going!
I thought I had this Google API key issue figured out, but no such luck. @maelle, how familiar are you with googlesheets4?
Something is happening with the stored Google API key that I'm not understanding. Somehow my old Google API key is stored somewhere in a way that I can't seem to change or access, until I record vcr cassettes and see that it is exposed in the request URL. That should be fine, because I've revoked that token and now have a new one. However, my new API key won't work when I run `googlesheets4::gs4_auth_configure(api_key = Sys.getenv("GOOGLE_API_KEY"))`, but running `googlesheets4::gs4_deauth()` (which sets the API token to NULL) and `googlesheets4::gs4_auth_configure(api_key = NULL)` (which sets the API key to NULL) somehow lets me query the Sheets API. That is, with both a NULL key and a NULL token, I can successfully run `googlesheets4::range_read(googlesheets4::gs4_example("deaths"))` or perform my tidytags package tests (locally).
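For anyone following along, here is a minimal sketch of the sequence of calls I'm describing (condensed; I'm using the singular `gs4_example()` accessor here to read one example sheet):

```r
library(googlesheets4)

# This fails with my new key (the part I don't understand):
gs4_auth_configure(api_key = Sys.getenv("GOOGLE_API_KEY"))

# But this combination works: both the token and the key end up NULL,
# yet reading the public example sheet (and my local tests) still succeeds.
gs4_deauth()                        # API token -> NULL
gs4_auth_configure(api_key = NULL)  # API key   -> NULL
range_read(gs4_example("deaths"))
```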
In sum, there's an old, deactivated API key stored somewhere I can't locate and being accessed in a way I can't decipher. The old API key is still currently exposed in several vcr cassettes (in the "fixtures" directory), but I'm ok with this for now because the key is actually decommissioned.
Any ideas?
Not familiar at all!
If I follow correctly, there are two problems.
The next step would be to ask for help on RStudio community forum (since googlesheets4 is an RStudio package, I'd expect more users there than on rOpenSci forum).
@maelle, I figured it out! I was getting ready to post to the RStudio community forum, and first I looked everything over one more time. I altered the restrictions to the Google API key in the Cloud Console setup, and this did the trick! The issue wasn't with my code but the restrictions. I've updated the tidytags setup vignette to make this clearer.
:tada: :clap: so only tests with real requests are "needed" before we proceed IIRC.
Yes! I'm on it today or early next week. So close.
Hi @maelle, I've set up tests with real requests (https://github.com/bretsw/tidytags/blob/master/.github/workflows/weekly-check.yaml), but they don't seem to run at the scheduled time:
```yaml
on:
  schedule:
    # minute hour day-of-month month day-of-week (UTC)
    - cron: '0 6 * * MON,WED,FRI'
```
Do you see anything obviously wrong? I've tried to reference the rladies example (https://github.com/rladies/meetupr/blob/master/.github/workflows/with-auth.yaml) for inspiration and searched elsewhere, but it's not clear to me why the scheduler isn't doing anything. I previously scheduled for Sunday midnight, but nothing happened over the weekend either.
I'll ask in the RStudio Community forum (https://community.rstudio.com/t/testthat-motivation/27251/4) if there's nothing readily apparent to you.
Hello! It seems the problem is not the cron syntax but your referring to `matrix.config.os` without defining it (it can't be shared between workflow files).
Isn't the matrix defined on lines 28-30? This reflects lines 26-28 in the rladies' meetupr yaml.
Right but it seems it isn't parsed? The meetupr YAML doesn't work either :sweat_smile: https://github.com/rladies/meetupr/actions/runs/673097825
Maybe I'm missing something, but it seems to me that it did run: https://github.com/bretsw/tidytags/actions/runs/675062243
Thanks @dpprdan for catching that this actually did run (yay!) and @maelle for demonstrating the solution for parsing in meetupr. I'm testing with tidytags now and will report back. I really appreciate the community support!
I am still not sure I understand my own meetupr YAML file so I updated it. Therefore only @dpprdan deserves thanks. :joy:
Hi @maelle, looks like everything is working with tidytags:
I think(?) I've checked everything off the list!
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/382_status.svg)](https://github.com/ropensci/software-review/issues/382)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
For info I made a call on Twitter https://twitter.com/ma_salmon/status/1374349523900899328 (if Twitter isn't appropriate for this, then for what is it useful :grin: ) hoping to find someone using TAGS in particular. I'll contact potential reviewers (using TAGS or not) within the next few days.
Sounds good! I've retweeted your call. Thank you!
@ropensci-review-bot add @llrs to reviewers
That can't be done if there is no editor assigned
@ropensci-review-bot assign @maelle as editor
Assigned! @maelle is now the editor
@ropensci-review-bot add @llrs to reviewers
@llrs added to the reviewers list. Review due date is 2021-04-19. Thanks @llrs for accepting to review! Please refer to our reviewer guide.
@bretsw please don't forget to add the badge mentioned in https://github.com/ropensci/software-review/issues/382#issuecomment-804893607 :slightly_smiling_face:
Thanks for the reminder, I totally missed that prompt, probably from skimming past messages from the review bot. Sorry bot! I'll add NEWS.md next.
@ropensci-review-bot add @marionlouveaux to reviewers
@marionlouveaux added to the reviewers list. Review due date is 2021-04-27. Thanks @marionlouveaux for accepting to review! Please refer to our reviewer guide.
As discussed with @marionlouveaux, amending the due date for review to 2021-04-27 to accommodate @marionlouveaux's schedule.
Thanks @llrs and @marionlouveaux for accepting to review! :pray:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
The setup vignette is not clear about which steps are necessary and which are not; I would suggest adding titles and an index.
The vignettes have a "Pain point #4" which I couldn't find referenced anywhere. Also, perhaps a more descriptive title would make it easier for users to know what the section is about (removing the "Pain point" reference from the section title entirely). Not that they aren't pain points, but just redirect users to the solutions/documentation as they go. However, most of the code chunks of the vignette are not run (as reported by BiocCheck):
```
* WARNING: Evaluate more vignette chunks.
    # of code chunks: 8
    # of eval=FALSE: 5
    # of nonexecutable code chunks by syntax: 0
    # total unevaluated 5 (62%)
```
And those that do run only add documentation or set up the vignette. Perhaps some kind of setup specific to the vignettes could be used; otherwise they defeat their purpose and turn into plain READMEs. (I know it is not easy for CRAN, so maybe set them up as articles just on the website, outside CRAN?)
The step to create the Google API key is not clear enough (perhaps a redesign of the API configuration interface is to blame?). An indication to use the Google Sheets API in answer to the question "Find out what kind of credentials you need?" would be helpful.
It should be pointed out that an OpenCage Geocoding API key is not needed to use the package. Also, the discussion about pricing and API limits might be good for an issue but doesn't fit well in the vignette (I've seen that @maelle asked for this, but now that the package has settled on OpenCage maybe it is no longer needed, or it can be reduced).
On the chunk with `dplyr::glimpse(example_after_rtweet)` I get a different result: 2,204 rows compared to the 2,215 reported in the vignette.
When I run the following code chunk I get an error (as I don't have the longurl package yet): `example_domains <- get_url_domain(example_urls)`. Before using a package listed in Suggests, you should test whether it can be loaded; you can use `rlang::is_installed("longurl")` or `requireNamespace("longurl", quietly = TRUE)`.
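A minimal sketch of the guard I have in mind (the function body and message wording here are hypothetical, not the actual tidytags implementation):

```r
get_url_domain <- function(urls) {
  # Fail early, with an informative message, if the Suggests
  # dependency is not installed (hypothetical wording).
  if (!requireNamespace("longurl", quietly = TRUE)) {
    stop(
      "Package 'longurl' is needed for get_url_domain(). ",
      "Please install it with install.packages(\"longurl\").",
      call. = FALSE
    )
  }
  expanded <- longurl::expand_urls(urls)
  # ... continue working with the expanded URLs ...
  expanded
}
```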
Last, I don't know how to push data to the Google Sheet that TAGS created in the first vignette.
`add_users_data`: I think it is not needed. Examples cannot be run without the authentication setup, and there is no mention of this on the help pages. Perhaps a minor comment would remind users.
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`). There isn't any BugReports field, but the vignettes include info about how to get help. I would suggest adding the issues link to the DESCRIPTION too. The contributing file is extensive and well organized.
The package contains a `paper.md` matching JOSS's requirements with:
There's an additional `"` in the YAML heading of paper.md that prevented viewing the paper.
I got 2 failing tests and 3 warnings (besides 2 tests that were skipped). The test-get_url_domain.R:14:5 test reported `domain4` not equal to "npr.org"; in the browser I get asked for cookie consent, and when run locally, outside testthat or vcr, I get the URL choice.npr.org.
The other failing tests are weird, as I don't get them when I run them in the R console, only in the RStudio build/check panel.
I have a development version of vcr installed, and one of the warnings is related to it. The new version warns when cassettes are empty; in my experience this means the test is not conclusive, but it could also be related to not having the geocoding API enabled. The other warnings are in test-get_url_domain.R, lines 3 and 32: "Invalid URL". I'm not sure why, because when I paste the URL in my browser I get redirected to https://www.aect.org/about_us.php. (BTW, perhaps the link can be changed to https instead of http.)
Estimated hours spent reviewing: 4
The rtweet package is undergoing drastic changes (I'm involved in rtweet maintenance) and there will be a major release with breaking changes. It will probably break this package (the recommendation about the token will change, for instance), so be ready to update accordingly.
The package contains relatively few, simple functions that provide more data from Twitter or make it easier to analyze. I have not analyzed Twitter data (except for an overview of a user account), so I don't know how useful the data is. I am not a user of TAGS, and I'm a bit puzzled about how to add information to the Google Sheet: if I'm a new user, how should I do it? I can get the template, but how do I fill it? I think this package would be easier for non-technical people if it included a function to add the information gathered via rtweet, or processed with the package, back to the original Google Sheet.
I haven't fully read the paper.md for JOSS, but I think it is short enough and covers the package comprehensively.
From a more technical point of view, I have some comments about the code and the package:
There are 75 lines longer than 80 characters; try to shorten them. It is probably just a matter of style, perhaps creating new, shorter variables.
Also, there are namespaces in the Imports field not imported from: 'gargle', 'readr'. All declared Imports should be used.
The `get_char_tweet_ids` function could be improved, with only one argument: if it is a data.frame, then extract the status_id and get the ID via id_str; if it is a URL, you can just extract the trailing numbers with `gsub("https?\\://twitter.com\\/.+/statuses/", "", df$status_url)`. There is no need to modify the data.frame and then extract the vector again.
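To illustrate the suggestion with a hypothetical data frame (the regex is the one above):

```r
# Hypothetical data frame with a status_url column
df <- data.frame(
  status_url = c(
    "https://twitter.com/rOpenSci/statuses/1234567890123456789",
    "http://twitter.com/someone/statuses/987654321"
  ),
  stringsAsFactors = FALSE
)

# Extract the IDs directly from the URLs, no data.frame round-trip needed
ids <- gsub("https?\\://twitter.com\\/.+/statuses/", "", df$status_url)
ids
#> [1] "1234567890123456789" "987654321"
```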
On `process_tweets` you can simplify the is_self_reply computation to `ifelse(.data$is_reply & .data$user_id == .data$reply_to_user_id, TRUE, FALSE)`.
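One could even go a step further: the comparison already yields a logical vector, so the `ifelse()` wrapper is redundant. A small sketch with plain vectors (illustrative values, not package data):

```r
is_reply <- c(TRUE, TRUE, FALSE)
user_id <- c("1", "2", "3")
reply_to_user_id <- c("1", "9", NA)

# These two are equivalent; the second form drops the ifelse() call.
a <- ifelse(is_reply & user_id == reply_to_user_id, TRUE, FALSE)
b <- is_reply & user_id == reply_to_user_id
identical(a, b)
#> [1] TRUE
```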
On `get_upstream_replies` the examples are not informative, as there are no replies to get data from in the example dataset. You make multiple calls to `pull_tweet_data`, some of which might be unnecessary. `process_tweets` can be called just once at the end, instead of multiple times and on each loop iteration; this should speed up the process. Also, if at most 90,000 tweets are taken on each run, you can estimate the number of iterations needed and inform the user, which might make the wait easier. Perhaps it would be better to use `lookup_many_tweets`, as it does a similar process. However, users might hit the rate limit, and I don't see any information being passed to the user regarding this.
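A rough sketch of the restructuring I mean (schematic: the loop body and column handling here are hypothetical, not the actual tidytags implementation):

```r
# Accumulate raw tweets in the loop; process once at the end.
get_upstream_replies_sketch <- function(df) {
  collected <- df
  repeat {
    # IDs of replied-to statuses not fetched yet (hypothetical logic)
    missing <- setdiff(collected$reply_to_status_id, collected$status_id)
    missing <- missing[!is.na(missing)]
    if (length(missing) == 0) break
    new_tweets <- pull_tweet_data(id_vector = missing)
    if (nrow(new_tweets) == 0) break
    collected <- rbind(collected, new_tweets)
  }
  process_tweets(collected)  # single call, instead of one per iteration
}
```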
Looking at `create_edgelist`, it calls `process_tweets` and also `get_replies`, `get_retweets`, `get_quotes`, and `get_mentions`, which call `process_tweets` too. Perhaps some internal functions could be created to avoid calling `process_tweets` multiple times on the same data.
Thanks a lot for your review @llrs! :rocket:
Note that regarding JOSS, we've just changed the process as JOSS will be the ones determining whether the software fits in their scope.
@llrs which rtweet version did you use for your review, by the way?
@bretsw do you use rtweet CRAN version or the GitHub version with the newer changes?
Thank you :slightly_smiling_face:
@maelle I used the CRAN version
@maelle I use the CRAN version as well
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
Missing BugReports
Estimated hours spent reviewing: 10h
The {tidytags} package gives the possibility to read a TAGS tracker, a Google app that continuously collects tweets from Twitter based on predefined search criteria and collection frequency. It provides wrappers around {rtweet} and {opencage} functions to simplify the retrieval of metadata either not fetched by TAGS or not existing on Twitter (in the case of geocoding). In addition, it provides functionalities to compute additional descriptive variables about the collected tweets and to visualise relationships between tweets. The {tidytags} package interacts with 3 APIs (Google Sheets, Twitter, and OpenCage) and one Google app (TAGS). For this reason, the setup is a bit long and tedious when done from scratch. The package itself contains a small number of functions, which are well documented.
I used the {pkgreviewr} package from rOpenSci to conduct my review (a big thanks to the authors of this package). I configured TAGS and created a Google API key. I already had the configuration for {rtweet} and {opencage}. I could run {tidytags} on my own TAGS tracker (and it worked!).
My main comments concern:
I didn't read the paper submitted to JOSS. I am pasting the details of my review in a second comment.
Local installation took several minutes (approx. 3 to 5 minutes) because there are many dependencies. On my machine, it had to install 35 packages.
tidytags source
Recommendation: In Contributing.md, remind potential contributors to follow the "Getting started with tidytags" guide before proceeding to a check on the package. Without the API keys, it doesn't work.
7 failed tests, all related to vcr. These tests pass if I delete the fixtures folder. NB: the error messages contain my secret tokens for Twitter, so I removed most of the URLs and replaced them with "………..".
tidytags for goodpractice:
In all your functions (.R files) and in the tests listed below, the {goodpractice} package detected code lines longer than 80 characters.
Instead of “Simple Collection and Powerful Analysis of Twitter Data”, I would write “Simple Collection and Powerful Analysis of Twitter Data collected with TAGS”.
In Overview, I would add a sentence to explain what TAGS is. For instance, I would write: "{tidytags} retrieves tweet data collected by a Twitter Archiving Google Sheet (TAGS), gets additional metadata from Twitter via the {rtweet} package and from OpenCage using the {opencage} package, and provides additional functions to facilitate systematic yet flexible analyses of data from Twitter. TAGS is based on Google spreadsheets. A TAGS tracker continuously collects tweets from Twitter, based on predefined search criteria and collection frequency." And I would add a link to the vignettes directly there.
In the Setup section, in addition to linking to the Getting started vignette, I would add a checklist of what should be set up in the end, like this: "To use tidytags at its full capacity, you should have the following things set up:
In Getting help, there is a typo: “You may also wish too try some general troubleshooting strategies:”
In Considerations related to ethics…, one word is missing: "In short, please remember that most (if not all) of the you collect may be about people", and there is a duplicated sentence: "{tidytags} should be used in strict accordance with Twitter's developer terms."
In addition to that, the rOpenSci development guide section about READMEs suggests adding a brief demonstration of usage directly in the README, which is missing here. It also encourages adding a paragraph about how to cite the package, which is also missing. I tried `citation(package = "tidytags")`; it gives a warning because there is no Date field in the DESCRIPTION. Finally, the rOpenSci development guide also suggests organizing the badges in the README in a table when you have many badges, which, in my opinion, is the case here.
I think that you could remove: gargle, covr, roxygen2, tidyverse, usethis, webmockr.
tidytags function help files:

```r
example_url <- "18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8"
tags_content <- read_tags(example_url)
pull_tweet_data(tags_content[1:10, ])
```
And I would add some comments to explain the different examples.
I don't understand the definition of id_vector and n, and why `pull_tweet_data(tags_content[1:10, ])` returns only 7 rows, although there are 10 different tweet IDs in id_str according to `unique(tags_content[1:10, ]$id_str)`.
As id_vector corresponds to the statuses parameter in rtweet::lookup_statuses, it would maybe be better to inherit that parameter. At the least, I would use the same vocabulary, notably "statuses" (a Twitter status is a tweet, a retweet, a quote, or a reply).
tidytags vignettes:
For both vignettes, I would put more information in bold, because there is quite a lot of text.
For each step, I would add an example with tidytags functions to test that the set up is correct (test API keys and test access to TAGS).
Thank you, @llrs and @marionlouveaux, for your careful and thorough reviews of tidytags. @jrosen48 and I will start working through your comments and suggestions. Bear with us, it seems like there's a good bit to tackle. Thank you though—we know this is going to make the package better.
Thank you @marionlouveaux for your in-depth review!
:wave: @bretsw @jrosen48! Any update? :smile_cat:
Hi @maelle, no update yet. I've turned the two reviews into a long checklist of items, but @jrosen48 and I have been trying to wrap up our semester responsibilities. We're meeting on Monday to take the next steps.
Great, thanks for the update!
:wave: @bretsw @jrosen48! Any update after your meeting?
Hi @maelle! We talked through all the comments, and we're aiming to have our revisions done by the end of next week (June 11).
For info I've applied a holding label at the authors' request. :slightly_smiling_face:
:wave: @bretsw @jrosen48! Any update? :smile_cat:
Hi @maelle! Thank you for checking in again. I think @jrosen48 and I are (finally) getting settled. We set a meeting on October 6 to start tackling the requested changes. Excited to get back to this.
Great to read, thank you for the update!
Date accepted: 2022-01-31
Submitting Author Name: Bret Staudt Willet
Submitting Author Github Handle: @bretsw
Other Package Authors Github handles: @jrosen48
Repository: https://github.com/bretsw/tidytags
Version submitted: 0.1.0
Submission type: Standard
Editor: @maelle
Reviewers: @llrs, @marionlouveaux
Due date for @llrs: 2021-04-19
Due date for @marionlouveaux: 2021-04-27
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
{tidytags} allows for both simple data collection and thorough data analysis. In short, {tidytags} first uses a Twitter Archiving Google Sheet (TAGS) to easily collect tweet ID numbers and then uses the R package {rtweet} to re-query the Twitter API to collect additional metadata. {tidytags} also introduces new functions developed to facilitate systematic yet flexible analyses of data from Twitter.
The target users for {tidytags} are social scientists (e.g., educational researchers) who have an interest in studying Twitter data but are relatively new to R, data science, or social network analysis. {tidytags} scaffolds tweet collection and analysis through a simple workflow that still allows for robust analyses.
{tidytags} wraps together functionality from several useful R packages, including {googlesheets4} to bring data from the TAGS tracker into R and {rtweet} for retrieving additional tweet metadata. The contribution of {tidytags} is to bring together the affordance of TAGS to easily collect tweets over time (which is not straightforward with {rtweet}) and the utility of {rtweet} for collecting additional data (which are missing from TAGS). Finally, {tidytags} reshapes data in preparation for geolocation and social network analyses that should be accessible to relatively new R users.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
JOSS Options
- [x] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [x] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:
- (*Do not submit your package separately to JOSS*)

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct