tidytags: Simple Collection and Powerful Analysis of Twitter Data

bretsw commented 4 years ago

Date accepted: 2022-01-31 Submitting Author Name: Bret Staudt Willet Submitting Author Github Handle: !--author1-->@bretsw@jrosen48@maelle<!--end-editor-- Reviewers: @llrs, @marionlouveaux

Due date for @llrs: 2021-04-19 Due date for @marionlouveaux: 2021-04-27

Archive: TBD
Version accepted: TBD

Paste the full DESCRIPTION file inside a code block below:

Package: tidytags
Version: 0.1.0
Title: Simple Collection and Powerful Analysis of Twitter Data
Authors@R: c(
    person("K. Bret", "Staudt Willet", , 
      email = "bret@bretsw.com", role = c("aut", "cre"),
      comment = c(ORCID = "0000-0002-6984-416X")
    ),
    person("Joshua M.", "Rosenberg", ,
      role = c("aut"),
      comment = c(ORCID = "0000-0003-2170-0447")
    )
  )
Description: {tidytags} coordinates the simplicity of collecting tweets over time 
    with a [Twitter Archiving Google Sheet](https://tags.hawksey.info/) (TAGS) and the utility of the 
    [{rtweet} package](https://rtweet.info/) for processing and preparing additional Twitter metadata. 
    {tidytags} also introduces functions developed to facilitate systematic yet 
    flexible analyses of data from Twitter.
License: GPL-3
URL: https://bretsw.github.io/tidytags/, https://github.com/bretsw/tidytags
Depends: 
    R (>= 4.0)
Imports:
    dplyr (>= 0.8),
    googlesheets4 (>= 0.2),
    purrr (>= 0.3),
    readr (>= 1.3),
    rlang(>= 0.4),
    rtweet (>= 0.7),
    stringr (>= 1.4),
    tibble (>= 3.0), 
    tidyr (>= 1.0),
    tidyselect (>= 1.0)
Suggests:
    beepr,
    covr,
    ggplot2,
    knitr,
    longurl,
    mapsapi,
    mapview,
    rmarkdown,
    testthat,
    tidyverse,
    urltools,
    usethis
Encoding: UTF-8
VignetteBuilder: knitr
LazyData: TRUE
RoxygenNote: 7.1.0

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [x] data retrieval
- [ ] data extraction
- [x] data munging
- [ ] data deposition
- [ ] workflow automataion
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

{tidytags} allows for both simple data collection and thorough data analysis. In short, {tidytags} first uses a Twitter Archiving Google Sheet (TAGS) to easily collect tweet ID numbers and then uses the R package {rtweet} to re-query the Twitter API to collect additional metadata. {tidytags} also introduces new functions developed to facilitate systematic yet flexible analyses of data from Twitter.

Who is the target audience and what are scientific applications of this package?

The target users for {tidytags} are social scientists (e.g., educational researchers) who have an interest in studying Twitter data but are relatively new to R, data science, or social network analysis. {tidytags} scaffolds tweet collection and analysis through a simple workflow that still allows for robust analyses.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

{tidytags} wraps together functionality from several useful R packages, including {googlesheets4} to bring data from the TAGS tracker into R and {rtweet} for retrieving additional tweet metadata. The contribution of {tidytags} is to bring together the affordance of TAGS to easily collect tweets over time (which is not straightforward with {rtweet}) and the utility of {rtweet} for collecting additional data (which are missing from TAGS). Finally, {tidytags} reshapes data in preparation for geolocation and social network analyses that should be accessible to relatively new R users.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the guide for authors and rOpenSci packaging guide.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[x] Do you wish to automatically submit to the Journal of Open Source Software? If so:

JOSS Options

- [x] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). - [x] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: - (*Do not submit your package separately to JOSS*)

[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

annakrystalli commented 4 years ago

Hello @bretsw and many thanks for your submission! We have discussed the package and have deemed it to be in scope.

@geanders will you be your handling editor 🙂

bretsw commented 4 years ago

Thank you, @annakrystalli! And thank you in advance, @geanders, for working with us. We are looking forward to your feedback.

bretsw commented 4 years ago

Hi @geanders, thank you so much for considering {tidytags} for review. @jrosen48 and I were just thinking about this and thought to reach out to inquire about the status of the review. We recognize that this is likely to be a very busy and hectic time and so understand if the answer is simply that it's 'in-process'. But would you be able to let us know? Thanks again for considering this. //Bret

maelle commented 3 years ago

:wave: @bretsw, some first remarks from me (another editor).

I am generally wondering how the reviewers can test the package. For instance is the TAGS used in the vignette usable by anyone? I see the vignette chunks aren't evaluated, why is it so? Not that I have anything against pre-compiling vignettes in such a case!
Could you add documentation on data protection? See https://devguide.ropensci.org/policies.html#ethics-data-privacy-and-human-subjects-research I saw a related open issue in your repo.
In the pkgdown reference page, it would make sense to group functions.
The README content could be partly re-used as a Get Started vignette (just name a vignette tidytags.Rmd). How to re-use chunks.
Relatedly, although the README structure is clear, I'd appreciate a setup checklist to provide an overview. Or maybe a setup vignette like the rtweet vignette about secrets.


* Have a TAGS with blablabla ready (blablabla being the public URL)
* Setup rtweet credentials if you want to use blablabla
* Setup Google geocoding credentials if you want to use blablabla

In the TAGS setup explanation maybe some screenshots would make sense. I say maybe as a) they easily go stale b) your text seems quite clear (not tested yet) and a screenshot wouldn't replace instructions.
Regarding tests am I correct that they're skipped everywhere but locally? Is it both because you are using authentication & Twitter data you don't own/can't share? I am actually working on the "HTTP testing in R" book so I'd recommend choosing an approach with some sort of fake data. I can help more once I know more about your constraints.

bretsw commented 3 years ago

@maelle, thank you for this initial feedback. @jrosen48 and I chatted today and divided up tasks. We're aiming to get back to you on these things by the end of next week (10/23).

maelle commented 3 years ago

Thanks for the update! :+1: Don't hesitate to ask any question here.

bretsw commented 3 years ago

@maelle, thank you again for your feedback! Below are responses from @jrosen48 and I, and we have pushed all changes to the repo: https://github.com/bretsw/tidytags

We're looking forward to more dialogue on this! //Bret

I am generally wondering how the reviewers can test the package. For instance is the TAGS used in the vignette usable by anyone? I see the vignette chunks aren't evaluated, why is it so? Not that I have anything against pre-compiling vignettes in such a case!

Our response: The example vignette is usable by anyone. We’ve just set the chunks as eval=FALSE because many of the processes are slow, even when limiting the examples to small data. But all the code should work and anyone can view/pull data from the TAGS tracker linked to in the vignette.

Could you add documentation on data protection? See https://devguide.ropensci.org/policies.html#ethics-data-privacy-and-human-subjects-research I saw a related open issue in your repo.

Our response: We added several paragraphs to a new “Considerations Related to Ethics, Data Privacy, and Human Subjects Research” in the README. Regarding the open issue in the repo (Issue #2), we are considering this enhancement for a future version of {tidytags}.

In the pkgdown reference page, it would make sense to group functions.

Our response: We have grouped the functions. See https://bretsw.github.io/tidytags/reference/index.html

The README content could be partly re-used as a Get Started vignette (just name a vignette tidytags.Rmd). How to re-use chunks.

Our response: We have created a “Getting started with tidytags” vignette (https://bretsw.github.io/tidytags/articles/setup.html).

Relatedly, although the README structure is clear, I'd appreciate a setup checklist to provide an overview. Or maybe a setup vignette like the rtweet vignette about secrets.

Our response: We have added duplicate checklists in both the README and in the new “Getting started with tidytags” vignette.

In the TAGS setup explanation maybe some screenshots would make sense. I say maybe as a) they easily go stale b) your text seems quite clear (not tested yet) and a screenshot wouldn't replace instructions.

Our response: We have added screenshots to the “Getting started with tidytags” vignette.

Regarding tests am I correct that they're skipped everywhere but locally? Is it both because you are using authentication & Twitter data you don't own/can't share? I am actually working on the "HTTP testing in R" book so I'd recommend choosing an approach with some sort of fake data. I can help more once I know more about your constraints.

Our response: We have set up tests for many of the functions, often using fake data. The functions that we cannot test for CRAN/Travis (i.e., we can only test locally) are those that are querying either the Twitter or Google Maps API. For example, add_users_data() uses a Twitter API key, so we cannot test all aspects of it for CRAN or Travis.

maelle commented 3 years ago

Thanks a lot to both of you, awesome work! I have a few comments already

I think it'd be good to have a process in place for re-computing the vignette. See for instance https://ropensci.org/technotes/2019/12/08/precompute-vignettes/
Regarding the ethics section of the README, it's very thoughtful! I'd prefer it to be in one of the vignettes too, because users don't usually access the README locally. We can hope they'll read the pkgdown website instead of the local docs, but who knows. Therefore if it were me I'd make the ethics guidance a re-usable chunk. https://www.garrickadenbuie.com/blog/dry-vignette-and-readme/
To make the editors checks I'd need a contributing guide, which reviewers will need too. E.g. I have no TAGS so do I create one (if so how) or is there one developers (and editors/reviewers) of the package can access as a sort of sandbox? (much easier if it's technically feasible) Examples
- https://docs.ropensci.org/osfr/CONTRIBUTING.html
- https://docs.ropensci.org/ruODK/CONTRIBUTING.html Once I know more, I'll give more precise suggestions of how to add tests for the API stuff. We require a test coverage of 75% before review and I'll help you get there as smoothly as possible!

Now I'll go create my credentials, that part I can already follow.

maelle commented 3 years ago

PS: I like the phrasing "Pain Point" :joy:

maelle commented 3 years ago

Is there no free tier for Google geocoding? If so then my opencage suggestion is becoming more important. :thinking:

maelle commented 3 years ago

well i guess the free credits one gets when creating the billing account count as free tier maybe :thinking:

maelle commented 3 years ago

A small note, in the rtweet docs for creating a token at the very end there is a step allowing you to check your token is available in a new session. What could a similar check be for the Google Geocoding API?

And even more ambitious and not necessary would be a sitrep function for tidytags, that you'd run to check Twitter and Google geocoding are now set up, and if not pointers to relevant docs. (à la devtools::dev_sitrep())

maelle commented 3 years ago

Last question before I log off, could the docs state what kind of restrictions can be applied to the Google geocoding API key for tidytags stuff to work?

maelle commented 3 years ago

:wave: @bretsw @jrosen48! Any update? (Acknowledging this is not an easy year/week)

bretsw commented 3 years ago

Hi @maelle! Thank you so much for checking in. I apologize for our delay. This week has been my big annual conference (AECT), and last week I was getting things ready, so I've been sidetracked. I give my last presentation tomorrow, so I'll respond to your comments and suggestions here next week. Thanks for bearing with us!

maelle commented 3 years ago

No problem, thanks for the update and good luck with your presentation :microphone: :slightly_smiling_face:

maelle commented 3 years ago

:wave: @bretsw @jrosen48! Have you had time to look at this again? Thank you!

bretsw commented 3 years ago

Hi @maelle, thank you again for your patience, and your helpful comments and suggestions. We have quite a few updates for your review. We list our response to your comment here:

I think it'd be good to have a process in place for pre-computing the vignette. See for instance https://ropensci.org/technotes/2019/12/08/precompute-vignettes/
- Our response: I have added the file tidytags-with-conf-hashtags.Rmd.orig for precomputing this vignette (which requires both Twitter API and OpenCage API keys, and takes a long time to compute). I also added the precompile.R file for a quick script for refreshing this vignette in the future.
Regarding the ethics section of the README, it's very thoughtful! I'd prefer it to be in one of the vignettes too, because users don't usually access the README locally. We can hope they'll read the pkgdown website instead of the local docs, but who knows. Therefore if it were me I'd make the ethics guidance a re-usable chunk. https://www.garrickadenbuie.com/blog/dry-vignette-and-readme/
- Our response: We have added the reusable fragment “ethics.Rmd” to both the README (where it was already) and the “tidytags-with-conf-hashtags.Rmd” vignette. We also created the reusable fragment “getting-help.Rmd” for both the README and the “setup.Rmd” vignette.
To make the editors checks I'd need a contributing guide, which reviewers will need too. E.g. I have no TAGS so do I create one (if so how) or is there one developers (and editors/reviewers) of the package can access as a sort of sandbox? (much easier if it's technically feasible). Examples: https://docs.ropensci.org/osfr/CONTRIBUTING.html, https://docs.ropensci.org/ruODK/CONTRIBUTING.html. Once I know more, I'll give more precise suggestions of how to add tests for the API stuff. We require a test coverage of 75% before review and I'll help you get there as smoothly as possible!

-Our response: We have added CODE_OF_CONDUCT.md and CONTRIBUTING.md
Outside of my role as editor, let me introduce you to opencage, where a lot of work by @dpprdan is happening in the dev branch these days. Not sure you even need an alternative to Google maps. 😉 Two advantages: 1. no "billing account" for free accounts. 2. use of open data so you're allowed to keep it. Is there no free tier for Google geocoding? If so then my opencage suggestion is becoming more important. 🤔 Well i guess the free credits one gets when creating the billing account count as free tier maybe 🤔 Sign up: https://opencagedata.com/pricing
- Our response: I have switched geocode_tags() to draw from OpenCage API instead of Google Maps API. The geocode_tags() documentation has been updated, and the setup.Rmd and tidytags-with-conf-hashtags.Rmd vignettes have been adjusted as well.
A small note, in the rtweet docs for creating a token at the very end there is a step allowing you to check your token is available in a new session. What could a similar check be for the Google Geocoding API? And even more ambitious and not necessary would be a sitrep function for tidytags, that you'd run to check Twitter and Google geocoding are now set up, and if not pointers to relevant docs. (à la devtools::dev_sitrep())
- Our response: Following the {rtweet} model, I have added a “Authorization in future R sessions” section to the setup.Rmd vignette for both the Twitter API key and the OpenCage API key.
Last question before I log off, could the docs state what kind of restrictions can be applied to the Google geocoding API key for tidytags stuff to work?
- Our response: I have added a paragraph in the setup.Rmd vignette that compares the price points between Google Maps API and OpenCage API. I conclude with “OpenCage does seem to be a bit flexible if the 2,500 queries per day are exceeded. However, if you greatly exceed this limit, they send a warning and ask you to upgrade to a paid plan.”

Outstanding issues:

I followed some examples for our Contributing Guide, but I don’t know if what I’ve written is sufficient.
I would love further guidance on increasing our test coverage, which is way below 75%.

Thank you again for working with us! @jrosen48 and I are looking forward to continuing to develop {tidytags}.

//Bret

maelle commented 3 years ago

Thanks a ton for all your work!

Regarding the code of conduct, awesome! Note that after acceptance the rOpenSci COC will apply cf https://devguide.ropensci.org/collaboration.html#coc-file
I am glad about OpenCage, not only because it's a nice package :grin: but also because setup is so much easier.
Speaking of setup docs, they are really good!
Now regarding the contributing guide what's missing is a sandbox TAGS like OSF's development environment/ODK sandbox. Could reviewers (and the editor?) get access to a TAGS used for testing the package? (even if ideally once I start looking for reviewers I hope to find TAGS users who'd use the package on their own data)
Speaking of testing. You will want both tests that use cached responses of some sort, and tests making real requests at least once in a while.
- The rOpenSci "HTTP testing in R" book is undergoing a major update (by me :slightly_smiling_face: ) in particular I've worked on demos of the three tools for HTTP testing there are in R. See https://github.com/ropensci-books/http-testing/pull/47 and the Netlify preview linked from the PR checks https://5fbb93389e79d3867339005d--http-testing-book.netlify.app/ (the Whole games section is the one with the demos)
- Regarding tests with real requests especially on CI the best guidance out there is https://gargle.r-lib.org/articles/articles/managing-tokens-securely.html
- If I get access to a sandbox TAGS I can help with setting up tests, e.g. making a PR adding a first vcr test if that helps.

bretsw commented 3 years ago

Hi @maelle! I've added an openly shared TAGS tracker to the CONTRIBUTING guide. See https://github.com/bretsw/tidytags/blob/master/CONTRIBUTING.md#prerequisites

I mention this in the guide, but the TAGS itself is read-only in the web, because the purpose of {tidytags} is to read the tracker archive into R and let you do all your analyses there.

I'll look more at the testing examples you've listed (THANK YOU!), but I wanted to quickly get you the TAGS sandbox first.

maelle commented 3 years ago

:wave: @bretsw! Sorry for the delay, thanks, having the TAGS tracker is awesome.

I've merged my PR to the HTTP testing in R book and added some advanced chapters. https://books.ropensci.org/http-testing/index.html

Do you have any specific questions regarding test setup, that I could look into?

Speaking of testthat, with testthat newest version context() is deprecated, what's used as context instead is the name of the test file (that hopefully reflects the name of the R file).

Also note that you don't need to load tidytags manually (library(tidytags)) in tests as it's loaded by testthat.

maelle commented 3 years ago

reg testthat's new version https://testthat.r-lib.org/articles/third-edition.html

bretsw commented 3 years ago

Hi @maelle! Thanks again for pointing me in all the right directions. I think I've gotten vcr tests working with Twitter and OpenCage APIs. At least it looks good on my local machine. I just pushed a big update so we'll see. When I ran covr::package_coverage() locally it looked like the tests are covering around 85% now. I've gotten tests for all functions, in any case.

It does look like R-CMD-check is failing on Github though.

One place where I got stuck and couldn't find my way through was on a vcr test for get_url_domain(). The vcr test worked with long URLS but not shortened ones (e.g., bit.ly). Seems like there's something going on in a dependency package that I'm not catching yet. Anyway, set those test to skip() for now.

Let me know what's next! THANK YOU, as always, for all your help.

maelle commented 3 years ago

Awesome, I'm going to have a look!

I see that the README links to an older version of the R packages book reg licencing. Here's the updated chapter https://r-pkgs.org/license.html

maelle commented 3 years ago

Looking at the dependencies of the package, why are there dependencies on covr, devtools, goodpractice, rcmdcheck, spelling, styler?

maelle commented 3 years ago

I see a security problem, I've opened an issue, please look at this urgently https://github.com/bretsw/tidytags/issues/30 @bretsw (at least inactivate the token that was leaked).

Feel free to tell me where in the docs we could have warned you more (this coming from someone who leaked their own GitHub PAT a few weeks ago :grimacing: )

maelle commented 3 years ago

(the link https://books.ropensci.org/http-testing/vcr-security.html#if-the-secret-is-in-a-request-header was updated two days ago, and the features described are also very new in vcr cc @sckott)

maelle commented 3 years ago

there's no setup for also running real requests once in a while, correct? We don't have requirements around HTTP testing yet (just the test coverage one) but real requests can be useful. For them you'd need an encrypted Twitter token for instance, which means more security-related work. https://books.ropensci.org/http-testing/real-requests-chapter.html & https://books.ropensci.org/http-testing/security-chapter.html

maelle commented 3 years ago

A few last comments for today

the current vignette building approach is fine, but for info with vcr::inject_cassette()/vcr::eject_cassette() (and the proper vcr security configuration) you could instead use vcr cassettes in the vignettes.
you might want to take some notes around your test setup in CONTRIBUTING.md, see https://books.ropensci.org/http-testing/contributor-friendliness.html
in the contributing guide what does "common sense" mean?
the checks in my typo fix PR failed, could you ensure tests pass for forks? I suspect it's because there's no fake Twitter httr token file around. Maybe something like below can help

httr::oauth2.0_token(
  endpoint = httr::oauth_endpoints("twitter"),
  app = httr::oauth_app("foobar", "foobar", "foobar"),
  credentials = list(access_token = "foobar")
)

I might hold off looking for reviewers until January (I'll be on vacation for two weeks, and I suspect it'll be hard to get responses). In any case you've done awesome work on the package, thank you.

maelle commented 3 years ago

(I see the checks don't pass for the repo either, hence the coverage badge still showing low coverage?)

maelle commented 3 years ago

One place where I got stuck and couldn't find my way through was on a vcr test for get_url_domain(). The vcr test worked with long URLS but not shortened ones (e.g., bit.ly). Seems like there's something going on in a dependency package that I'm not catching yet. Anyway, set those test to skip() for now.

This might be worth a question in vcr repo / in rOpenSci forum.

bretsw commented 3 years ago

One place where I got stuck and couldn't find my way through was on a vcr test for get_url_domain(). The vcr test worked with long URLS but not shortened ones (e.g., bit.ly). Seems like there's something going on in a dependency package that I'm not catching yet. Anyway, set those test to skip() for now.

This might be worth a question in vcr repo / in rOpenSci forum.

Opened an issue in the vcr repo: https://github.com/ropensci/vcr/issues/220

maelle commented 3 years ago

@ropensci-review-bot help

ropensci-review-bot commented 3 years ago

Hello @maelle, here are the things you can ask me to do:


# List all available commands
@ropensci-review-bot help

# Show our Code of Conduct
@ropensci-review-bot code of conduct

# Switch to "seeking reviewers"
@ropensci-review-bot seeking reviewers

# Approve the package
@ropensci-review-bot approve

# Add a user to this issue's reviewers list
@ropensci-review-bot add xxxxx to reviewers

# Remove a user from the reviewers list
@ropensci-review-bot remove xxxxx from reviewers

# Assign a user as the editor of this submission
@ropensci-review-bot assign @username as editor

# Remove the editor assigned to this submission
@ropensci-review-bot remove editor

# Close the issue
@ropensci-review-bot approve

# Close the issue
@ropensci-review-bot out of scope

bretsw commented 3 years ago

I've updated our codecov too and tidytags is at 88% coverage.

Also, I've made a number of updates trying to get CI tests to work. I updated geocode_tags() to reflect the new function from OpenCage, oc_forward_df() which does exactly what I had baked into my older code.

So, CI tests are currently passing on Mac and Ubuntu but the Windows test is returning a strange error: Error: Package suggested but not available: 'ggraph'

See https://github.com/bretsw/tidytags/runs/2013153747

Have you seen something like this before? The ggraph package definitely still exists and is usable on my local machine (and apparently on Mac and Ubuntu generally). Any ideas there?

maelle commented 3 years ago

Thanks for the update, this is awesome! (and I'd say a tribute to @dpprdan's new opencage functions)

Usually when I see such errors it happens right after a package update on CRAN, and so the binary is not available. It doesn't seem to be the case here.

I was thinking about suggesting you tweak the workflow so that remotes would error at the dependency installation step, not later (no need to do the check if not all dependencies are present) but then I remembered the tidyverse team seems to be switching their workflows to using pak instead: https://github.com/r-lib/pkgdown/blob/9c95f00d2505ddd83c4722e019202370715d3a3d/.github/workflows/R-CMD-check.yaml#L54

I think using pak instead of remotes might be a good idea and with a bit of luck you'd get no error / a more informative failure on Windows?

bretsw commented 3 years ago

Very interesting! I'll give pak a try and report back. Thanks @maelle!

bretsw commented 3 years ago

@maelle - Success! pak seemed to do the trick. tidytags is now passing CI tests for all OS platforms and showing 88% test coverage. I'll tackle real tests now, but this is a breakthrough moment!

maelle commented 3 years ago

Yay :tada:

bretsw commented 3 years ago

Hi @maelle, I'm working on setting up real tests for CI, and I wanted to run my plan by you to see what you think.

I've created a new file, "weekly-check.yaml" for the real tests. I've scheduled the tests to occur once every week at midnight Sundays with - cron: '0 0 * * Sun'.

tidytags functions want to find API keys from environmental variables, so I've stored opencage and rtweet keys in GitHub Secrets. I'll add them then with things like OPENCAGE_KEY: ${{ secrets.OPENCAGE_KEY }} and TWITTER_API_KEY: ${{ secrets.TWITTER_API_KEY }} (there are five secrets for Twitter, as listed at https://docs.ropensci.org/rtweet/articles/auth.html). rtweet will create a Twitter token with rtweet::create_token() so I'm adding that to the steps in the .yaml file. The token gets stored in (and then automatically accessed through) an .rds file. I've added "*.rds" to .gitignore so the token in the .rds file shouldn't be exposed at all.

I think that's it. What do you think? Is this a good plan? Am I missing something obvious that will unwittingly expose a secret?

maelle commented 3 years ago

You mean https://docs.ropensci.org/rtweet/articles/auth.html#2-access-token-secret-method? That sounds great and I should document this in the book as it sounds easier than encrypting.

As this thread taught us both how many ways there are to leak secrets 😅 (not your fault!): the token will be in an app dir or so, not in the current directory, correct? Just to make sure it won't get leaked in a check .tar.gz in case of failured.

Also, can you confirm you use a special Twitter account for the app and token, not your account?

bretsw commented 3 years ago

Hi, @maelle. I set up a special Twitter account for the app and token, and the real CI tests are queued to run in about 5 minutes. So, we'll see.

In the meantime, now the regular build has failed all CI tests: https://github.com/bretsw/tidytags/runs/2056401038

Is this because R updated to v4.1 on Friday 3/5? Any advice on how I can help tidytags catch up?

maelle commented 3 years ago

In some of the runs it seems the error comes from vcr. Could your tidytags example sheet have changed somehow?

Btw the link to that sheet contains a key https://github.com/bretsw/tidytags/runs/2056400999#step:10:142 Is this expected?

maelle commented 3 years ago

(the real tests might be different tests, or at least some of the tests might be skipped when vcr is off, depending on what the testthat expectations are)

bretsw commented 3 years ago

In some of the runs it seems the error comes from vcr. Could your tidytags example sheet have changed somehow?

Btw the link to that sheet contains a key https://github.com/bretsw/tidytags/runs/2056400999#step:10:142 Is this expected?

Nope, this was not expected. That was my Google API key, which is now deleted. I need to dig into this, because the Google API key was only used back when tidytags used Google Maps for geocoding. I would not have expected that key to be called at all since we switched to OpenCage.

maelle commented 3 years ago

maybe it's needed for the raw URL to the TAGS?

bretsw commented 3 years ago

I just re-recorded all the vcr cassettes after deleting that Google API key. All tests passed locally. CI tests just failed again with same error as before, with same old key in the raw URL. I'll keep digging, but I'm confused so far.

maelle commented 3 years ago

It seems a key is needed to read the sheets that don't require a token. https://github.com/tidyverse/googlesheets4/blob/8cd9a75ba17d064a76b5a8bf0b8c9dbfa91f2907/R/request_generate.R#L23

So you need to add this key to the vcr secret filtering (and to provide it for real requests). I do not know where the key in the error log comes from.

Now as to why the request is failing, the request does not look different to me in your cassette https://sheets.googleapis.com/v4/spreadsheets/18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8?fields=spreadsheetId%2Cproperties%2CspreadsheetUrl%2Csheets.properties%2CnamedRanges&key=AIzaSyDKRsnYs5G4c8y4BMlXLKTKMTheNXrsNEM vs what's done on GHA https://sheets.googleapis.com/v4/spreadsheets/18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8?fields=spreadsheetId%2Cproperties%2CspreadsheetUrl%2Csheets.properties%2CnamedRanges&key=AIzaSyDKRsnYs5G4c8y4BMlXLKTKMTheNXrsNEM so that's worth opening a vcr issue.

I hope this key has been invalidated btw.

maelle commented 3 years ago

Also note, regarding API keys that are used in the query parts of URLs, that vcr now lets you more specifically filter them cf https://docs.ropensci.org/vcr/articles/configuration.html#filter-query-parameters

maelle commented 3 years ago

so in your package (and that'll be precious information for users / contributors) the secrets are the Twitter token, the OpenCage API key but also some sort of googlesheets4 authentication, be it a token or a key.

ropensci / software-review