mattroumaya / surveymonkey

Access your SurveyMonkey data directly from R!
https://mattroumaya.github.io/surveymonkey/
Other
42 stars 10 forks source link

general tidy up #91

Closed jamieRowen closed 2 years ago

jamieRowen commented 2 years ago

I don't want to step on toes, but I have been using this package a fair amount recently. I am submitting a PR for general tidy up of the package source as well as some actions. This PR gets the package to the point that it passes the standard R CMD check on the major operating systems again I notice the actions that you have defined have not passed for some time, I have added the "check-standard" set for the major operating systems. I have also restyled the source to make it consistent across the package as well as add a lintr + action to keep code "in style" going forward.

Without a full suite of tests it is hard to ensure that I haven't broken anything with data masking .data$variable in the various tidyverse pipelines so would advise checking that everything is ok before merging.

A summary of the changes is:

mattroumaya commented 2 years ago

Hi @jamieRowen - wow, lots of changes here! I will review this during the week and want to make sure I make a decent amount of time to review, test, and understand everything, as I also use this package on a daily or weekly basis.

I didn't create this package, so would welcome any input @sfirke has, even though he has stepped away from this project.

If you could speak on the benefits of some of these changes I would appreciate it, so I have a better understanding of the requested changes. Specifically, I understand the value of running a linter, but I've never used one for my projects. Besides standardizing the look of the code, is there another reason to use lintr?

I have a rudimentary understanding of the use of .data from rlang, but what is the reason for implementing it in this package? My concern is that implementing all of these changes to standardize code actually makes it harder for me to interpret, as I'm used to reading and writing code the way that it was originally written, and is sometimes hard to follow after the linter is run.

Apologies for my ignorance here - I want to make sure that we're not merging anything that adds complexity and risks breaking something for the benefit of standardization.

jamieRowen commented 2 years ago

Specifically, I understand the value of running a linter, but I've never used one for my projects. Besides standardizing the look of the code, is there another reason to use lintr?

Sure. A linter is a static analysis tool and cheap check of code quality against a set of rules that it knows about. As well as stylistic errors it can check for programmatic and syntax errors too. Some examples in R and lintr specifically are:

The other element that is quite nice is that it will estimate complexity of expressions and flag an issue if something gets too complex. This helps to encourage smaller, more concise, typically more easily testable functions.

Adding a lint action also encourages other developers (perhaps additional external people like myself) to add code in a way that you are happy with.

You don't have to stick to the linters that I have chosen, I just picked the most common set that we use at work.

I have a rudimentary understanding of the use of .data from rlang, but what is the reason for implementing it in this package?

The .data pronoun is specifically referring to a symbol in the data set, rather than one in the enclosing environment. (There is a corresponding .env pronoun). 2 real reasons to use it here

  1. Without it (or something worse) you won't pass R CMD check, so your package check pipeline will never pass. If it never passes it isn't a very reliable indicator of something changing.
  2. Clarity over whether the author intends to be referring to data in a data.frame/tibble or something outside of.

small example:

speed = 1:5
ex = cars[1:5, ]
"speed" %in% colnames(cars) # TRUE
ex %>% 
  dplyr::mutate(new_speed = speed * 10)
  # does the mutate mean create new_speed column from the existing speed column
  # or the external variable speed?

ex %>%
  dplyr::mutate(new_speed = .data$speed * 10)
  # clarity that I intend to manipulate existing column

ex %>%
  dplyr::mutate(new_speed = .env$speed * 10)
  # clarity that i intend to use the speed variable in
  # the enclosing environment

As for the other items:

sfirke commented 2 years ago

Thanks @jamieRowen for the contributions and @mattroumaya for asking good questions. This looks good to me. Thank you for keeping actual code changes separate from a pure formatting change, that made this much easier to review. Looks good to me, with one trivial note for a linter formatting result I didn't love.

A pkgdown site is a good idea.

I don't know where we're at with passing check. I agree it would be good to get there and appreciate you setting up Github actions for it. It looks like this PR is failing checks on some platforms, at a glance it looks like the fault of the check code + software and not any errors in the package. If that's the case I guess we roll with it and hope it irons itself out?

sfirke commented 2 years ago

@jamieRowen FYI on the history here: Thomas Leeper wrote the foundational version of this package in the first place but for a different version of the API. I updated it to be tidyverse-style and work on v3 of the API. Now Matt is maintaining it as I no longer use SurveyMonkey and changed employers. He has jumped admirably into some of the hairiest internals but some stuff will be less familiar to him if I wrote it solo back in the day, e.g., tests.

jamieRowen commented 2 years ago

Thanks @jamieRowen for the contributions and @mattroumaya for asking good questions. This looks good to me. Thank you for keeping actual code changes separate from a pure formatting change, that made this much easier to review. Looks good to me, with one trivial note for a linter formatting result I didn't love.

A pkgdown site is a good idea.

I don't know where we're at with passing check. I agree it would be good to get there and appreciate you setting up Github actions for it. It looks like this PR is failing checks on some platforms, at a glance it looks like the fault of the check code + software and not any errors in the package. If that's the case I guess we roll with it and hope it irons itself out?

I think the failing checks are the result of the actions that you already had, rather than the ones that I added. Once this was accepted I was going to raise an issue to remove the broken checks.

jamieRowen commented 2 years ago

@jamieRowen FYI on the history here: Thomas Leeper wrote the foundational version of this package in the first place but for a different version of the API. I updated it to be tidyverse-style and work on v3 of the API. Now Matt is maintaining it as I no longer use SurveyMonkey and changed employers. He has jumped admirably into some of the hairiest internals but some stuff will be less familiar to him if I wrote it solo back in the day, e.g., tests.

No worries. I am happy to do things like refactoring and adding tests etc when I get time as I have some projects that depend on data pipelines where one of the data sources is survey monkey so I am keen to help with keeping it running when something breaks or adding features that I come across.

mattroumaya commented 2 years ago

Thanks @jamieRowen for the contributions and @sfirke for the comments and guidance. I'm hoping to have some time to review #92 today, but for now seems like it's safe to merge #91 and keep the conversation going.