general tidy up - Githubissues

jamieRowen commented 2 years ago

I don't want to step on toes, but I have been using this package a fair amount recently. I am submitting a PR for general tidy up of the package source as well as some actions. This PR gets the package to the point that it passes the standard R CMD check on the major operating systems again I notice the actions that you have defined have not passed for some time, I have added the "check-standard" set for the major operating systems. I have also restyled the source to make it consistent across the package as well as add a lintr + action to keep code "in style" going forward.

Without a full suite of tests it is hard to ensure that I haven't broken anything with data masking .data$variable in the various tidyverse pipelines so would advise checking that everything is ok before merging.

A summary of the changes is:

apply a style to get consistent style across src
add lintr to repository to enforce consistent style going forward
add action to check lintr
add action to build package docs
add check-standard action
import rlang .data for proper masking of data variables in tidyverse pipelines
add missing parameter definitions to docs
add action to build pkgdown webpage from docs and deploy to github pages

mattroumaya commented 2 years ago

Hi @jamieRowen - wow, lots of changes here! I will review this during the week and want to make sure I make a decent amount of time to review, test, and understand everything, as I also use this package on a daily or weekly basis.

I didn't create this package, so would welcome any input @sfirke has, even though he has stepped away from this project.

If you could speak on the benefits of some of these changes I would appreciate it, so I have a better understanding of the requested changes. Specifically, I understand the value of running a linter, but I've never used one for my projects. Besides standardizing the look of the code, is there another reason to use lintr?

I have a rudimentary understanding of the use of .data from rlang, but what is the reason for implementing it in this package? My concern is that implementing all of these changes to standardize code actually makes it harder for me to interpret, as I'm used to reading and writing code the way that it was originally written, and is sometimes hard to follow after the linter is run.

Apologies for my ignorance here - I want to make sure that we're not merging anything that adds complexity and risks breaking something for the benefit of standardization.

jamieRowen commented 2 years ago

Specifically, I understand the value of running a linter, but I've never used one for my projects. Besides standardizing the look of the code, is there another reason to use lintr?

Sure. A linter is a static analysis tool and cheap check of code quality against a set of rules that it knows about. As well as stylistic errors it can check for programmatic and syntax errors too. Some examples in R and lintr specifically are:

check for == NA which should be is.na()
check that no absolute file paths are used which would be non transferable to another machine
check for missing packages and symbols on namespace calls ::
check that no undesirable operators are used like <<-
make sure that TRUE and FALSE are not used as T and F which can cause unexpected and difficult to find bugs

The other element that is quite nice is that it will estimate complexity of expressions and flag an issue if something gets too complex. This helps to encourage smaller, more concise, typically more easily testable functions.

Adding a lint action also encourages other developers (perhaps additional external people like myself) to add code in a way that you are happy with.

You don't have to stick to the linters that I have chosen, I just picked the most common set that we use at work.

I have a rudimentary understanding of the use of .data from rlang, but what is the reason for implementing it in this package?

The .data pronoun is specifically referring to a symbol in the data set, rather than one in the enclosing environment. (There is a corresponding .env pronoun). 2 real reasons to use it here

Without it (or something worse) you won't pass R CMD check, so your package check pipeline will never pass. If it never passes it isn't a very reliable indicator of something changing.
Clarity over whether the author intends to be referring to data in a data.frame/tibble or something outside of.

small example:

speed = 1:5
ex = cars[1:5, ]
"speed" %in% colnames(cars) # TRUE
ex %>% 
  dplyr::mutate(new_speed = speed * 10)
  # does the mutate mean create new_speed column from the existing speed column
  # or the external variable speed?

ex %>%
  dplyr::mutate(new_speed = .data$speed * 10)
  # clarity that I intend to manipulate existing column

ex %>%
  dplyr::mutate(new_speed = .env$speed * 10)
  # clarity that i intend to use the speed variable in
  # the enclosing environment

As for the other items:

add action to build package docs
- personally find that exploring documentation of a package I want to use is much easier via a web browser than the R interface.
- pkgdown automates building the website from existing package docs so is basically free
- pkgdown sites are becoming really popular for packages
add check-standard action
- checks against current, devel and old release of R on 3 major operating systems which seems reasonable
- added this because your existing checks have not passed for some time and therefore are not very informative in telling you that something someone has added has caused a breaking change.
add missing parameter definitions to docs
- hopefully self explanatory but there was an undocumented argument to one of your functions

sfirke commented 2 years ago

Thanks @jamieRowen for the contributions and @mattroumaya for asking good questions. This looks good to me. Thank you for keeping actual code changes separate from a pure formatting change, that made this much easier to review. Looks good to me, with one trivial note for a linter formatting result I didn't love.

A pkgdown site is a good idea.

I don't know where we're at with passing check. I agree it would be good to get there and appreciate you setting up Github actions for it. It looks like this PR is failing checks on some platforms, at a glance it looks like the fault of the check code + software and not any errors in the package. If that's the case I guess we roll with it and hope it irons itself out?

sfirke commented 2 years ago

@jamieRowen FYI on the history here: Thomas Leeper wrote the foundational version of this package in the first place but for a different version of the API. I updated it to be tidyverse-style and work on v3 of the API. Now Matt is maintaining it as I no longer use SurveyMonkey and changed employers. He has jumped admirably into some of the hairiest internals but some stuff will be less familiar to him if I wrote it solo back in the day, e.g., tests.

jamieRowen commented 2 years ago

Thanks @jamieRowen for the contributions and @mattroumaya for asking good questions. This looks good to me. Thank you for keeping actual code changes separate from a pure formatting change, that made this much easier to review. Looks good to me, with one trivial note for a linter formatting result I didn't love.

A pkgdown site is a good idea.

I don't know where we're at with passing check. I agree it would be good to get there and appreciate you setting up Github actions for it. It looks like this PR is failing checks on some platforms, at a glance it looks like the fault of the check code + software and not any errors in the package. If that's the case I guess we roll with it and hope it irons itself out?

I think the failing checks are the result of the actions that you already had, rather than the ones that I added. Once this was accepted I was going to raise an issue to remove the broken checks.

jamieRowen commented 2 years ago

@jamieRowen FYI on the history here: Thomas Leeper wrote the foundational version of this package in the first place but for a different version of the API. I updated it to be tidyverse-style and work on v3 of the API. Now Matt is maintaining it as I no longer use SurveyMonkey and changed employers. He has jumped admirably into some of the hairiest internals but some stuff will be less familiar to him if I wrote it solo back in the day, e.g., tests.

No worries. I am happy to do things like refactoring and adding tests etc when I get time as I have some projects that depend on data pipelines where one of the data sources is survey monkey so I am keen to help with keeping it running when something breaks or adding features that I come across.

mattroumaya commented 2 years ago

Thanks @jamieRowen for the contributions and @sfirke for the comments and guidance. I'm hoping to have some time to review #92 today, but for now seems like it's safe to merge #91 and keep the conversation going.

mattroumaya / surveymonkey

general tidy up #91