stephens999 / dscr

Dynamic statistical comparisons in R

Clean build #38

Closed ramanshah closed 9 years ago

ramanshah commented 9 years ago

This will require a fairly major effort; I'm gathering my thoughts for discussion/debate.

A goal, probably a prerequisite for wide release such as on CRAN, is for the GitHub repo to contain a clean specification of the package that is as non-repetitive as possible while remaining useful. The R tooling (e.g., devtools and roxygen2) causes a lot of duplication of information by compiling code from one place into code in other places. There's controversy about which duplication to check into git: some package developers, for example, go for maximum parsimony, excluding NAMESPACE or any documentation .Rd files because they can be built by roxygen2. Others (including Hadley) recommend keeping them in git so that devtools::install_github will ship documentation to users.

Currently, a lot of stuff is checked into this repo, and it has gotten out of sync in various ways. Many of the devtools verbs shred up the repository in surprising ways. For example, devtools::build_vignettes() deletes vignettes/dsc_shrink.html and puts fresh vignette output in inst/doc, which is .gitignored. The package developer has to manually move files around to undo this.

In any case, we need to document and standardize a build process (it will likely consist of just a few devtools magic words that correspond to a known sequence of actions in RStudio) and automate enforcement that all duplicated/cached artifacts in the repo are fresh: that, for instance, NAMESPACE, the .Rd files, and any other roxygen2-generated files are consistent with the roxygen2 comments in the main codebase. I believe I can build Travis-CI tooling for this.
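To make the freshness check concrete, here is a minimal sketch in R, not a finished tool. The helper names (snapshot_artifacts, stale_artifacts, check_fresh) are hypothetical, and the default paths assume roxygen2 writes only to NAMESPACE and man/. The idea is to checksum the generated files, regenerate them, and fail if anything changed.

```r
# Hypothetical helper: MD5 checksums of generated files.
# `paths` may mix plain files and directories (searched recursively).
snapshot_artifacts <- function(paths) {
  files <- paths[file.exists(paths) & !dir.exists(paths)]
  dirs <- paths[dir.exists(paths)]
  files <- c(files, list.files(dirs, recursive = TRUE, full.names = TRUE))
  tools::md5sum(sort(files))
}

# Hypothetical helper: files added, removed, or changed between snapshots.
stale_artifacts <- function(before, after) {
  all_files <- union(names(before), names(after))
  changed <- vapply(all_files, function(f) {
    !identical(unname(before[f]), unname(after[f]))
  }, logical(1))
  all_files[changed]
}

# Run `regenerate` and stop with an error if it altered any tracked artifact.
check_fresh <- function(regenerate, paths = c("NAMESPACE", "man")) {
  before <- snapshot_artifacts(paths)
  regenerate()
  stale <- stale_artifacts(before, snapshot_artifacts(paths))
  if (length(stale) > 0) {
    stop("Stale generated files: ", paste(stale, collapse = ", "))
  }
  invisible(TRUE)
}

# On CI, from the package root (assumes devtools is installed):
# check_fresh(function() devtools::document())
```

A CI job could call this through Rscript and rely on the error to fail the build. One caveat: roxygen2 output can vary between roxygen2 versions, so the CI environment would probably need to pin the same version that developers use.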

There are two ways to go in my mind, depending on priorities:

  1. If we are hoping to get dscr onto CRAN substantially as is, I could put my efforts into incrementally achieving a clean build for the project.
  2. If instead we are hoping to make major changes (e.g., rebuild a rather different dscr on top of BatchJobs for seamless parallelization) for a later release, it might be less work to start from the bottom, with a fresh package, and document the build process and all of its artifacts step by step. We'd graft the essential code into the new package piece by piece.

ramanshah commented 9 years ago

@stephens999 My main remaining concern about cleaning up the build, given our more modest plans in #49, is to hash out what is in the inst/ directory. I can't really tell from the code what was added when, or whether some of the stuff is scratch work, auto-generated, etc. This came up in the process of doing the snake_case migration: I am not sure whether some of the stuff should be modified to snake_case, executed to create some new artifacts, and/or deleted.