ropensci / rrrpkg

Use of an R package to facilitate reproducible research

A link to a sample compendium would be useful #3

Open tmalsburg opened 9 years ago

benmarwick commented 9 years ago

Good idea, we could add these, do you know of others?

tmalsburg commented 9 years ago

Looks good. In addition, it might make sense to have a dummy repository that illustrates the structure but does not contain other, irrelevant material. rrrpkg itself could be used for that.

cboettig commented 9 years ago

We should probably link the original compendium example from Robert Gentleman: http://dx.doi.org/10.2202/1544-6115.1034 and the original Compendium paper: http://biostats.bepress.com/bioconductor/paper2/ (even though these are somewhat older). There certainly are other examples from other folks (I have a few others as well but variety of authors and styles is probably best); I think there must be some stuff in J Biostatistics. (of course not counting things like JSS papers).

I loosely maintain a template like that for my own use: https://github.com/cboettig/template but I'm not sure whether it is a good idea for this. devtools and other R tools already support creating package skeletons really quickly, with good templates included. I worry that a template added here could become dated quickly and, more importantly, might look like overkill for the minimum we're trying to suggest here.

I do think we need some examples that are much lighter-weight -- e.g. things that don't pass R CMD check or have all the bells and whistles. I wonder if it might be worth adapting some existing paper that just provides some data files and some script files so that it looks like an R package, e.g. something like https://github.com/duffymeg/BroodParasiteDescription (see the author's blog post on this too, which is also relevant to this discussion: https://dynamicecology.wordpress.com/2015/05/28/my-first-experience-with-github-for-sharing-data-and-code/comment-page-1/). That is, just dump the R scripts into R/, the data into data/, fix some file path issues and add a minimal DESCRIPTION file.
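
As a rough sketch of that conversion, the steps could look like the following in a POSIX shell. The package name, author details and file names are placeholders invented for illustration; they are not taken from any of the repositories mentioned in this thread.

```sh
# Placeholder package name; not one of the repositories linked above.
pkg=minimalcompendium

# Conventional package directories for code and data.
mkdir -p "$pkg/R" "$pkg/data"

# Existing files would then be moved in, for example (hypothetical names):
#   mv analysis.R       "$pkg/R/"
#   mv observations.csv "$pkg/data/"

# Write a small DESCRIPTION file; every field value here is illustrative.
cat > "$pkg/DESCRIPTION" <<'EOF'
Package: minimalcompendium
Title: Data and Code for an Example Paper
Version: 0.0.1
Author: A. Researcher
Maintainer: A. Researcher <a.researcher@example.org>
Description: Data files and analysis scripts accompanying a paper.
License: CC0
EOF
```

Relative file paths inside the scripts would still need fixing by hand, and R CMD check may well still report notes or warnings for a layout this bare.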

gmbecker commented 9 years ago

Carl,

I would argue that R scripts (as opposed to R functions/software) don't belong in the R/ directory of a compendium. Internally at Genentech, our spec calls for a separate analysis/ directory, which prevents the scripts from being run during install/build but still bundles them with any included data or functions. It provides an (albeit loose) demarcation between the software (functions) and the analysis code (scripts).

If this were adopted, tooling around it to run scripts from an analysis package would be pretty straightforward to develop, I think.

~G

Gabriel Becker, PhD Computational Biologist Bioinformatics and Computational Biology Genentech, Inc.
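
The kind of tooling described above could start out as small as the following shell function. This is only a sketch under stated assumptions: `run_analysis` is a hypothetical name, the `analysis/*.R` naming is assumed, and `Rscript` is assumed to be on the PATH. It is not the Genentech tooling or spec.

```sh
# Hypothetical helper: run every script in a compendium's analysis/ directory.
# Usage: run_analysis <compendium-dir> [interpreter]
run_analysis() {
  pkg_dir=$1
  runner=${2:-Rscript}   # interpreter command; defaults to Rscript (assumed installed)
  (
    # Work from the compendium root so relative paths such as data/... resolve.
    cd "$pkg_dir" || exit 1
    for script in analysis/*.R; do
      [ -e "$script" ] || continue   # the glob matched nothing
      echo "Running $script"
      "$runner" "$script" || exit 1  # stop at the first failing script
    done
  )
}
```

Calling `run_analysis path/to/compendium` would run the scripts in the order the shell expands the glob, so numeric prefixes in the file names are one simple way to fix the order.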

cboettig commented 9 years ago

Ah right, I think that's what's in the rrrpkg readme as well -- analysis would be better. (One might call it code or scripts, but there does seem to be momentum behind analysis, and it is a nicely more relaxed term should things that are not strictly scripts, e.g. Rmd files, be placed in there.) Good call.

cboettig commented 9 years ago

Okay, how's this for a more minimal example: https://github.com/cboettig/BroodParasiteDescription

I've tried to make the bare minimum number of changes to https://github.com/duffymeg/BroodParasiteDescription to put it into R package format (see https://dynamicecology.wordpress.com/2015/05/28/my-first-experience-with-github-for-sharing-data-and-code/comment-page-1/; I think this is a simple and realistic example).

Let me know if anyone has feedback on these changes: whether it looks like what we're going for, or needs more (or fewer?) modifications to be realistic & useful. If we think this is good, then maybe it's worth making a PR to Meg with these changes, so that we can link her original repo.

tmalsburg commented 9 years ago

@cboettig This example is very useful, but it doesn't have the directories R, manuscript, and vignettes. It would be good if everything that is covered by the proposal were part of the "minimal" example.

cboettig commented 9 years ago

@tmalsburg thanks. I'm not sure that those things should be included in the definition of "minimal" -- that project didn't need any user-defined functions, so no R directory. We already have the examples that @benmarwick mentioned which include all of those directories.

Perhaps something more intermediate would still be nice as well, e.g. something that has R/ and maybe manuscript to show a .Rmd example (with pandoc->word as the output format?!), but not all the extra stuff like Docker and Travis that is in the other two examples Ben mentioned.
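
A sketch of what such an intermediate layout might contain, again in POSIX shell. All names are placeholders (this is not the layout of any repository linked in this thread), and the DESCRIPTION file is left out for brevity.

```sh
# Placeholder names throughout; purely illustrative.
pkg=intermediatecompendium
mkdir -p "$pkg/R" "$pkg/data" "$pkg/manuscript"

# One user-defined function lives in R/ (hypothetical example function).
cat > "$pkg/R/summarise_counts.R" <<'EOF'
summarise_counts <- function(x) {
  c(mean = mean(x), sd = sd(x))
}
EOF

# The manuscript is an R Markdown file set to render to Word via pandoc.
cat > "$pkg/manuscript/paper.Rmd" <<'EOF'
---
title: "Example manuscript"
output: word_document
---

Text of the paper, with R code chunks as needed.
EOF

# Rendering is left as a comment because it needs R, rmarkdown and pandoc installed:
#   Rscript -e 'rmarkdown::render("intermediatecompendium/manuscript/paper.Rmd")'
```

Nothing here depends on Docker or continuous integration, which is the point of keeping the example intermediate.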

benmarwick commented 9 years ago

That's very interesting, your rearrangement of BroodParasiteDescription is the most minimal R package I've ever seen! And I can install it just fine. Building it gives a few notes and warnings, but that's fine. If you make a PR to the original authors, I'll make a PR to this readme to add some more detail according to the discussion on this thread, and link to some examples (I'll link to your repo for now, and update it if your PR is accepted).

jennybc commented 9 years ago

Thanks @cboettig, I think that's a very useful contribution. An example that shows just how thin the "R package layer" can be is very valuable!

jhollist commented 9 years ago

@jennybc per your request!

Another example: Modeling Lake Trophic State. I'm happy to add it and submit a PR, but wasn't exactly sure where to add it. This example sits somewhere between the intermediate and complex examples. It is also pretty real-world, as the nice clean initial setup got a bit messy, with most code in functions but a lot also embedded in the Rmd.

benmarwick commented 9 years ago

@jhollist I think that would be a great intermediate example, please do add a mention of it with a PR!

jennybc commented 9 years ago

@jhollist's example added to README in f83ca4acffc72ddfbcb76bc55d8e88725ec2529f