A few suggestions from a quick review

This was really interesting to have a look at, thanks for sharing it. I was able to reproduce everything with minimal edits (changing my wd, commenting out some of the interactive bits, View, etc). I'll email my output since I'm not sure how to get at html file on here quickly, I've also found an easy place to put it online: http://rpubs.com/benmarwick/zenodo_13201 A few quick thoughts...

On the code...

The scripts could be prefixed with a number like 001-xxxx.R to indicate what the correct order should be
The entire ms. could be a R markdown file that sources the R scripts (I see you explored this with e temp plots, see below for another approach)
Most current style guides for R recommend spaces around operators to improve readability (=, +, etc)
Try dplyr instead of plyr, it feels a million times faster and has more intuitive syntax
Add package version numbers to the ms. and readme.md, I usually just paste in the output from sessionInfo
add a clickable DOI to the preprint into the readme.md to close the loop between paper and code. I see you've linked the title, but a DOI is more recognisable as a citable thing. Related, I'd specifically say "Cite as:..." before the citation, so people can see easy see how to give you credit for these things

On the figures...

Be more explicit about what code generates which figure, since the code generates more figs than are in the ms.
looks like fig 4 has had some post-processing to get labels on the plots, this could be done with R, like you did for fig 5
the code for figs 5-6-7 makes the plots appear in the reverse order to how they appear in the paper, which had me puzzled for a little while :)
looks like some post-processing on figs 8-9-10 to get the secondary y-axis title. But I'm not even sure if that's possible without a great deal of fiddling. I'm amazed you even got a second y-axis on there!

On the stats...

all the chi-sq tests match my output
I couldn't find any of the metric data or stats (SE, SD, CI) in the output. Can they be included in the scripts?

Below is a more self-contained approach, using an R Markdown file that sources the scripts (the scripts could simply be copy and pasted into the chunks). You wouldn't need to setwd for each script because the knitting process sets the wd to the location of the Rmd file (which is great for isolating the code execution from contaminating objects, etc).

This Rmd file works for me with your scripts with minor edits (I had to comment out a few lines here and there), and I can knit it to get all your figs at once. To make that possible, I'd have

a chuck at the top that sets all the chunk options so that echo=FALSE to suppress the code, cache=TRUE to keep things fast while iterating over it, and a few other tweaks to keep it tidy
a chunk that gets the wd to be one dir up so we can find the data files, In this case I changed my wd in each script file because I ran the scripts separately before making this Rmd file
a chunk to get numbered fig captions and cross-refs in the text.\
a chunk to check to see if the required packages were installed, and if not, get them without asking. I use this https://gist.github.com/benmarwick/5054846

        ---
    date: "Tuesday, December 16, 2014"
    output: html_document
    ---

    The narrative text can go in here and between the code chunks. Then you can knit the paper to PDF (or use `make`) and you've got a (more) easily reproducible paper.

    ```{r fig_2_and_3}
    source("temperature-plots.R", print.eval = TRUE, echo = TRUE)

```{r fig_4}
source("Kaplan-meier-stats-plot-all.R", print.eval = TRUE, echo = TRUE)
```

```{r figs_5_6_7}
source("sizedist-stats-plot.R", print.eval = TRUE, echo = TRUE)
```

```{r fig_8}
source("percbrood-temp-plot-OysterBay.R", print.eval = TRUE, echo = TRUE)
```

```{r fig_9}
source("percbrood-temp-plot-Fidalgo.R", print.eval = TRUE,echo = TRUE)
```

```{r fig_10}
source("percbrood-temp-plot-Manchester.R", print.eval = TRUE, echo = TRUE)
```

Dependencies within R:

```{r}
# show package names and version numbers (not given in the paper)
sessionInfo()
```

---

@benmarwick - thanks so much for taking the time to look at this. These suggestions are great!

@jheare - I made copy of this in wiki - https://github.com/jheare/OluridaSurvey2014/wiki/Marwick-Suggestions

so we can go through one-by-one

Thanks Ben!

This is my first repository release so this is all new and exciting for me. I've never used dplyr before but now that you mention it I will definitely need to look into it. Your comments have been added to our research groups SOP for code and will eventually (after christmas) be added to the existing repositories.

Thanks again, Jake

On Tue, Dec 16, 2014 at 5:28 PM, Ben Marwick notifications@github.com wrote:

This was really interesting to have a look at, thanks for sharing it. I was able to reproduce everything with minimal edits (changing my wd, commenting out some of the interactive bits, View, etc). I'll email my output since I'm not sure how to get at html file on here quickly, I've also found an easy place to put it online: http://rpubs.com/benmarwick/zenodo_13201 A few quick thoughts...

On the code...

The scripts could be prefixed with a number like 001-xxxx.R to indicate what the correct order should be

The entire ms. could be a R markdown file that sources the R scripts (I see you explored this with e temp plots, see below for another approach)

Most current style guides for R recommend spaces around operators to improve readability (=, +, etc)

Try dplyr instead of plyr, it feels a million times faster and has more intuitive syntax

Add package version numbers to the ms. and readme.md, I usually just paste in the output from sessionInfo

add a clickable DOI to the preprint into the readme.md to close the loop between paper and code. I see you've linked the title, but a DOI is more recognisable as a citable thing. Related, I'd specifically say "Cite as:..." before the citation, so people can see easy see how to give you credit for these things

On the figures...

Be more explicit about what code generates which figure, since the code generates more figs than are in the ms.

looks like fig 4 has had some post-processing to get labels on the plots, this could be done with R, like you did for fig 5

the code for figs 5-6-7 makes the plots appear in the reverse order to how they appear in the paper, which had me puzzled for a little while :)

looks like some post-processing on figs 8-9-10 to get the secondary y-axis title. But I'm not even sure if that's possible without a great deal of fiddling. I'm amazed you even got a second y-axis on there!

On the stats...

all the chi-sq tests match my output

I couldn't find any of the metric data or stats (SE, SD, CI) in the output. Can they be included in the scripts?

Below is a more self-contained approach, using an R Markdown file that sources the scripts (the scripts could simply be copy and pasted into the chunks). You wouldn't need to setwd for each script because the knitting process sets the wd to the location of the Rmd file (which is great for isolating the code execution from contaminating objects, etc).

This Rmd file works for me with your scripts with minor edits (I had to comment out a few lines here and there), and I can knit it to get all your figs at once. To make that possible, I'd have

a chuck at the top that sets all the chunk options so that echo=FALSE to suppress the code, cache=TRUE to keep things fast while iterating over it, and a few other tweaks to keep it tidy

a chunk that gets the wd to be one dir up so we can find the data files, In this case I changed my wd in each script file because I ran the scripts separately before making this Rmd file

a chunk to get numbered fig captions and cross-refs in the text.\

a chunk to check to see if the required packages were installed, and if not, get them without asking. I use this https://gist.github.com/benmarwick/5054846
    ---
date: "Tuesday, December 16, 2014"
output: html_document
---

The narrative text can go in here and between the code chunks. Then you can knit the paper to PDF (or use `make`) and you've got a (more) easily reproducible paper.

```{r fig_2_and_3}
source("temperature-plots.R", print.eval = TRUE, echo = TRUE)
source("Kaplan-meier-stats-plot-all.R", print.eval = TRUE, echo = TRUE)
source("sizedist-stats-plot.R", print.eval = TRUE, echo = TRUE)
source("percbrood-temp-plot-OysterBay.R", print.eval = TRUE, echo = TRUE)
source("percbrood-temp-plot-Fidalgo.R", print.eval = TRUE,echo = TRUE)
source("percbrood-temp-plot-Manchester.R", print.eval = TRUE, echo = TRUE)
Dependencies within R:
# show package names and version numbers (not given in the paper)
sessionInfo()
---

## 

Reply to this email directly or view it on GitHub
https://github.com/jheare/OluridaSurvey2014/issues/12.

Hi Jake and Stephen,

No worries, thanks for the opportunity to have a look. The repository is amazing, I was thrilled at how easy it was to make the plots and so on. This kind of code inspection or reproducibility review is a really interesting idea and I'm glad to be able to try it out. I added a comment to the preprint to indicate that I'd been able to reproduce the figures and so on, that might be of interest to readers of the preprint.

One interesting approach that a few people are trying along the lines of a research compendium is 'paper as R package', where the scripts are turned into functions, the data are included in the package or as a separate data package, and the traditional package vignette (as an R markdown file with code and text) is the paper or supplementary material for a publication.

There are some very nice write-ups on this here http://rmflight.github.io/posts/2014/07/analyses_as_packages.html, http://rmflight.github.io/posts/2014/07/vignetteAnalysis.html and here http://www.carlboettiger.info/2012/05/06/research-workflow.html (both of those blogs are worth following for interesting stuff on scientific computing with R). The advantage of making an R package is that lots of quality control comes for free with building and testing an R package, lots of checks for missing brackets, etc., documentation of functions is simple, dependencies are pretty explicit, and it's easy to re-run (just download and install the package).

Let me know if I can help with anything or if any of my comments on your code are a bit too cryptic, etc.

jheare / OluridaSurvey2014

A few suggestions from a quick review #12