Documenting analyses: RMarkdown vignettes

ebuhle commented 3 years ago

As mentioned here, it would be desirable to document our analyses in RMarkdown instead of copy-pasting into PowerPoint, etc. I volunteered to take the lead on one such vignette to establish the formatting and structure, i.e., the fiddly bits with a long learning curve.

Before getting to the nuts and bolts, it's worth thinking about what high-level structure would serve us best in the long run. I'd suggest multiple interdependent vignettes that parallel the scripts, rather than a single massive one:

- Coring-height corrections
- Duncan ring-to-pith corrections
- Missing outer rings (TBD?)
- Informative prior on total tree age (maybe combine with the next one?)
- State-space models with age uncertainty and covariates (the big kahuna, either a major overhaul of the existing proof-of-concept vignette or an entirely new one)

We should think about the intended use case when making these decisions. For example, a series of vignettes like the one sketched above would be perfect for the Software SI of a paper, but I'm not sure how you plan to structure the Big D.

Thoughts?

kzaret commented 3 years ago

What you suggest makes sense to me and jibes with the intended structure of the Big D: each chapter should be a publishable piece that stands on its own.

ebuhle commented 3 years ago

Great, although I'd imagine these would all be pieces of one chapter? But whatever, this makes sense for publication.

ebuhle commented 3 years ago

I made a minimal skeleton of an RMarkdown vignette for the Duncan pith-offset corrections and knitted it to HTML. Have a look, tell me what you think, and feel free to ask RMarkdown questions, which I will attempt to answer. You can re-knit the .Rmd yourself to see what that's like. Knitting will go faster if you've already run the corresponding R script and saved the stanreg object, so the model doesn't have to be refit every time you knit:

https://github.com/kzaret/RQ1v2_PIUVestab/blob/2494fa0e043550fb1780131a99149d1d6fce13b6/analysis/03_Duncan_rings-to-pith.R#L140-L142
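A minimal sketch of that save/load round trip, assuming the model object is called duncan_lmer (as later in this thread) and is saved to a file named duncan_lmer.RData (a hypothetical name; the linked lines define the real one). A cheap lm() fit on the built-in cars data stands in for the stanreg model so the sketch runs in seconds:

```r
# Stand-in model so the sketch runs anywhere. In the real script,
# duncan_lmer is the (slow-to-fit) stanreg object.
duncan_lmer <- lm(dist ~ speed, data = cars)

# Hypothetical file location; the actual script may save elsewhere.
rdata_file <- file.path(tempdir(), "duncan_lmer.RData")

# Save just the fitted model object ...
save(duncan_lmer, file = rdata_file)

# ... so a later session (e.g., the .Rmd being knitted) can restore it
# without refitting.
rm(duncan_lmer)
load(rdata_file)
exists("duncan_lmer")  # TRUE
```

Saving only the objects the vignette needs, rather than the whole workspace with save.image(), keeps the file small and makes the vignette's dependencies explicit.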

Then you can use an RMarkdown chunk to load the saved workspace (echo = FALSE because we don't want to see this bit of code in the rendered output):

https://github.com/kzaret/RQ1v2_PIUVestab/blob/2494fa0e043550fb1780131a99149d1d6fce13b6/analysis/03_Duncan_rings-to-pith.Rmd#L45-L48
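Something along these lines, where the chunk label and the file name are hypothetical (the linked lines show the actual chunk). When knitting, the working directory defaults to the folder containing the .Rmd, so the path is relative to that folder:

````rmd
```{r load-workspace, echo = FALSE}
# Restore the saved model object instead of refitting it
load("duncan_lmer.RData")
```
````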

When we get to the chunk that fits the model, we only evaluate it if the model object doesn't already exist in the workspace (see below for why there doesn't appear to be any code in this code chunk):

https://github.com/kzaret/RQ1v2_PIUVestab/blob/2494fa0e043550fb1780131a99149d1d6fce13b6/analysis/03_Duncan_rings-to-pith.Rmd#L84-L85
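A sketch of such a chunk header, with a hypothetical label. Chunk options are R expressions evaluated at knit time, so eval is FALSE whenever duncan_lmer has already been loaded from the saved workspace. The body is empty on purpose, for the reason explained below:

````rmd
```{r fit-model, eval = !exists("duncan_lmer")}
```
````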

This becomes even more important when working with multiple models, or with models that take longer to fit than duncan_lmer does. The catch is that you need to remember to update the .RData file(s) whenever anything relevant changes in the R script; otherwise stale version(s) of the model objects will be loaded into the RMarkdown environment during rendering.
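One way to guard against a stale .RData file is to compare file modification times before loading. The helper below is a hypothetical sketch (rdata_is_stale() is not part of any package), and the check is only a heuristic: any edit to the script, relevant or not, marks the saved workspace as stale.

```r
# Hypothetical helper: TRUE if the saved workspace is missing or older than
# the script that produces it, i.e. the .RData may not match the current code.
rdata_is_stale <- function(script, rdata) {
  !file.exists(rdata) || file.mtime(script) > file.mtime(rdata)
}

# Toy files standing in for the real script and saved workspace
script <- tempfile(fileext = ".R")
rdata  <- tempfile(fileext = ".RData")
writeLines("# model-fitting code", script)
x <- 1
save(x, file = rdata)

# Script edited after the workspace was saved: stale
Sys.setFileTime(rdata,  as.POSIXct("2021-01-01 12:00:00", tz = "UTC"))
Sys.setFileTime(script, as.POSIXct("2021-01-02 12:00:00", tz = "UTC"))
rdata_is_stale(script, rdata)  # TRUE

# Workspace re-saved after the last edit: up to date
Sys.setFileTime(rdata, as.POSIXct("2021-01-03 12:00:00", tz = "UTC"))
rdata_is_stale(script, rdata)  # FALSE
```

In the .Rmd, a check like this could trigger a warning() in the chunk that loads the workspace, as a reminder to rerun the script.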

I use a couple of other slightly "nonstandard" techniques that I've found to be essential to a happy RMarkdown workflow, but that you may not have encountered in basic tutorials. The most crucial one is code externalization, i.e. reading in code chunks from an external R script (in this case 03_Duncan_rings-to-pith.R) rather than typing them directly into the .Rmd. This is accomplished by calling knitr::read_chunk() like so:

https://github.com/kzaret/RQ1v2_PIUVestab/blob/2494fa0e043550fb1780131a99149d1d6fce13b6/analysis/03_Duncan_rings-to-pith.Rmd#L37-L39
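A minimal sketch of how the pieces fit together in the .Rmd, assuming (as the URLs suggest) that the .Rmd and 03_Duncan_rings-to-pith.R sit in the same analysis/ folder, so a relative path works because knitting runs from the .Rmd's folder by default. The label fit-model is hypothetical and must match a label in the script; its body is left empty so knitr fills it with the externalized code:

````rmd
```{r setup, include = FALSE}
# Register every labeled chunk in the companion R script
knitr::read_chunk("03_Duncan_rings-to-pith.R")
```

```{r fit-model}
```
````

include = FALSE runs the setup chunk but hides both its code and its output; echo = FALSE would hide only the code.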

It parses the external script into chunks based on special comment syntax, which is why you now see stuff like this throughout the script:

https://github.com/kzaret/RQ1v2_PIUVestab/blob/2494fa0e043550fb1780131a99149d1d6fce13b6/analysis/03_Duncan_rings-to-pith.R#L64-L70
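To see what read_chunk() does with that comment syntax, here is a self-contained sketch that writes a toy script to a temporary file, parses it, and lists the chunk labels knitr registered. The labels read-data and summarize are made up; knitr also recognizes the older "## @knitr label" form. knit_code is knitr's store of chunk code, which is where read_chunk() puts what it finds:

```r
library(knitr)

# A toy external script; each "## ---- label ----" line starts a new chunk
script <- tempfile(fileext = ".R")
writeLines(c(
  "## ---- read-data ----",
  "x <- cars$speed",
  "",
  "## ---- summarize ----",
  "mean(x)"
), script)

# Parse the script and store each chunk's code under its label
read_chunk(script)

# Labels now available to empty, same-named chunks in an .Rmd
names(knit_code$get())
knit_code$get("summarize")
```

When the .Rmd is knitted, an empty chunk labeled summarize would then run mean(x).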

Rather than try to explain code externalization here, I'll direct you to a repo where I dumped some toy examples I concocted a couple years ago when I was figuring this stuff out myself. Clone it into an RStudio project and then take a look at chunk_test.html, the chunk_test.Rmd that generated it (which may also serve as a primer on some basic RMarkdown syntax, esp. w.r.t. getting plots to look right), and the external scripts chunk_test1.R and chunk_test2.R that it uses. (You can ignore the render-ready script and its output; that's a nifty trick I learned from Jenny Bryan but haven't found as useful in my day-to-day workflow.)

Happy fishing! :fishing_pole_and_fish:

ebuhle commented 3 years ago

Just for giggles, I've added github_document as an output in the YAML of chunk_test.Rmd. "GitHub-flavored Markdown" is the very same minimalist markup language we're writing in right now, which GitHub renders as a sort of brain-damaged HTML. (It is possible to publish actual HTML on GitHub, i.e. build a simple website, but it's much more involved.) It's well-suited to an informal demo like this, where I don't really care that it ignores fig.align = "center" and puts the author and date on the same line, etc. (OK, I kinda do.)
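For reference, a YAML header requesting both formats might look like the sketch below (other fields omitted). Listing several formats only declares them: RStudio's Knit button renders the first one by default, whereas rmarkdown::render("chunk_test.Rmd", output_format = "all") renders every format in the list.

```rmd
---
output:
  html_document: default
  github_document: default
---
```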

kzaret commented 3 years ago

Thank you for all of the above!

Issue #5 in kzaret/RQ2_Dendro_v2_PIUVestab