richfitz / wood

How much of the world is woody?
http://richfitz.github.io/wood
Other
20 stars 9 forks source link

How much of the world is woody?

Richard G. FitzJohn, Matthew W. Pennell, Amy E. Zanne, Peter F. Stevens, David C. Tank, William K. Cornwell

This repository contains all the code and data used in the manuscript.

Build Status

Synopsis

Running

make deps theplantlist-cache-unpack all

should run things with the greatest chance of success.

Automatically run version:

We use Travis CI to ensure reproducibility (or rather, repeatability/recomputability) of this project. A generated analysis can be found here. The log of running the analysis can be seen here (click on either of the jobs in the build matrix).

More detail:

There are two big prerequisites for running this analysis: (1) installing all the packages (with versions that work) and (2) downloading all the data that the analysis depends on. There are two ways of doing each of these! The data is described first, and the alternative package approach is described at the bottom of the file. This should only be needed once package versions have changed to the point where the analysis is no longer working.

Fetching the data

There are two ways of fetching the required data (see data/README.md for information on the data that we depend on).

Directly

This downloads data from Dryad and from The Plant List

make data-raw

Avoid hammering TPL

This fetches a set of data that I've archived.

make theplantlist-cache-unpack

This route allows you to delete all the data (make purge) and easily rerun the analysis (make theplantlist-cache-unpack all) without redownloading the data.

Running the analysis

To run the analysis, run the command

make

This will build processed versions of the data in the output directory. It then converts the file wood.R to a knitr script (wood.Rnw) and runs knitr on this to generate wood.md (in markdown) and the figures for the paper (in doc/figs).

The wood.md file is turned into a little html report of the analysis (wood.html).

The actual manuscript is in doc/wood-ms.pdf. Compiling this requires LaTeX to be installed.

Manually running everything

If you don't have make installed, then you can compile everything by running

source("make/manual.R")

(this needs to be run from within R, with the working directory set to the same as this file. If you use Rstudio, then opening the file wood.Rproj sets the working directory for you.

This will not compile the manuscript doc/wood-ms.tex to pdf; if you have LaTeX installed you will need to do that in whatever way you normally would on your system. However, all figures in the manuscript will be created in doc/figs.

Requirements

We require a few packages, namely dplyr, diversitree, RCurl and knitr, along with the non-CRAN package sowsear. Detailed version information is available in the file .packrat/packrat.lock (on github see here).

Running

make deps

will organise installing any missing packages and will warn about any packages that are out of date.

Manually:

Most packages can be installed off CRAN. To generate the report, we depend on the non-CRAN package sowsear. The easiest way to install that is with devtools

library(devtools)
install_github("richfitz/sowsear")

(install devtools with install.packages("devtools") if you don't already have it).

At present, we depend on the github version of diversitree; install that with

library(devtools)
install_github("richfitz/diversitree")

To recreate the geographic data (in data/geo/country_coords.csv) the rgdal package is also required, but this also requires system installation of gdal and should not

Using a known set of working packages with packrat

Version rot means that while the analysis works now, it may not work in a few years when packages have been updated and changed their APIs. To guard against this, we have archived a set of known working packages using packrat.

We didn't want to use packrat all the time (our package use is hopefully straightforward enough that a plain installation should work) and we didn't want to bog down the repository with about 20MB of package sources (especially as there are stable canonical sources for almost all packages because CRAN retains sources indefinitely). As such there is a fairly unfortunate, and likey fragile, bootstrapping procedure for enabling packrat that we have bodged together.

Run

make packrat-enable

which will download the known set of working sources from our releases page and copy files over from the .packrat directory. This puts packrat into the state that packrat assumes the project is always in. Packrat then goes through and compiles all the packages and installs them locally into a directory library. This process can take a while!

To disable packrat (putting the project back to using system-installed packages) run

make packrat-disable

To update the set of known working packages you can use the normal packrat tools and then run

make packrat-update

which copies local changes into the .packrat directory. These files can then be committed, though the remote tar.gz file would then need updating to share these changes.

To record a set of system-installed packages as working, run

make packrat-refresh

(note that this also sets the project up to use packrat, so running make packrat-disable afterwards is probably wise).

See make/packrat.mk for more information on our approach here, which does not fit neatly within packrat's scope. It's possible that by the time using the archived packages is necessary, better systems for doing this will exist.