infotroph / efrhizo

data & analysis code for UIUC/EBI Energy Farm minirhizotron project

Data and scripts for "Root volume distribution of maturing perennial grasses revealed by correcting for minirhizotron surface effects"

This is a Stan-based analysis of the root volumes of perennial bioenergy grasses, as observed by minirhizotron imaging at the EBI Energy Farm (Urbana IL) between 2009 and 2014.

A manuscript describing this project is in review. Email chris@ckblack.org if you'd like a copy of the current draft.

Raw images and the pixel-by-pixel tracing data are not stored here -- those live on the DeLucia fileserver. The primary "raw" data in this repository are the WinRhizo datafiles, which contain total length/area/volume and average width for each image, plus summaries of each root that are currently thrown out before analysis. I will include images and raw traces in the full data+script package, which will be made available on Dryad at the time the manuscript is accepted.

The whole analysis is intended to be fully reproducible. If anything changes, whether in raw data or final figure presentation, running make in this directory should produce a fully updated version of the results.
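As a rough illustration of the rebuild-on-change behaviour that make provides, here is a toy sketch run in a scratch directory. The file names (input.csv, output.txt) and the one-line rule are hypothetical and are not taken from this project's Makefile; the real dependency graph is much larger.

```shell
# Toy demonstration of make's dependency tracking, run in a scratch directory.
# File names (input.csv, output.txt) are hypothetical, not from this project.
demo=$(mktemp -d)
cd "$demo"

# Rule: output.txt depends on input.csv. The recipe line must start with a tab,
# so printf is used here to write the tab explicitly.
printf 'output.txt: input.csv\n\ttr a-z A-Z < input.csv > output.txt\n' > Makefile

echo "depth,volume" > input.csv
make        # builds output.txt because it does not exist yet
make        # reports that output.txt is up to date; nothing is rerun

sleep 1     # make sure the next edit gets a strictly newer timestamp
echo "10,0.5" >> input.csv
make -n     # dry run: prints the command that would run, without running it
make        # input.csv is now newer than output.txt, so the rule reruns
cat output.txt
```

The same idea scales up: each rule names a target, the files it depends on, and the command that regenerates it, so only the stale parts of the analysis are rerun.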

Directory contents

data

Cleaned-up, finished, most authoritative versions of datasets. Everything here is generated by some scripted process, NEVER by hand-editing.

NOTE: Some contents of the stan/ subdirectory are not committed to Git, because the output from a full model run is 845 MB and needs to be recreated from scratch every time the model updates.

figures

Graphics generated from the cleaned-up data.

images

Static images for presentation/manuscript purposes: sample root images, screenshots, images of fieldwork, etc.

Makefile

Script for the Unix make utility, specifying how each component of the project depends on others and providing rules for how to automatically update each file when the files that it depends on have changed.

notes

Human-readable information. What I did, what I didn't do, reminders, to-do lists, etc.

operator-agreement

A sub-experiment asking "how similar are the data produced by different workers tracing the same images?" I'm now using these same images as a worker training battery.

This directory is not updated by the whole-project Make; there is a local Makefile instead. To rerun the operator agreement scripts, cd operator-agreement && make. See operator-agreement/ReadMe.md for more details.

protocols

Field maps, instructions for camera operators, tube installation schematics...

rawdata

Uncleaned data in the form it came to me: WinRhizo files, hand-compiled spreadsheets. If anything in here needs to change, it probably means we had to redo a lot of hours of work.

scripts

Tools to automate the rest of the analysis. Mostly written in R, some in bash.

stan

Scripts for hierarchical Bayesian inference on how root volume differs between crops and over time, written in the probabilistic programming language Stan. Also contains R and Bash scripts to handle the process of running the models on the IGB computing cluster or, with patience, on a sufficiently powerful laptop. TODO: Consolidate contents into scripts/?

tmp

Things I don't intend to keep but am not deleting just yet, e.g. logged debugging output. This directory is ignored by git, but needs to exist because some scripts write to it.
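Because tmp is ignored by git, a fresh clone will not contain it. A minimal sketch for recreating it, assuming the current directory is the root of the project:

```shell
# Recreate the git-ignored scratch directory after a fresh clone.
# Assumes the current directory is the root of the project.
# -p makes this a no-op if tmp already exists.
mkdir -p tmp
```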

Installing & running

To run the analysis scripts you'll need:

To rerun my analyses: Open a shell, cd to the root of the project directory, type make, and walk away for at least an hour, or much longer if your computer has fewer than 5 CPU cores. The whole run takes ~80 minutes, mostly CPU-bound, on my 8-core mid-2015 MacBook Pro (2.2 GHz i7).

To run individual components: See comments in scripts, usage in the Makefile, and, uh, probably ask me questions about the parts I forgot to document.

The general shape of the data cleanup pipeline is as follows:

To fit Stan models to the clean rhizotron data:

I have run mctd_foursurf successfully on OS X 10.11.6 and Amazon Linux AMI release 2016.03, but have not tested it in Windows. The other models appear to run well on my machine, but I haven't tested them cross-platform and I have not validated their output carefully. Consider them work in progress!

Questions? Chris Black: black11@illinois.edu, chris@ckblack.org, https://twitter.com/infotroph, or 503-929-9421.