salt-modeling-data

This repository contains reproducible code for downloading, processing, and modeling data related to river salinization dynamics. Using The Turing Way's definitions, this code and analysis are intended to be fully reproducible and could be somewhat replicable with different states and/or dates.

Associated publications and resources

The code supports the analysis for Lindsay Platt's (@lindsayplatt) Master's Thesis:

Platt, L. (2024). Basins modulate signatures of river salinization (Master's thesis). University of Wisconsin-Madison, Freshwater and Marine Sciences.

Platt, L. (2024). Source code: Basins modulate signatures of river salinization (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.11130548

Running the code

This repository is set up as an automated pipeline using the targets R package to orchestrate a complex, modular workflow in which dependency tracking determines which components need to be built. As written, the pipeline takes about 2.5 hours to build and requires an internet connection.
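As a rough illustration of how dependency tracking works in targets, the sketch below shows what a minimal `_targets.R` file can look like. It is not this repository's actual pipeline definition: the helper functions `download_data()` and `summarize_data()`, the target names, and the example values are all hypothetical.

```r
# _targets.R (hypothetical sketch, not the file used in this repository)
library(targets)

# Hypothetical helpers standing in for real download and processing steps
download_data <- function() {
  data.frame(site = c("A", "B"), value = c(450, 1200))
}
summarize_data <- function(df) {
  mean(df$value)
}

# A pipeline is a list of targets. Each target names an object and the
# command that builds it.
list(
  # No upstream targets; rebuilt only if download_data() itself changes
  tar_target(raw_data, download_data()),
  # Uses raw_data, so targets rebuilds it whenever raw_data changes
  tar_target(mean_value, summarize_data(raw_data))
)
```

With a file like this in place, `targets::tar_make()` builds only the targets that are missing or out of date, and `targets::tar_outdated()` lists the ones that would be rebuilt.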

The pipeline is broken into 6 different phases:

Pipeline setup

Run the following command to make sure you have all the necessary packages before trying to build the pipeline.

install.packages(c(
    'targets', 
    'tarchetypes',
    'accelerometry',
    'arrow',
    'cowplot',
    'dataRetrieval',
    'EnvStats',
    'exactextractr',
    'FlowScreen',
    'GGally', 
    'httr',
    'MESS',
    'nhdplusTools',
    'pdp',
    'qs',
    'randomForest',
    'raster',
    'sbtools',
    'scico',
    'sf',
    'tidytext',
    'tidyverse',
    'units',
    'usmap',
    'yaml',
    'zip'
))

The following package versions were used during the original pipeline build. You shouldn't need to install these versions specifically, but if errors crop up, try installing these specific versions to see whether that resolves the issue.

Package Version
targets 1.5.1
tarchetypes 0.7.12
accelerometry 3.1.2
arrow 14.0.2.1
cowplot 1.1.3
dataRetrieval 2.7.15
EnvStats 2.8.1
exactextractr 0.10.0
FlowScreen 1.2.6
GGally 2.2.1
httr 1.4.7
MESS 0.5.12
nhdplusTools 1.0.0
pdp 0.8.1
qs 0.25.7
randomForest 4.7.1.1
raster 3.6.26
sbtools 1.3.1
scico 1.5.0
sf 1.0.15
tidytext 0.4.1
tidyverse 2.0.0
units 0.8.5
usmap 0.7.0
yaml 2.3.8
zip 2.3.1
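If errors do appear, one way to see where your library differs from the table above is to compare installed versions against the recorded ones. The sketch below uses only base R. The helper `compare_to_recorded()` is hypothetical (it is not part of this repository), and only a few packages from the table are included to keep the example short.

```r
# A few versions copied from the table above; extend with the rest as needed
recorded <- c(
  targets = "1.5.1",
  tarchetypes = "0.7.12",
  dataRetrieval = "2.7.15",
  sf = "1.0.15"
)

# Hypothetical helper: reports whether each installed package is older than,
# the same as, or newer than the recorded version
compare_to_recorded <- function(recorded,
                                installed = installed.packages()[, "Version"]) {
  status <- vapply(names(recorded), function(pkg) {
    if (!pkg %in% names(installed)) return("not installed")
    # compareVersion() returns -1, 0, or 1
    cmp <- compareVersion(installed[[pkg]], recorded[[pkg]])
    c("older", "same", "newer")[cmp + 2]
  }, character(1))
  data.frame(
    package = names(recorded),
    recorded = unname(recorded),
    status = unname(status),
    stringsAsFactors = FALSE
  )
}

print(compare_to_recorded(recorded))

# To install one recorded version, the remotes package offers install_version().
# remotes is an assumption here; it is not in the package list above.
# remotes::install_version("targets", version = "1.5.1")
```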

Pipeline build

To build this pipeline (after completing the setup section):

  1. Open the run_pipeline.R script.
  2. Click on the Background Jobs tab in RStudio (next to Console and Terminal).
  3. Choose Start Background Job and make sure the run_pipeline.R script is selected.
  4. Accept the defaults and click Start to kick off the pipeline build.

This will build the pipeline in the background, so that your RStudio session can still be used as the job is running.
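The same background job can probably also be started from the RStudio console instead of the Background Jobs tab. The sketch below assumes the rstudioapi package is installed (it is not in the package list above) and that the working directory is the repository root. The wrapper `start_pipeline_job()` is hypothetical.

```r
# Hypothetical wrapper: starts run_pipeline.R as an RStudio background job
# when possible and returns TRUE or FALSE invisibly
start_pipeline_job <- function(script = "run_pipeline.R") {
  can_launch <- file.exists(script) &&
    requireNamespace("rstudioapi", quietly = TRUE) &&
    rstudioapi::isAvailable()
  if (!can_launch) {
    message("Could not start a background job; use the Background Jobs tab instead.")
    return(invisible(FALSE))
  }
  # Runs the script in a separate R session so the console stays free
  rstudioapi::jobRunScript(path = script, workingDir = getwd())
  invisible(TRUE)
}

start_pipeline_job()
```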

Pipeline outputs

Many of the pipeline's artifacts are "object targets", but some files are also created. As of 5/7/2024, the best way to see how the pipeline and analysis ran is to open the figures and data stored in 7_Disseminate/out/. This folder is only built if all the other pipeline steps ran successfully. It contains all of the figures that appeared in the manuscript.
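A quick way to check what was produced is sketched below. It assumes the working directory is the repository root and that targets uses its default `_targets` data store. The helper `list_pipeline_outputs()` and the target name `my_target` are placeholders, not names from this repository.

```r
# Hypothetical helper: lists the files written to the final output folder
list_pipeline_outputs <- function(out_dir = "7_Disseminate/out") {
  if (!dir.exists(out_dir)) {
    message("No folder found at ", out_dir, "; the pipeline may not have finished.")
    return(character(0))
  }
  list.files(out_dir, recursive = TRUE)
}

print(list_pipeline_outputs())

# Object targets live in the targets data store rather than as standalone files
if (requireNamespace("targets", quietly = TRUE) && dir.exists("_targets")) {
  meta <- targets::tar_meta()
  # One row per recorded target or function, with its type and build time
  print(meta[, c("name", "type", "seconds")])
  # Load a single object target by name ('my_target' is a placeholder):
  # targets::tar_read(my_target)
}
```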