MattCowgill commented 4 years ago

Add screenshot(s) of how to structure a project folder, with the appropriate subfolders

On @gregmoran's suggestion

wfmackey commented 4 years ago

Should we also suggest that people use an R/ folder, and then a top-level Rmarkdown document that ties everything together -- ie runs all the code, in order, with a lil' explanation of what each is doing?

I'm doing that for the super-wages thing atm and I think I like it. It's just:

Build dataset

Read the WAD data (R/01_eba_read.R) and retrieve the external economic data (R/02_economic_read.R), before combining, filtering and adding variables to the dataset (R/03_join_transform.R).

if (rebuild) {
  source("R/01_eba_read.R")
  source("R/02_economic_read.R")
  source("R/03_join_transform.R")
}
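The `rebuild` flag (and `remodel` further down) isn't defined anywhere in this excerpt. A minimal sketch of one way to set it up, assuming both are plain logicals set near the top of the Rmarkdown document. `source_stage()` is a hypothetical helper for illustration, not something from the actual project:

```r
# Assumed flags (not defined in the excerpt): flip to TRUE to re-run a stage.
rebuild <- FALSE
remodel <- FALSE

# Hypothetical helper: source each script in order, but only when `run` is TRUE.
# Stops with a clear message if any script is missing, before running anything.
source_stage <- function(files, run = TRUE, envir = globalenv()) {
  if (!isTRUE(run)) return(invisible(character(0)))
  not_found <- files[!file.exists(files)]
  if (length(not_found) > 0) {
    stop("Missing script(s): ", paste(not_found, collapse = ", "))
  }
  for (f in files) source(f, local = envir)
  invisible(files)
}

# Usage, mirroring the chunk above (a no-op while `rebuild` is FALSE):
source_stage(
  c("R/01_eba_read.R", "R/02_economic_read.R", "R/03_join_transform.R"),
  run = rebuild
)
```

Keeping the flags in one place means the slow steps can be skipped on most knits and switched back on when the inputs change.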

Run model

Run the regression models, including robustness checks and bootstrapping.

if (remodel) {
  source("R/04_model.R")
}

Graphics

The key graphics in this document explore the data used in the models (R/04_model.R). They all use the run_vars dataset, which is a subset of the eba_agreements dataset, so they need to be run after the model is generated.

SG history and table

The proposed and legislated history of the Super Guarantee (SG) in Australia is one of the main variables we use.

## Plot
source("R/06_plot_sg_history.R")

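## Table
library(dplyr)       # filter(), select(), row_number(), %>%
library(knitr)       # kable()
library(kableExtra)  # column_spec(), kable_styling()

source("R/f_get_sg.R")

# Drop the first row, then rename the columns for presentation
sg_table <- get_sg_history() %>%
  filter(row_number() != 1) %>%
  select(`Date` = date,
         `Super Guarantee (less than $1m payroll)` = sg_small,
         `Super Guarantee (more than $1m payroll)` = sg_large)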

sg_table %>% 
  kable("latex", booktabs = TRUE) %>% 
  column_spec(2:3, width = "4cm") %>% 
  kable_styling(position = "center")

Plot variables

Little plotlets of main model variables, shown in the Data chapter, are produced using:

source("R/06_plot_variables.R")

Plot economic time-series variables

The economic time-series variables are also shown in the Data chapter.

source("R/06_plot_timeseries.R")

Plot agreement sample

source("R/06_plot_agreement_sample.R")

source("R/06_plot_boot.R")

MattCowgill commented 4 years ago

I like this workflow but I feel like it’s not always going to make sense. Often people will be creating a lot of small scripts that do a series of unrelated things (eg. make charts), sometimes using different datasets. I think your WAD work is more integrated than most Grattan projects.

So I think we should outline and suggest this type of setup for complex analyses with multiple linked files. The other folder structure stuff is more “this is how you should/must structure your folders”.
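For the folder structure side, a rough sketch of what a project skeleton could look like when created from R. Only the `R` folder comes from this thread; `data` and `output` are placeholder names, and `create_project_skeleton()` is a hypothetical helper rather than an agreed convention:

```r
# Hypothetical layout: only "R" is from the discussion above,
# "data" and "output" are placeholders for illustration.
project_dirs <- c("R", "data", "output")

# Create the subfolders under `root`, leaving any existing folders untouched.
create_project_skeleton <- function(root, dirs = project_dirs) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Usage: build the skeleton in a temporary location and list what was created
demo_root <- file.path(tempdir(), "my_project")
create_project_skeleton(demo_root)
list.files(demo_root)
```

A screenshot of the resulting folders would still do the job for the guide; this just makes the structure reproducible.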

I dunno

grattan / R_at_Grattan

add example of folder structure #37