sdtaylor / PhenograssReplication


model optimization/validation route #2

Closed sdtaylor closed 4 years ago

sdtaylor commented 4 years ago

From the original paper

  1. Fit model to all timeseries/all years
  2. Do leave-one-out CV, holding the scaling factor h constant at its previously fitted value.
     a. Look at the mean/variance of each parameter as well as the average R2.
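A minimal sketch of the two steps above, not the paper's actual procedure. It assumes a toy stand-in model `y = h * x + b` and synthetic data; the real model, its parameters, and the optimizer are all different. The only point is the shape of the workflow: fit everything once, then refit in each fold with `h` frozen.

```python
import numpy as np

def r2(obs, pred):
    """Coefficient of determination for one timeseries."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption): 5 sites, one timeseries each.
rng = np.random.default_rng(0)
sites = {}
for i in range(5):
    x = rng.uniform(0, 10, size=50)
    y = 0.5 * x + 2.0 + rng.normal(0, 0.3, size=50)
    sites[f"site_{i}"] = (x, y)

# Step 1: fit the toy model to all timeseries at once.
all_x = np.concatenate([x for x, _ in sites.values()])
all_y = np.concatenate([y for _, y in sites.values()])
h, b_full = np.polyfit(all_x, all_y, 1)  # returns [slope, intercept]

# Step 2: leave-one-out CV with h held at the value fitted in step 1.
b_fits, r2s = [], []
for held_out in sites:
    train_x = np.concatenate([x for s, (x, _) in sites.items() if s != held_out])
    train_y = np.concatenate([y for s, (_, y) in sites.items() if s != held_out])
    # With the slope fixed, the least-squares intercept is the mean residual.
    b = np.mean(train_y - h * train_x)
    test_x, test_y = sites[held_out]
    b_fits.append(b)
    r2s.append(r2(test_y, h * test_x + b))

# Step 2a: mean/variance of each refit parameter plus the average R2.
summary = {
    "b_mean": float(np.mean(b_fits)),
    "b_var": float(np.var(b_fits)),
    "mean_r2": float(np.mean(r2s)),
}
print(summary)
```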


sdtaylor commented 4 years ago

My plan:

Have different model-building sets:

- all grassland
- all Great Plains ecoregion
- all grassland within the Great Plains ecoregion
- etc.
- all sites, period

  1. Fit a model within each grouping.
  2. Within each grouping, do leave-one-out CV.
  3. ....somehow figure out which grouping is optimal.

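One way the groupings and the per-grouping leave-one-out splits could be laid out, as a rough sketch. The site records, the metadata keys (`vegtype`, `ecoregion`), and the helper names `build_model_sets` and `loo_splits` are all hypothetical; model fitting itself is left out.

```python
from collections import defaultdict

# Hypothetical site metadata; the real site list and attributes will differ.
sites = [
    {"site": "site_1", "vegtype": "GR", "ecoregion": "Great Plains"},
    {"site": "site_4", "vegtype": "GR", "ecoregion": "Great Plains"},
    {"site": "site_7", "vegtype": "AG", "ecoregion": "Great Plains"},
    {"site": "site_44", "vegtype": "GR", "ecoregion": "Eastern Forests"},
]

def build_model_sets(sites):
    """Assign every site to each model-building set it belongs to."""
    sets = defaultdict(list)
    for s in sites:
        sets[("all",)].append(s["site"])
        sets[("vegtype", s["vegtype"])].append(s["site"])
        sets[("ecoregion", s["ecoregion"])].append(s["site"])
        sets[("ecoregion_vegtype", s["ecoregion"], s["vegtype"])].append(s["site"])
    return dict(sets)

def loo_splits(site_names):
    """Yield (training sites, held-out site); skip sets with nothing to train on."""
    for held_out in site_names:
        train = [s for s in site_names if s != held_out]
        if train:
            yield train, held_out

model_sets = build_model_sets(sites)
for key, members in model_sets.items():
    for train, held_out in loo_splits(members):
        # A real run would fit on `train` and score on `held_out` here.
        print(key, "train:", train, "test:", held_out)
```

Groupings with a single site produce no splits, which is worth keeping in mind when comparing groupings: the finest sets may not be testable by leave-one-out at all.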
sdtaylor commented 4 years ago

Overarching question

Would like to make forecasts over as much area and vegetation types as possible. How should the model be built?

Using all sites lumped together? Split out by veg type? Split out by ecoregion? Split out by ecoregion/veg type?

Comparison: average site-level error (R2 and RMSE) of each approach.

sdtaylor commented 4 years ago

Need an error comparison like so

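| Spatial scale | Full model | Ecoregion model | Vegtype model | Ecoregion/vegtype model |
|---|---|---|---|---|
| Great Plains | 0.3 | 0.4 | NA | NA |
| - Grassland | 0.2 | 0.8 | 0.2 | 0.6 |
| - Ag | | | | |
| Eastern Forests | | | | |
| - Grassland | | | | |
| - Ag | | | | |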

Here each model is tested at each unique spatial scale. The errors are the average of the per-timeseries R2 values, not a single R2 computed over the pooled (aggregated) data.
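The difference between the two R2 summaries matters, so here is a small illustration with made-up numbers (not results from this project). The predictions get each site's mean level right but miss all within-site variation, so per-timeseries R2 is about zero while the pooled R2 looks high.

```python
import numpy as np

def r2(obs, pred):
    """Coefficient of determination for one set of observations."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_series_r2(series):
    """Average of the R2 computed separately for each timeseries."""
    return float(np.mean([r2(obs, pred) for obs, pred in series]))

def pooled_r2(series):
    """Single R2 over all timeseries concatenated together."""
    obs = np.concatenate([o for o, _ in series])
    pred = np.concatenate([p for _, p in series])
    return float(r2(obs, pred))

# Made-up example: two sites with different baselines, flat predictions.
series = [
    (np.array([0.1, 0.2, 0.3, 0.2]), np.array([0.2, 0.2, 0.2, 0.2])),
    (np.array([0.7, 0.8, 0.9, 0.8]), np.array([0.8, 0.8, 0.8, 0.8])),
]

print("mean per-timeseries R2:", mean_series_r2(series))  # close to 0
print("pooled R2:", pooled_r2(series))                    # roughly 0.95
```

The pooled value is inflated by between-site differences in baseline, which is why the per-timeseries average is the stricter measure for this comparison.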

sdtaylor commented 4 years ago

The table above can report errors without any cross-validation, as a "best case scenario" benchmark. The "winning" models can then be cross-validated to verify.

sdtaylor commented 4 years ago

A site level error table for the supplement

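| Site | Full model | Ecoregion model | Vegtype model | Ecoregion/vegtype model |
|---|---|---|---|---|
| Great Plains | | | | |
| - Grassland | | | | |
| site 1 - GR | 0.2 | 0.4 | 0.5 | 0.8 |
| site 4 - GR | 0.8 | 0.5 | 0.2 | 0.8 |
| - Ag | | | | |
| site 7 - AG | 0.5 | 0.52 | 0.22 | 0.23 |
| Eastern Forests | | | | |
| - Grassland | | | | |
| site 44 - GR | 0.53 | 0.78 | 0.87 | 0.95 |
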
sdtaylor commented 4 years ago

All of this is implemented.