sdtaylor / phenology_dataset_study


do a true out of sample comparison #22

Closed sdtaylor closed 6 years ago

sdtaylor commented 6 years ago

Use some held-out data in the NPN and LTS datasets to do actual out-of-sample comparisons at the end.

sdtaylor commented 6 years ago

Years used as testing data within each dataset:

dataset_test_years = list(c(2012,2013,2014), #harvard
                          c(2015,2016),      #npn
                          c(2013,2014,2015), #hubbard
                          c(2014,2015),      #hjandrews
                          c(NA))             #jornada, no formal test data

sdtaylor commented 6 years ago

Leaving out specific years for each dataset leads to some species/phenophases with many test samples and some with very few. Instead, for each species/phenophase, I'll hold out a random 20% of observations.
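
A minimal sketch of that per-species/phenophase 20% holdout in base R. This is an illustration only, not the code from the linked commit: the `observations` table, its column names (`species`, `phenophase`, `year`, `doy`), the `data_type` label, and the `assign_holdout` helper are all hypothetical.

```r
# Hypothetical observation table; the column names are assumptions.
set.seed(1)
observations <- data.frame(
  species    = rep(c("acer rubrum", "quercus alba"), times = c(10, 5)),
  phenophase = rep(c("budburst", "flowers"), times = c(10, 5)),
  year       = sample(2009:2016, 15, replace = TRUE),
  doy        = sample(90:160, 15, replace = TRUE)
)

# Label a random ~20% of rows within each species/phenophase group as "test";
# all remaining rows stay labelled "train".
assign_holdout <- function(df, test_fraction = 0.2) {
  group <- interaction(df$species, df$phenophase, drop = TRUE)
  df$data_type <- "train"
  for (rows in split(seq_len(nrow(df)), group)) {
    n_test <- round(length(rows) * test_fraction)
    # sample.int avoids the sample() pitfall when a group has a single row
    test_rows <- rows[sample.int(length(rows), n_test)]
    df$data_type[test_rows] <- "test"
  }
  df
}

observations <- assign_holdout(observations)

# Count train/test rows per species to check the split is balanced
table(observations$species, observations$data_type)
```

Because the number of test rows is rounded within each group, a species/phenophase with only one or two observations would get no test rows under this sketch; using `ceiling()` instead of `round()` would guarantee at least one.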

sdtaylor commented 6 years ago

done https://github.com/sdtaylor/phenology_dataset_study/commit/9e5e4363b04214809118e7ecb95aa1519338ca5e