impactlab / caltrack

Shared repository for documentation and testing of CalTRACK methods
http://docs.caltrack.org

Create daily methods testing output format for model comparison across testers #60

Closed matthewgee closed 7 years ago

matthewgee commented 7 years ago

To make sure testers produce the same structured output from the testing dataset so that we can compare models, we need a well-defined output format that each tester generates in parallel.

matthewgee commented 7 years ago

Here's the simplest possible draft output file format I could think of to start from. What are folks' thoughts? What's missing?

https://docs.google.com/spreadsheets/d/1RckQsBfYyi25xMzQcYW_Tgcgr0p4d49EFL0agHir2Iw/edit?usp=sharing
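
For concreteness, here's a rough sketch of how a tester might write one row of a file like this. The column names below are just placeholders for illustration; the spreadsheet stays the source of truth for the actual fields and definitions.

```python
# Minimal sketch of writing a per-trace daily-methods output file.
# Column names here are illustrative assumptions, not the agreed format;
# the linked spreadsheet is the authoritative field list.
import pandas as pd

rows = [
    {
        "trace_id": "trace_0001",        # hypothetical trace identifier
        "model_id": "hdd_cdd_ols",       # hypothetical model label
        "period_start": "2016-01-01",
        "period_end": "2016-12-31",
        "observed_use": 5423.7,          # observed usage over the period (kWh)
        "predicted_use": 5298.2,         # model prediction over the period (kWh)
    },
]

pd.DataFrame(rows).to_csv("daily_methods_output.csv", index=False)
```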

houghb commented 7 years ago

@matthewgee I added a section to your output file spreadsheet for definitions. Can you fill those in so we can all make sure we're on the same page from the start and are defining the terms the same way?

matthewgee commented 7 years ago

@tplagge and I added a new tab with field definitions. @houghb let us know if you think anything's missing

houghb commented 7 years ago

Thanks @matthewgee, this definitely helps clear things up.

I don't see anything missing, but I suggest restricting the training period to one year of data instead of including all data prior to the testing year. If it includes all available data, and we continue using OLS, then we'll bias our models toward whatever months occur more often in the training data (for example, if there are 15 months of available training data starting in January, the model will be trained on two winter periods and one summer period, so it will be biased toward winter predictions).

Since the CalTRACK use case will only use one year of pretreatment data to train models, and we're using this output format to compare different modeling approaches for that use case, we favor restricting the training data to one year.
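
To make the imbalance concrete, here's a rough sketch (with made-up dates and season definitions, not the CalTRACK data) that counts heating-season vs. cooling-season days in a 15-month window starting in January compared with a 12-month window:

```python
# Rough sketch: count heating- vs. cooling-season days in a 15-month
# training window starting in January vs. a 12-month window.
# Dates and season month ranges are illustrative assumptions.
import pandas as pd

def season_counts(start, end):
    days = pd.date_range(start, end, freq="D")
    winter_days = days.month.isin([11, 12, 1, 2, 3]).sum()  # rough heating season
    summer_days = days.month.isin([6, 7, 8, 9]).sum()       # rough cooling season
    return winter_days, summer_days

# 15 months starting in January: two winters, one summer
print(season_counts("2015-01-01", "2016-03-31"))
# 12-month window ending at the same point: one of each
print(season_counts("2015-04-01", "2016-03-31"))
```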

matthewgee commented 7 years ago

@houghb Good point about 12-month vs. >12-month training periods. My operating assumption is that the start dates of energy traces are fairly evenly distributed across the year within climate zones, so there wouldn't be aggregate bias within CZ buckets in the testing file, but we should double-check that that assumption holds. If it does, then more data (even a partial year) can provide a better signal on one or both of the structural weather dependencies at the site level, under the assumption of static structural conditions over the entire period. In other words, under that assumption the HDD/CDD estimates shouldn't be biased by >12 months of partial-year data; one parameter estimate may simply have lower variance than the other as an estimate of the true structural parameter. If we can reduce the variance on at least one parameter by allowing >12-month partial-year energy traces in the training set, shouldn't we do that?
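
One way to sanity-check that variance argument on simulated data (all numbers below are made-up assumptions, not CalTRACK values): generate daily usage from a fixed HDD/CDD model, fit OLS on a 12-month and a 15-month window, and compare the spread of the fitted coefficients.

```python
# Sketch: compare the spread of OLS HDD/CDD coefficient estimates for
# 12-month vs. 15-month training windows, using simulated data.
# Temperatures, coefficients, and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(456)                                          # ~15 months of daily data
temp = 55 + 20 * np.sin(2 * np.pi * (days - 105) / 365.25)     # synthetic daily temperature
hdd = np.clip(65 - temp, 0, None)
cdd = np.clip(temp - 65, 0, None)
X = np.column_stack([np.ones_like(hdd), hdd, cdd])
true_beta = np.array([20.0, 1.5, 2.0])                         # intercept, HDD, CDD coefficients

def fit(n_days):
    y = X[:n_days] @ true_beta + rng.normal(0, 5, n_days)
    beta, *_ = np.linalg.lstsq(X[:n_days], y, rcond=None)
    return beta

fits_12 = np.array([fit(365) for _ in range(500)])
fits_15 = np.array([fit(456) for _ in range(500)])
print("std of [intercept, HDD, CDD] with 12 months:", fits_12.std(axis=0))
print("std of [intercept, HDD, CDD] with 15 months:", fits_15.std(axis=0))
```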

tplagge commented 7 years ago

At least in the 1000-home electric sample, there does seem to be some non-uniformity in the start dates. I'm not sure why that would be the case, but it does suggest that a single year of training data could protect us from at least one potential source of bias.

[Screenshot: screen shot 2017-04-20 at 11 10 48 am]

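For reference, a rough sketch of the kind of check behind that plot, assuming a hypothetical traces.csv with a start_date column (the real file layout may differ):

```python
# Sketch: check whether trace start dates are roughly uniform across months.
# The file name and column names are assumptions for illustration.
import pandas as pd
from scipy.stats import chisquare

traces = pd.read_csv("traces.csv", parse_dates=["start_date"])
counts = traces["start_date"].dt.month.value_counts().sort_index()
print(counts)

# Chi-square test against a uniform distribution over the twelve months
stat, p = chisquare(counts.reindex(range(1, 13), fill_value=0))
print(f"chi-square={stat:.1f}, p={p:.3f}")
```
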
houghb commented 7 years ago

To the "shouldn't we do that" question - my understanding of this process is that we're working on choosing the best model formulation for the CalTRACK use case, and that use case specifies using just one year of data to train the models (am I misinterpreting this?). In that case, we should be selecting models that give us the best performance with just one year of data, even if a different model could perform better if it has >12 months of data. I don't have any evidence that the best model for 12+ months is different for the best model with 12 months, but if we allow more than 12 months then we'll need to do additional work later to check on that, so I am trying to prevent scope creep. There are a number of additional reasons that allowing more training data could be problematic, but this is the strongest reason to limit it to one year from my perspective...

houghb commented 7 years ago

@matthewgee I think on the phone call last week we agreed to limit training data to one year. Can we update the Google spreadsheet and call this issue closed?

matthewgee commented 7 years ago

Great discussion here and a good decision. I think we can close the issue.