Closed jbackusm closed 7 years ago
Great idea.
Last year we talked about having out-of-sample pre-retrofit CVRMSE as a general method for model selection, and I think it's a reasonable option for aiding evaluation and decision making on a variety of model choice issues, including balance point temp optimization.
I think this is an area where we can show improvement over monthly methods, even if that improvement is marginal. With more frequent, higher-frequency data points, creating a hold-out sample to cross-validate model selection becomes an option. I'll talk to Ken on Monday.
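For reference, the out-of-sample pre-retrofit CVRMSE mentioned above could be computed on a hold-out sample roughly as follows (a minimal sketch; the function name and signature are illustrative, not from the spec):

```python
import math

def cvrmse(actual, predicted):
    """Coefficient of variation of RMSE: RMSE of the model's
    predictions on the hold-out sample, normalized by mean actual
    usage. Often reported as a percentage."""
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return rmse / (sum(actual) / n)
```

Lower out-of-sample CVRMSE indicates better predictive fit, so comparing it across candidate models is one way to guard against overfitting the balance point to the training period.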
Totally agree.
In the daily methods draft document, we currently have the following VDD optimization procedure defined:
Variable Degree Day Base Temperature Search and Optimization Space
CalTRACK daily methods will use variable degree day base temperatures. Balance point temperatures will be selected by doing a search over the two-parameter HDD and CDD model separately using the following grid search criteria:
1) Search range for HDD base temp: 55 degrees F to 65 degrees F
2) Search range for CDD base temp: 65 degrees F to 75 degrees F
3) With the constraint: HDD base temp <= CDD base temp
4) Grid search step size: 15 degrees
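The search space above could be sketched as follows. Note that a 15-degree step over a 10-degree range would test only the lower endpoint of each range, so the step below is an illustrative 5 degrees, not the spec's value:

```python
import numpy as np

def degree_days(temps, base, kind):
    """Daily degree days (deg F) relative to a candidate base temperature."""
    temps = np.asarray(temps, dtype=float)
    if kind == "HDD":
        return np.clip(base - temps, 0.0, None)
    return np.clip(temps - base, 0.0, None)

# Candidate balance points; the 5 deg F step is illustrative only
hdd_grid = np.arange(55, 66, 5)   # 55, 60, 65
cdd_grid = np.arange(65, 76, 5)   # 65, 70, 75

# Enforce the constraint HDD base temp <= CDD base temp
pairs = [(h, c) for h in hdd_grid for c in cdd_grid if h <= c]
```

Each qualifying (HDD base, CDD base) pair would then be fit and scored in the model selection step.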
Model Qualification
For each site, the choice must be made between using one of the single-parameter models (just HDD or CDD) or the combined HDD and CDD model. This choice is called model selection. For CalTRACK, model selection will be done by sequential model fit in the following way:
- Fit the combined HDD + CDD model with the constraint beta_HDD, beta_CDD > 0
- If the parameter estimates for the combined model each meet minimum significance criteria (p < 0.1) and are strictly positive, then the combined model is used.
- If only one of the degree day coefficients has p < 0.1, retain the significant term (heating or cooling) and refit the single-parameter model.
- If neither the heating nor the cooling coefficient has a p-value of less than 10% in the respective model, drop both terms and use mean daily consumption for the month (or year) as the relevant stage-one statistic.
Model Selection/Optimization
Among qualifying models, the model with the maximum adjusted R-squared will be selected as the best fit model for second-stage savings estimation.
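For concreteness, adjusted R-squared penalizes each added parameter, so the combined model must earn its extra degree day term. A minimal sketch (names are illustrative):

```python
def adjusted_r2(y, fitted, n_params):
    """Adjusted R-squared for a model with n_params estimated
    parameters (including the intercept)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_params)

# Among qualifying models (a hypothetical list of dicts, one per
# candidate), the selection step is then just:
# best = max(qualifying, key=lambda m: m["adj_r2"])
```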
How is that as a starting point? One of my biggest concerns with this is how we keep it stable as we consider additional factors for inclusion in the base daily model. I think for now we can constrain the multi-model fit to vary only the inclusion of (HDD, CDD), and keep all schedule-related factors that get added to the base model in all models for the purpose of model selection. Does that sound good?
One thing we've found to be important in addition to these criteria: when scanning across balance points, we should require a minimum number of days with nonzero degree-days, substantially higher than 1. This avoids overfitting in the case where only a few days exist with usage and nonzero degree-days, and the usage happens by chance to be unusually high on those days. I believe this situation is not currently avoided by the p-value and adjusted R-squared criteria alone, though that may depend on exactly how you define the p-value and adjusted R-squared.
Also, what do we do in the case that we follow these steps and we see a strong tendency toward extreme values in the optimized balance points? We spent some time discussing this last Fall, but I don't think we ever arrived at a consensus as to how we should handle this flavor of overfitting.
@mcgeeyoung asked me to offer a proposal regarding data sufficiency for estimating a trend, related to @jbackusm's comment above. We had discussed some percentage cut-off, but that does not address @jbackusm's point that we need not just non-zero data, but data beyond just, say, one day.
I'm inclined to recommend something along the lines of either ten non-zero days or a sum of 20 DD, whichever is met first (least restrictive). If the variation in DD is small, say between 1 and 3, then we would want 10 days to be comfortable with the estimated trend. If there is just one extreme heat wave, perhaps three days with an average of 7 DD would be sufficient. I'm not wedded to the specific numbers, but the combination offers a way to address the two concerns of sufficient data and sufficient variation.
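The proposed rule could be sketched as a simple screen applied to each candidate balance point during the grid search (the thresholds are the provisional numbers above, not settled values):

```python
def sufficient_data(daily_dd, min_days=10, min_total_dd=20.0):
    """Qualify a candidate balance point if EITHER at least min_days
    days have nonzero degree days OR the nonzero degree days sum to
    at least min_total_dd -- the least restrictive combination."""
    nonzero = [d for d in daily_dd if d > 0]
    return len(nonzero) >= min_days or sum(nonzero) >= min_total_dd
```

Under these numbers, three heat-wave days averaging 7 DD (sum 21) qualify, while five days of 1 DD each do not.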
What do you guys think? About the structure? the specific numbers?
@gkagnew Your suggestion sounds like a good approach to us. We don't have strong feelings on specific numbers, but 10 non-zero days or a sum of 20 DD seem reasonable.
It sounds like we've got agreement on this issue, so I updated the analysis spec to include this recommendation under a section called "Grid Search Data Sufficiency".
I believe this is currently unspecified and not explicitly tracked in the 2017 beta-test timeline, but I think it will require a substantial amount of work, both discussion and analysis.
I think @jfarland and Ken Agnew have significant experience and interest in this question, and this is also a topic of current interest for us here at EnergySavvy. As a starting point, can we define some metrics that can be used to validate our approach to balance-point optimization, i.e. how will we know that the optimization is working?