Plan for balance point optimization

houghb commented 7 years ago

@jfarland @matthewgee @tplagge @jbackusm @gkagnew

On the phone call today we agreed to work on balance point optimization in parallel over the next sprint. Please chime in on this issue soon (let's shoot for responses here before next Wed, 4/12) to share what your plan is for exploring different aspects of the balance point optimization so we don't duplicate work.

Also see Issue https://github.com/impactlab/caltrack/issues/57 where folks already did a little thinking about this.

houghb commented 7 years ago

We've done a little bit of exploring with the 1000 home dataset to see whether using variable vs fixed balance points makes a difference in aggregate. We found that there is not a significant difference in median savings values regardless of the choice of [reasonable] fixed balance points compared to exploring the variable balance point range in the current spec.

What I am planning to do next is break the dataset into climate zones and explore the impact of balance point selection in each climate zone individually. We expect the inland premises may be more sensitive to balance point choices than the bay area. If that is the case we'll start to look into the impact of different methods of variable balance point selection.

matthewgee commented 7 years ago

During this sprint, the OEE team is going to explore using gradient descent vs the current grid search method for arriving at optimal balance point temps. Since one of the main advantages to gradient descent will be increasing the computational efficiency of the spec, we'll be adding runtime to our standard outputs.

matthewgee commented 7 years ago

Based on the discussion today, we decided to focus in on three (and a half) main areas that we need to make decisions on for balance point temp optimization:

Temperature ranges and step size
Loss function choice
Dealing with outliers and boundary conditions in optimization

houghb commented 7 years ago

Thanks, Matt. We will work on the outliers and boundary conditions next week.

matthewgee commented 7 years ago

@houghb awesome. OEE will work on loss function choice.

tplagge commented 7 years ago

Loss function choice

In schematic form, our HDD/CDD balance point determination algorithm is:

Choose candidate balance points.
Run a linear regression using the HDD/CDD values for these balance points, requiring that the intercept, heating, and cooling coefficient all be nonnegative and the heating and cooling coefficients be statistically significant.
Calculate the loss function.
Choose new balance points and repeat.
Select the balance points with the lowest loss function.

Presently, we minimize the sum of squares (i.e., a quadratic or least squares loss function). However, this loss function is quite sensitive to outliers; more robust alternatives are available. I’ll consider four candidate loss functions, where y_i is the observed usage on the i’th day and ŷ_i is the modeled usage on the i’th day:

Quadratic, i.e. least squares: Sum((y_i - ŷ_i)2)
Linear, i.e. absolute value: Sum(|y_i - ŷ_i|)
Huber: Quadratic for small residuals, linear for large
Tukey’s bi-square loss function, which cuts off at large values.

Large residuals matter most for a quadratic loss function, and least for the Tukey bisquare loss function, with absolute value and Huber somewhere in between. We expect the quadratic loss function to give the best constraints in the case where outliers are relatively uncommon and/or small in magnitude, and the others to be more robust to outliers but to yield looser parameter constraints when outliers are uncommon and/or small.

I went through the 1000-home electricity data set and selected the 263 projects which were best fit by a CDD + HDD model (the remainder either had no significant heating or cooling component, or fell back to an intercept-only model). The 1-year baseline periods (prior to work start) for this subset of the 1000-home sample is what I'll use to assess these alternatives.

Sanity check

As a first cut, we can just take a look at the skewness and kurtosis of the residuals. If we see evidence for non-normality, then it makes sense to at least consider using the robust loss functions. And indeed, over 80% of the homes show evidence (p<0.05) for non-normality in skewness and/or kurtosis. This is perhaps not surprising given that we’re not including relevant fixed effects.

So let’s take a look at whether this non-normality has a strong influence on the best-fit balance points. If we repeatedly remove a random 10% of the days from the dataset and determine new balance points, the results shouldn’t change much. If they do, then it’s plausible that outliers are driving the fit.

I ran the fitting routine for each of the 263 baseline periods 25 times, each time throwing out a random 10% of the days in the baseline period, which still leaves over 300 days of usage and temperature data in each sample. Then I recorded the mean and the standard deviation of balance point temperatures for each home across these 25 runs. Here’s the histogram of the HDD balance point standard deviation:

For about 20% of the homes, we see deviations in HDD balance point greater than a degree Fahrenheit when we randomly censor 10% of the data. This is good enough motivation to say that this exploration is worthwhile.

Comparison

I repeated this procedure for the other three candidate loss functions, and compared the results. If it's outliers driving the instability in these balance point estimates, then more robust loss functions should show more stable estimates.

However, I found that the usual quadratic loss function performed the best on the above metric: it yielded the lowest average standard deviation of balance point temperatures across the 263 sites, i.e. the most stable fits. The mean/median standard deviations were lowest for quadratic loss functions, highest for Tukey bisquare loss functions (by about a factor of two), and in the middle for the linear and Huber loss functions. (The medians were 0.39, 0.49, 0.47, and 0.79 for quadratic, linear, Huber, and Tukey, respectively.)

Below are histograms of the HDD balance point temperature mean and standard deviation across the 25 subsamples of each of the 263 homes using the four different loss functions. The Tukey biweight loss function produces the flattest distribution of balance point temperatures, as well as the loosest constraints; the quadratic loss function produces the tightest constraints (as well as, interestingly, the most estimates which are pegged to the upper edge of the 55-65 degree range).

Here’s the quadratic versus absolute value loss function estimates plotted against one another:

For the most part, the results are fairly consistent within the error bars--there doesn't seem to be a large bias. Since the scatter is smaller in the aggregate for the quadratic loss function, one might simply stop here and say it is the best choice.

Counterargument

An interesting thing to note here is that there are a few homes in the lower right hand corner where the results are quite different for the two loss functions shown above. Here’s one of them:

Indeed, there look to be some outliers, particularly in May 2013. Here’s temperature plotted versus usage:

The quadratic loss function gives a balance point of 65, which does indeed look to be driven by the outlier values (low usage at 65-75 degrees). The absolute value loss function, by contrast, yields 55 degrees--mask out those outliers, and that’s probably what you’d estimate by eye.

Conclusions

For about 80% of homes, the choice of loss function influences the choice of balance point temperature by only about a degree or less.
Robust (non-quadratic) loss functions are less sensitive to a few days of energy usage that don’t follow the typical pattern, which sometimes yields better results by eye in homes with abnormalities.
However, the quadratic loss function nevertheless produces the most stable results across the test data set.
The Huber and absolute value loss functions perform similarly on this metric, and better than the Tukey loss function. The Huber loss function has the advantage of being differentiable, so it is likely the best choice if a non-quadratic loss function is desired.
My personal opinion: it’s a toss-up between the quadratic loss function, which gives the most stable results overall; and Huber loss function, which also gives fairly stable results overall and performs somewhat better in homes with abnormal usage periods. Open to suggestions and comments from the group.

houghb commented 7 years ago

Choice of boundary conditions (and model selection criteria)

We set out to look at whether our current practice of allowing models to be selected even if their heating or cooling balance point is an extreme value (either the max or min of the explored range) should be changed. We planned to investigate the outcomes of three scenarios described in the UMP:

Allow all variable balance point values (this is what is in the current spec, so heating balance point values of 55 or 65 are acceptable). Often this results in a distribution of balance points with large piles on one or both ends of the range.
If a model is selected and it has a balance point that is an extreme value (55 or 65 for heating, 65 or 75 for cooling), then default to the model with the median balance point (60 for heating, 70 for cooling). If that median model is not a valid model (because it has negative coefficients or too large a p-value), then default to the intercept-only model.
Same as above, but instead of defaulting to the median value choose the mean balance point value of all the models in this portfolio that did not select an extreme balance point as the best value. In practice we found that the mean balance point values were always the same as the median values, so this approach is the same as No. 2 for this dataset.

While doing this we also wanted to see if the choice of model selection criteria impacted this analysis, so we explored the two options (1 and 2 in the list above) for three different model selection criteria:

adjusted R-squared (this is what is currently in the spec)
RMSE
MAE

Here are our findings (in the new output format): elec_report.xlsx gas_report.xlsx

Here are some plots showing the CVRMSE and NMBE for the different options and climate zones. Electric: NOTE: climate zone 3 is not shown on this plot, all of it's scores are poor and far away from the rest of the climate zones -- see the report linked above. electric_results

Gas: gas_results

When there is a difference in CVRMSE or NMBE scores the following conclusions hold:

Models that allow the extreme balance point values perform better than those that change the balance points to their median values (for this balance point range; this might change if the balance point range is larger or has a larger stepsize).
Adjusted R-squared and RMSE selection criteria always result in the same set of balance points being selected, so they both produce the same scores for CVRMSE, NMBE
Adjusted R-squared/RMSE perform better than MAE as methods for choosing the best set of balance points.

houghb commented 7 years ago

After all this good exploration, and discussions on the phone and google docs over several weeks, it sounds like the conclusion is to stick with the current method in the spec of using adjusted R-squared as the loss function, maintaining the existing balance point range and step size (1 degree steps across a 10 degree range), and not to adjust balance points that pile up on the ends of the explored range.

impactlab / caltrack

Plan for balance point optimization #61