For CalTRACK, we decided to use out-of-sample testing to gauge the uncertainty associated with estimating counterfactual usage. Accuracy is probably not the best way to describe the nature of the uncertainty we're dealing with, but out-of-sample testing proved a reliable way to evaluate methods choices. For CalTRACK 2.0 we would like to keep the same testing regime in place: we can look at specific steps in the methodology and evaluate whether a revised approach would yield a better out-of-sample result.
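For concreteness, here is a minimal sketch of this kind of test on synthetic data: fit a baseload-plus-HDD model on one year, then score it with CV(RMSE) on a held-out year. The balance point, coefficients, and noise levels are invented for illustration and are not the CalTRACK specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def hdd(temps, base=60.0):
    """Daily heating degree days relative to an assumed balance point."""
    return np.maximum(base - temps, 0.0)

# Synthetic daily temperatures for a baseline year and a holdout year
season = 15 * np.sin(np.linspace(0, 2 * np.pi, 365))
t_train = 55 + season + rng.normal(0, 3, 365)
t_test = 55 + season + rng.normal(0, 3, 365)

def simulate_usage(temps):
    """'True' daily usage: baseload + heating response + noise (all invented)."""
    return 20.0 + 1.5 * hdd(temps) + rng.normal(0, 2, temps.size)

y_train, y_test = simulate_usage(t_train), simulate_usage(t_test)

# Fit baseload and heating coefficient by least squares on the baseline year
X = np.column_stack([np.ones(365), hdd(t_train)])
beta, *_ = np.linalg.lstsq(X, y_train, rcond=None)

# Score the model on the year it never saw
pred = beta[0] + beta[1] * hdd(t_test)
cvrmse = np.sqrt(np.mean((y_test - pred) ** 2)) / y_test.mean()
print(f"baseload={beta[0]:.1f}, slope={beta[1]:.2f}, holdout CV(RMSE)={cvrmse:.1%}")
```

A revised methods choice would then be preferred if it lowered the holdout error (or bias) across a large sample of meters.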
In addition to validating CalTRACK regression results as suggested above, another bulk approach may be much easier: compare the calculated heating intensity (BTU/sf/HDD) of homes to expected norms.
Possible approach:
Note this would only be possible for homes where we have data on all primary heating fuels (e.g. electricity and natural gas).
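If it helps, here is a minimal sketch of the check. The fuel-to-BTU conversions are standard; the expected band is a placeholder, and real norms would have to come from survey data or climate-zone reference values.

```python
# Minimal sketch: flag homes whose heating intensity falls outside an
# assumed expected band. All thresholds and inputs here are illustrative.
KWH_TO_BTU = 3412.14
THERM_TO_BTU = 100_000

def heating_intensity(heating_kwh, heating_therms, floor_area_sf, annual_hdd):
    """Combine all primary heating fuels into BTU, normalize by area and HDD."""
    heating_btu = heating_kwh * KWH_TO_BTU + heating_therms * THERM_TO_BTU
    return heating_btu / (floor_area_sf * annual_hdd)

# Hypothetical expected range (BTU/sf/HDD) for some climate zone
EXPECTED_LOW, EXPECTED_HIGH = 2.0, 12.0

intensity = heating_intensity(
    heating_kwh=1_200, heating_therms=350, floor_area_sf=1_800, annual_hdd=3_000
)
if EXPECTED_LOW <= intensity <= EXPECTED_HIGH:
    print(f"plausible heating intensity: {intensity:.1f} BTU/sf/HDD")
else:
    print(f"flag for review: {intensity:.1f} BTU/sf/HDD is outside expected norms")
```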
Any reason this wouldn't work? If it does, it may provide a useful metric for #71.
Related to NMEC accuracy, I'm adding a reference to an excellent paper by Sam Borgeson for PG&E on targeting EE programs for SMBs. Snippet from page 54 (the second bullet is illustrated with a toy sketch after the list):
Potential sources of NMEC savings bias:
- In large samples, mean-zero fluctuations and site-specific changes in consumption are often assumed to cancel out across premises (for every site with an increase, there is a corresponding site with a decrease). However, shared factors like droughts, prevailing economic conditions, etc. can cause shifts in consumption that do not cancel out. Further, these exogenous factors can impact certain customer segments more than others.
- Similarly, a weather normalization model that is overly temperature sensitive or was trained using relatively cool (or hot) weather data, could create systematic biases when trying to normalize consumption for a relatively hot (or cold) year.
- Trends in energy consumption (e.g. organic LED adoption or plug load growth) can also undermine the assumption that models trained on pre-period data can provide unbiased estimates of the counterfactual conditions for the post-period.
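To make the second bullet concrete, here is a toy simulation (all numbers invented): the true load turns on above a 70F balance point, but a linear-in-temperature model is fit on a relatively cool year; applied to a hot year, its residuals no longer average to zero, so the error is systematic rather than mean-zero noise.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_usage(temps):
    # "True" load: baseload plus cooling above a 70F balance point
    return 10.0 + 2.0 * np.maximum(temps - 70.0, 0.0) + rng.normal(0, 1, temps.size)

cool_year = rng.normal(62, 6, 365)  # training weather: few hot days
hot_year = rng.normal(70, 6, 365)   # normalization-target weather: many hot days

# Fit a simple linear-in-temperature model on the cool training year
X = np.column_stack([np.ones(365), cool_year])
beta, *_ = np.linalg.lstsq(X, true_usage(cool_year), rcond=None)

# Apply it to the hot year: the bias does not cancel out
bias = (true_usage(hot_year) - (beta[0] + beta[1] * hot_year)).mean()
print(f"mean bias in hot year: {bias:+.2f} units/day (near zero in-sample)")
```

Because every premise in a region sees the same weather, this kind of error is shared across sites and does not cancel in aggregate.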
Closing this issue, as out-of-sample testing was used for CalTRACK 2.
All of these issues apply to future CalTRACK improvements; I'd like to request this ticket not be closed, but instead be moved into the "future requests" category.
OK, I will bring this up for discussion with the next working group.
McGee recently posted to the Recurve blog an internal discussion titled "Accuracy: Why I Hate That Term," which helped me understand his prior comments (and our differing views) on this topic. I realize now we may have been talking about two different types of accuracy.
A slide from the presentation is shown here:
From HEA's perspective, the answer to the third bullet is a resounding Yes: NMEC accuracy for residential homes should include identification of ALL non-weather-related changes in energy consumption, no matter the cause. We are normalizing [residential] building energy use for weather and nothing else, so it's critical that we identify HVAC loads accurately.
On the other hand, the first two bullets -- and much of the related discussion in the video -- relate to accuracy of attribution (i.e. "explaining"), not accuracy of NMEC. We agree the former is unknowable, and we agree with McGee's analysis of that issue. However, accuracy in NMEC, the intended focus of this Issue, is a different beast altogether: it can be known and measured.
For example, the "True Value" (i.e. ground truth) of how much of a building's energy went toward heating in a given period can be measured, not modeled: every year, Gil Masters at Stanford has his building science students do this in a small mobile home with a single resistance heater, and their grade depends on the accuracy of their analysis.
Likewise, when we use CalTRACK to identify heating and cooling loads in a baseline period (in order to normalize them for weather), we could measure the accuracy of the resulting model against ground truth during that same baseline period: did the model produce the same heating load as was measured? One simple test would be to run CalTRACK on Gil's mobile home and confirm that the heating coefficients it produces for the baseline period yield a heating load similar to what the students measured. There are other ways as well; I proposed some in #122.
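Here is a minimal sketch of that baseline ground-truth test, with synthetic stand-ins for the mobile home's submetered data. The balance point, coefficients, and noise levels are assumptions, and the model is a simplified degree-day form rather than the full CalTRACK method.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical baseline-period data for a single building
daily_temps = rng.normal(50, 8, 90)                       # winter-quarter temps
measured_heating = 1.2 * np.maximum(60 - daily_temps, 0)  # submetered heater, kWh/day
other_load = rng.normal(8, 1, 90)                         # everything else on the meter
total_usage = measured_heating + other_load

# Fit baseload + HDD model on whole-building usage (simplified CalTRACK-style form)
hdd = np.maximum(60.0 - daily_temps, 0.0)
X = np.column_stack([np.ones(90), hdd])
beta, *_ = np.linalg.lstsq(X, total_usage, rcond=None)

# Compare model-attributed heating load against the submetered ground truth
modeled_heating = beta[1] * hdd
err = (modeled_heating.sum() - measured_heating.sum()) / measured_heating.sum()
print(f"heating coefficient: {beta[1]:.2f} kWh/HDD (true value 1.20)")
print(f"baseline heating load error vs ground truth: {err:+.1%}")
```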
McGee wrote above that "we decided to use out-of-sample testing to gauge the uncertainty associated with estimating counterfactual usage". As described in #123, this works only for buildings with predictable energy use: if energy use patterns during the period used to build the model differ from those in the out-of-sample period, all bets are off. We need to develop other methods to assess & improve the accuracy of the CalTRACK model against ground truth during the baseline period itself.
We concur with Steve about the importance of model accuracy to NMEC. See a related discussion here: https://gridium.com/evo-measurement-verification-accuracy/
HEA believes accuracy of CalTRACK results is critical for the long-term success of P4P programs, should be a top priority along with the other three, and should inform the priority of other tasks.
Background --
We have been analyzing residential smart meter data since 2008, and deployed our first customer-facing disaggregation tool in 2010. We learned quickly that some customers are more energy savvy than others (e.g. Art Rosenfeld, Gil Masters, CEC & CPUC staffers, etc.), and that getting the analysis right for each home was crucial to providing the right recommendations. So we too have been humbled by the challenges in this space. And since very little "ground truth" data exists, we had to come up with other methods to test the accuracy of our system.
We have pursued three primary approaches:
HEA has been a long-time champion of P4P because we believe it will drive EE to be more cost-effective. With one P4P contract in place and a second in final negotiations, HEA is very motivated to make CalTRACK successful.
Can we discuss/develop a strategy for accuracy?