openeemeter / caltrack

Shared repository for documentation and testing of CalTRACK methods
http://docs.caltrack.org
Creative Commons Zero v1.0 Universal

CalTRACK Results contested: no evidence of claims presented #109

Closed steevschmidt closed 5 years ago

steevschmidt commented 5 years ago

Background: HEA has been analyzing residential smart meter data since 2009, with the guiding principle that accurate analysis of a home’s energy profile is critical to providing appropriate recommendations and achieving high savings rates for the occupants. We started with the PRISM method but diverged from it very quickly due to poor results.

Since then, several types of feedback helped us improve our analysis:

  1. We compared our results against sub-metered loads (e.g. HVAC, water heaters, pool pumps) in a small number of homes (around 20).
  2. We've presented over 10,000 individual home energy profiles to their occupants. Many were previously unaware of their detailed energy profile, and over the years we've benefited from critical feedback from building science experts and dedicated DIY homeowners “on the ground” in their own homes. Here’s an example from a particularly genial customer in 2011:

    For your 7.2 BTU/sqft/HDD , what is the HDD data source? I ask, since I'm sizing a new high-efficiency furnace. I'm arriving at somewhat higher numbers of BTU/sqft/HDD(F)/day for my calculations. For my HDD data source, I'm using the closest weather station from http://www.degreedays.net , which is KCAPORTO7. BTW, its becoming (rather oddly) entertaining to track down energy efficiency improvements using these tools! :)

  3. We have compared our results to CBECC-Res on a variety of virtual homes modeled with different climate zones and configurations.
  4. We have compared our calculations of key building metrics, such as BTU/sf/dd and internal gains, across large groups of homes to documented averages in the building stock. In each of these cases, any significant inconsistencies were investigated and fixed to the best of our abilities.
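For readers unfamiliar with the BTU/sqft/HDD metric mentioned in the customer quote above, here is a small sketch of how such a heating-intensity figure can be computed. The function name and all numbers are illustrative assumptions, not HEA's actual implementation:

```python
# Illustrative sketch of a heating-intensity metric (BTU/sqft/HDD).
# Values and function are hypothetical, not HEA's internal code.

THERMS_TO_BTU = 100_000  # 1 therm = 100,000 BTU

def heating_intensity(heating_therms: float, floor_area_sqft: float,
                      hdd_base65: float) -> float:
    """BTU of heating fuel per square foot per heating degree day (base 65F)."""
    if floor_area_sqft <= 0 or hdd_base65 <= 0:
        raise ValueError("floor area and HDD must be positive")
    return heating_therms * THERMS_TO_BTU / (floor_area_sqft * hdd_base65)

# Hypothetical month: 85 therms of heating, 650 HDD65, 1,800 sqft home
print(round(heating_intensity(85, 1800, 650), 1))  # → 7.3
```

Comparing this figure across a large sample of homes, or against published building-stock averages, is one way such a metric can flag implausible model output.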

Issue: Now, over a year into a new residential P4P program, we have compared our results to CalTRACK on a portfolio of over 500 homes. CalTRACK's calculations of energy savings are consistently 50% below our calculations. We are being paid on CalTRACK results, so this apparent systemic underreporting is an existential threat to our P4P program.

Validation: If only it were easy. Our internal system is complex, with over 400,000 lines of code, most of which has nothing to do with NMEC (the system is designed around the individual user’s experience). We have no “as built” specification for our analysis; the system is inconsistently documented and has evolved over the past 9 years through a variety of contributors. Other than the results testing described above, we have gone through no peer review process, nor have we published papers about our methods.

Over the past 7 months we have documented in Github what we believe to be the most significant causes of our differing results. We have done this with the intent to help make CalTRACK as accurate as possible.

Requested CalTRACK change: Do not rely exclusively on out-of-sample testing. Consider adopting methods to compare CalTRACK results against ground truth data for individual buildings, in order to reduce the risk of systemic errors.
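As one hypothetical illustration of such a ground-truth check, the sketch below measures the aggregate bias of modeled loads against submetered measurements. All data, names, and the bias definition here are assumptions for illustration, not part of the CalTRACK methods or HEA's system:

```python
# Hypothetical ground-truth check: compare modeled end-use loads against
# submetered measurements and report the relative bias. Illustrative only.

def portfolio_bias(modeled_kwh, submetered_kwh):
    """Relative bias of modeled loads vs. submetered ground truth.

    Negative values indicate the model systematically under-estimates."""
    if len(modeled_kwh) != len(submetered_kwh) or not submetered_kwh:
        raise ValueError("need matching, non-empty series")
    total_model = sum(modeled_kwh)
    total_truth = sum(submetered_kwh)
    return (total_model - total_truth) / total_truth

# Hypothetical monthly HVAC estimates vs. submetered readings for one home
modeled = [310.0, 280.0, 150.0, 90.0]
measured = [420.0, 390.0, 200.0, 130.0]
bias = portfolio_bias(modeled, measured)
print(f"{bias:+.1%}")  # a systematic under-estimate shows up as a negative bias
```

Run across even a small sample of submetered homes, a check like this can surface a systemic error that out-of-sample testing against total consumption would miss.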

steevschmidt commented 5 years ago

We have received several requests to provide more information about HEA’s internal savings calculations and how they differ from existing CalTRACK methods.

In prior CalTRACK issues we have highlighted a number of specific differences, but perhaps the biggest difference is our focus on monthly disaggregation of smart meter data over energy forecasting. These terms are used in different ways, so it’s probably best to describe our approach:

  1. We disaggregate monthly energy use into five (5) load categories: heating, cooling, always on, recurring and variable. These last three have no equivalent in CalTRACK methods.
  2. We do this for all months: those in the baseline period and all following months as soon as the data is available. We communicate changes to these loads to users monthly.
  3. The bulk of the analysis utilizes only that one month’s smart meter data, but heating and cooling loads also incorporate data from prior months to improve accuracy. Our goal is always to identify each load as accurately as possible for a specific month.
  4. During heating & cooling load disaggregation we identify the most appropriate balance point temperatures for use in regression analysis. In many homes this varies through the year as thermostat settings, routines & occupancy (internal gains) vary.
  5. Once we have identified all loads for all months we can compare them individually. For example, recurring loads change as pool pump schedules are changed from season to season. Changes in heating & cooling loads are normalized with standard temperature degree days (e.g. monthly HDD65s) to allow comparison between any two periods and capture both changes in home assets (e.g. duct sealing) and behavioral changes (e.g. turning down the thermostat, which affects the balance point temperature).
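The degree-day normalization in step 5 can be sketched as below. The load values, HDD figures, and data structure are hypothetical, and the disaggregation estimators themselves (steps 1–4) are not shown:

```python
# Minimal sketch of comparing a disaggregated heating load between two
# months after normalizing by each month's HDD65. All values hypothetical.

from dataclasses import dataclass

@dataclass
class MonthlyLoads:
    """The five load categories described above, for one month."""
    heating_kwh: float
    cooling_kwh: float
    always_on_kwh: float
    recurring_kwh: float
    variable_kwh: float

def normalized_heating_change(baseline: MonthlyLoads, baseline_hdd65: float,
                              current: MonthlyLoads, current_hdd65: float) -> float:
    """Heating kWh saved in the current month, after normalizing each
    month by its own HDD65 so weather swings don't masquerade as savings."""
    baseline_rate = baseline.heating_kwh / baseline_hdd65  # kWh per HDD
    current_rate = current.heating_kwh / current_hdd65
    return (baseline_rate - current_rate) * current_hdd65

jan_2018 = MonthlyLoads(620.0, 0.0, 180.0, 95.0, 210.0)
jan_2019 = MonthlyLoads(480.0, 0.0, 175.0, 90.0, 205.0)
savings = normalized_heating_change(jan_2018, 540.0, jan_2019, 510.0)
print(round(savings, 1))  # → 105.6
```

The same per-category comparison can be repeated for cooling (with CDDs) and, without weather normalization, for the always-on, recurring, and variable loads.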

Perhaps it’s a terminology issue, but we don’t consider that we ever produce a counterfactual model. Instead, we leverage rich smart meter data to compare disaggregated energy use from one period to another. The results can be seen in the sample report below showing changes in these five loads (light blue is cooling, red is heating) in terms of dollars and both fuels.

[Sample report images: monthly changes in the five load categories, in dollars, for both fuels]

The report above is available to all HEA users, and we use this same data to compute total savings across our enrolled accounts.

mcgeeyoung commented 5 years ago

@steevschmidt Since the CalTRACK methods are open source, and an open-source reference implementation exists in the OpenEEmeter, you can actually conduct a site-by-site comparison of savings differences. If, for example, you are fixing your balance points at 65 and 75 degrees, respectively, and CalTRACK requires selecting a temperature balance point from the best-fitting regression model across a wide variety of temperatures, you could compare the balance points and see how they differ. You could also look at the CalTRACK technical appendix, where this issue was investigated and the reasons for choosing a variable balance-point selection process are explained. If you do vary your balance points, perhaps you can compare your method with the one used by CalTRACK to see if there are any significant differences.

If that doesn't seem to be the issue, perhaps it's the slope of the coefficients; perhaps you are estimating the effects of weather differently. Do you multiply your coefficients by the HDD/CDD beyond the balance points? Is it a daily HDD/CDD or a monthly roll-up? There are quite a few more reasons you could be getting different results, and really only you will be in a position to determine why.

Your approach looks interesting and I'm sure it works well for you and your customers. The point of doing CalTRACK is so that everyone gets measured by the same yardstick. Hopefully you can figure out how to calibrate your models a bit better so that you're not surprised by the results.
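The variable balance-point search described here can be sketched in a few lines of Python. This is an illustrative toy, not the OpenEEmeter implementation: the actual CalTRACK methods define the candidate grid, the model forms (including combined HDD/CDD models), and the fit and data-sufficiency criteria.

```python
# Toy sketch of variable balance-point selection: fit a usage-vs-HDD
# regression at each candidate balance point and keep the best fit.
# Illustrative only; not the CalTRACK/OpenEEmeter implementation.

def hdd(daily_temps, balance_point):
    """Heating degree days for one month at a given balance point."""
    return sum(max(balance_point - t, 0.0) for t in daily_temps)

def fit_r2(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return a, b, r2

def best_balance_point(monthly_temps, monthly_usage, candidates=range(55, 76)):
    """Pick the candidate balance point whose HDD regression fits best.

    monthly_temps: one list of daily temperatures per month."""
    best = None
    for bp in candidates:
        xs = [hdd(temps, bp) for temps in monthly_temps]
        _, _, r2 = fit_r2(xs, monthly_usage)
        if best is None or r2 > best[1]:
            best = (bp, r2)
    return best[0]

# Synthetic example: usage generated with a true balance point of 62 degrees
temps = [[50, 55, 60, 65, 70], [40, 45, 50, 55, 60],
         [58, 60, 62, 64, 66], [30, 40, 50, 60, 70]]
usage = [100 + 0.5 * hdd(t, 62) for t in temps]
print(best_balance_point(temps, usage))  # → 62
```

Running a toy like this against a fixed-balance-point variant on the same data is one cheap way to see how much of a savings difference balance-point choice alone can produce.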