Hourly methods - Githubissues

hshaban commented 6 years ago

Purpose of this task is to test and recommend methods that can handle hourly data and yield hourly results.

steevschmidt commented 6 years ago

Perhaps related, and just fyi: We did a "Duck Chart" of electric energy efficiency savings results for a program we ran in Mountain View back in 2012. Chart showing results: eumv duck chart 2012 This type of analysis is very easy to do after a program has been completed.

Clarifying 6/7/18: This data is NOT weather-normalized in any way: it is raw smart meter data. Identifying which hours should be weather normalized and by how much is not something we are able to do; to do this we'd have to know which hourly load was the AC unit (which should be normalized) vs the EV charger or pool pump (both of which should not).

steevschmidt commented 6 years ago

Both Bill Koran's excellent ECAM demo last week and the Phil Price video make me jealous of commercial energy data! In the residential space we rarely see homes with such repeatable/predictable patterns of energy use. Those with predictable energy use are often the most efficient homes, whereas homes like the two below (showing daily electric use) offer the best opportunities for savings: pp8347 pp8077

eliotcrowe commented 6 years ago

@steevschmidt - thanks for posting that duck chart. I've been wondering what is the best way to present those kind of results for hourly methods. I assume your chart is the average over a full year (?), which is informative but time-of-year impact is really important too. Having a duck chart for every day of the year sounds a bit excessive, but maybe a chart for each month of the year (or maybe seasonal?) starts to become useful. And one for impact one peak days (using max values from TMY?). The more you hone in on days/months your dataset gets smaller so your uncertainty increases. So, 2 questions:

Are there any pre-existing protocols for representing changes in loadshape that [a] capture seasonality and [b] account for uncertainty?
Is it appropriate for CalTRack to define such reporting protocols? I like the idea of having a recommended CalTrack approach on this.

steevschmidt commented 6 years ago

@eliotcrowe Correct, our duck chart lines each represent averages over a full year. I've seen seasonal versions that may capture the time-of-year variations, but it may make sense to tie it to new TOU rates and the dates they are valid...?

steevschmidt commented 6 years ago

More info about the variety of residential load profiles --

In 2014 Ram Rajagopal from Stanford analyzed residential load profiles (see "Lifestyle segmentation based on energy consumption data"). PG&E provided a year of smart meter data covering 210,000 California homes as part of the ARPA-e project. Ram and his team identified 272 unique load shapes that together characterized 90% of the 60 million days of data within +/-20%. Only 14% fit the "traditional dual peak" residential load shape. The paper shows a nice variety of these load shapes, together with their associated occurrence rates.

I've posted some related papers to this web folder.

mcgeeyoung commented 6 years ago

@eliotcrowe To get back to your earlier question...one of the earlier iterations of our work on hourly methods dealt with this issue in a somewhat creative way. I'd be curious to hear your thoughts on this. Essentially, what happens is that modeling uncertainty is better on monthly or daily savings calculations than on hourly savings calculations. What we did is we rolled up daily savings to monthly increments. We then calculated the hourly savings for each site and trued up the savings to the monthly values. The result of this was that the hourly methods essentially were being used to distribute savings over the course of the day, in monthly increments. weekdaysavingsresourcecurvebymonth

eliotcrowe commented 6 years ago

@mcgeeyoung Funny, I was thinking exactly that over the weekend! I like this approach a lot, especially if aggregating data for many sites. A few thoughts:

Curious what magnitude of true-up resulted when you tried this out? If it's significant then maybe we need to put more thought into the process/variables, but if it's minimal then maybe it's good to go
I like how your chart splits up the loadshapes by month (a lot more meaningful than an annualized average), but some utilities/stakeholders might want to split up differently based on their rate structures, peak period definitions etc. Maybe Caltrack can define a standardized 8760-hour reporting format (and perhaps the true up process) which allows any stakeholder to manipulate the data into whatever chart or reporting format meets their needs - I'm sure it would be very simple as an Excel template.
If converting the mass of data to hourly loadshapes I'd also recommend something like a box plot for each hour, so that you get a sense of predictive accuracy (lower for a single building, higher for a large portfolio). Doesn't have to be a boxplot but there should be some way to capture the variation

mcgeeyoung commented 6 years ago

I'd have to look back and see what we found in terms of differences. I think the thing we were worried about was any sort of systematic bias in the savings values relative to daily or monthly methods. I like your idea of setting up a way to roll up hourly savings into desired intervals (PG&E has a specific request for this). This is most likely a post-processing step in the aggregation phase and we'll want to figure out how to capture the uncertainties - much to the last point you made.

hshaban commented 6 years ago

Summary of the hourly methods recommendations from the 5/3 call:

Data management

Usage data sufficiency will be specified in terms of data coverage (aka common support) of the independent variables, instead of a minimum time period as in the daily and billing methods. @eliotcrowe will post references from LBNL.
Minimum data coverage for qualifying weather stations will be specified at 90%. If this criterion is not met, it will be recommended to use the next closest weather station within the climate zone.
For the pay-for-performance use case, it will be recommended to drop hours with missing temperature and usage data in the baseline period from the analysis. When calculating metered/payable savings in the reporting period, interpolation of weather data will be allowed for up to 6 hours (this number is a preliminary suggestion).

Modeling

The starting point for hourly methods in CalTRACK 2.0 will be the base Time Of Week and Temperature model from LBNL. R-code and references are available here: https://bitbucket.org/berkeleylab/eetd-loadshape, and here: https://github.com/LBNL-ETA/loadshape/tree/master/loadshape
The default occupancy detection algorithm is recommended for use as is.
Data is to be aggregated to hourly increments and the interval duration (intervalMinutes variable) is to be set as 60 minutes.
Constant temperature bins are to be used with endpoints at 30, 45, 55, 65, 75 and 90 degrees Fahrenheit.
Since we are interested in longer term forecasts, the model ensembling option that is included by default in the TOWT model is to be disabled. It is recommended to fit a single model to the whole dataset.

Use Cases and Uncertainty

For the program evaluation use case, it may be of more interest to obtain time-aggregated savings results (for example, at the monthly or annual level). Estimating time-aggregated uncertainty for hourly models is challenging due to residual autocorrelation and no practical methods have been rigorously tested yet using hourly data. if hourly data is available, but the user is not interested in the hour-by-hour savings results, we recommend using daily methods with improved ASHRAE or OLS formulations of Fractional Savings Uncertainty (see Koran (2017)).
For the procurement and aggregation-based pay-for-performance use cases, the hourly methods may be used to calculate hourly savings along with point estimates of uncertainty. Hourly savings may be aggregated across a portfolio, assuming independence of errors across buildings and provided any model bias is reported (at the annual level as well as for each time-of-week).

steevschmidt commented 6 years ago

Clarifying based on a comment in today's call: The "EUMV Duck Chart" I posted at the head of this thread was raw smart meter data, and NOT weather normalized in any way. As such, it's only accurate to the extent the weather in Mountain View during the 2011 and 2013 periods was similar.

hshaban commented 6 years ago

Summary of results from latest round of testing with Residential data (more details in the Week 20 meeting: http://www.caltrack.org/project-updates/week-twenty-caltrack-update)

TOWT model provided poor results at the site and portfolio level for residential buildings.

Inspired by #103, we tested "month-by-month" models which only used the same month from the previous year to fit a baseline model, instead of a full year baseline. This led to dramatic improvements in model fit across all buildings (CVRMSE distributions shown below).
Out-of-sample results indicated some overfitting when using pure month-by-month models (indicated by the difference between in-sample and out-of-sample results). This was reduced by expanding the baseline for a particular month to the month before and after, with a smaller weight assigned to those shoulder months. e.g. If a counterfactual is to be estimated for July 2018, we would use June, July and August 2017 as the baseline, with weights of 0.5 assigned to June and August.

bkoran commented 6 years ago

Slide 15 from the June 28 meeting included the following:

• Use 3-month weighted windows when using the hourly methods for Residential buildings. • Apply these weighted models when using the hourly methods for commercial buildings, pending further testing.

I am very strongly opposed to the use of monthly regressions, or weighted 3-month regressions, for commercial buildings. I am also skeptical of their use for residential buildings, although I know that the energy use relationships can be different in different months, as explained in the first paragraph of GitHub Issue #103, and HEA has more experience with residential modeling than I do. For commercial buildings the issues described mostly don’t exist, and some such changes are better treated as non-routine events than as routine adjustments. Some of these should be treated as routine adjustments in a multiple regression rather than creating individual monthly regressions, but that would be a future improvement.

Of course monthly or 3-month models give better fits. But are they a better counterfactual, beyond the potential issue with overfitting?

We are essentially defining 12 individual baseline models (although the rolling 3-month models ‘connects’ the models to some degree). A number of questions could be asked, and I anticipate a number of issues:

How do we know that these models are appropriate for a counterfactual? I.e., how do we know that an individual month represents typical use and operation for use as a counterfactual for the post-retrofit case for an individual home? IMO, we don’t; it’s just one month. We would need to look at the individual month for several years to see if it is typical for an individual home. Aggregated for many homes, then it probably is reasonable to assume that the aggregate represents typical use and operation. But there may be better approaches for aggregation.
The information from prior and succeeding months is lost since it is not included in the model for an individual month. Therefore, the relationship of kW to weather will be less well known. This is especially true with the TOWT, using fixed change points. There is little “signal” to define the relationship to weather. In some CA climates, where the monthly range of temperatures can approach the annual range of temperatures, this may or may not be a huge issue. However, in more extreme climates this would likely be a major issue. While this is CalTRACK, is it not OpenEE’s intent to use these models elsewhere? At any rate, it should be tested even for use in CA. What happens if the same month in the post-retrofit period has considerably different temperatures? I believe that aggregating many homes or businesses does not solve this issue, since it is the minimum range of temperatures that is the problem.
Following that same issue, for models based on a year of data, there are pending requirements for data (weather) coverage. Do these individual monthly or 3-month models meet those requirements, or avoid their need?
It is hard to determine whether energy use was changing over the baseline, and we know that the energy use relationships can be slightly different in different months, especially for residential. That said, I still believe one of the things we should be doing for NMEC projects and programs is looking at trends of energy use over time. Is energy use increasing or decreasing over the baseline period, and by how much? If energy use decreased significantly from the beginning to end of the baseline, should a program pay for that reduction in energy use for which it wasn’t responsible? Evaluation will answer that question, but I don’t think evaluation should be hamstrung by a method that makes it impossible to determine whether energy use was changing over the baseline because of the method alone, much less including the challenge inherent in the data. With separate models for each month, any month-to-month trend cannot be evaluated. It would be slightly better with rolling 3-month models, but only slightly better because of the weighting.
I believe the issue with using the TOWT model for residential is not that a monthly or 3-month model is required. The issue is that the TOWT model assumes only 2 temperature relationships during the day (or during the week, if daytypes are not defined). That assumption is not true for commercial, but the hourly coefficients get the results close. However, for residential, the temperature relationships vary almost continually during the day, and the hourly coefficients are not adequate to handle that. Recognizing this fact, and designing models appropriately, can result in accurate residential hourly models. This is especially true in aggregate, at the portfolio level.
If #5 is a significant issue for using the TOWT model for residential hourly models, going to weighted 3-month models may not solve it. (Yes, Slide 12 for the June 28 meeting showed a significant improvement, but I don't know how much testing was done, and whether the improvement holds for different temperatures.) More relationships to OAT may be needed, or a different definition of low use and high use hours, not less data in the model. See the figure below, which shows models by hour-of-day for a group of 10 homes. (The y-axis is average kW.)

hourlymodels

The next figure is better models for a group of ~200 homes in the same data set as the 10 above.

hourlymodels200 homes

Focusing on question #1 for a moment: Slide 8 from the June 28 meeting (Github Issue #103) shows how the monthly regressions indicate that the “Other” load is highest in May and June. I have not seen that in my residential modeling, but I have not done residential end use disaggregation. Was the end use disaggregation done by submetering? What was the physical meaning of the increase in “Other” during the spring—what is the reason that this variation is not only statistically significant, but appropriate for a counterfactual?

Seasonal home behaviors I have seen are likely a delay in turning on cooling in the spring, leaving it off in the fall even on some warm days (these effects could be due to school schedules) and increased use in winter, again possibly due to school schedules and also possibly for holiday lighting.

It seems a potentially more appropriate approach would be to use indicator variables or coefficients for certain groups of months rather than completely separate monthly models.

steevschmidt commented 6 years ago

Great comments Bill. Some updates:

Slide 8 from the June 28 meeting (Github Issue #103) shows how the monthly regressions indicate...

Sorry to be unclear -- this was just a visualization, not data from a real home. Here's a real one: f02640

With separate models for each month, any month-to-month trend cannot be evaluated.

I disagree; HEA disaggregates energy use by the month, and it is quite possible to see monthly trends in each of the eight categories we track.

...for residential, the temperature relationships vary almost continually during the day, and the hourly coefficients are not adequate to handle that.

I completely agree. Furthermore, homes demonstrate a wide range of time shifting: a temperature spike at 2pm may impact energy use at 10am that same day (due to pre-cooling) or not until noon the next day (due to high thermal mass).

Of course monthly or 3-month models give better fits. But are they a better counterfactual, beyond the potential issue with overfitting?

I have similar concerns about overfitting with such a complex model.

HEA uses this monthly approach of disaggregation to more accurately identify heating and cooling loads within each month in both baseline and reporting periods. We employ a much simpler approach to identify the counterfactual heating and cooling loads in the reporting period, using linear ratios of DDs between the two periods. Why model when we have enough data to disaggregate?

hshaban commented 6 years ago

Closing this issue as we're done with the first iteration of hourly methods. Opening a separate issue #105 to look into improvements related to exogenous trends in energy use for individual buildings

openeemeter / caltrack

Hourly methods #85