openeemeter / caltrack

Shared repository for documentation and testing of CalTRACK methods
http://docs.caltrack.org
Creative Commons Zero v1.0 Universal

[CLOSED] Include calendar effects (day-of-week, month-of-year, holidays) in daily model? #57

Closed hshaban closed 6 years ago

hshaban commented 6 years ago

Issue by tplagge Tuesday Mar 14, 2017 at 22:00 GMT Originally opened as https://github.com/impactlab/caltrack/issues/56


It has been proposed that the Caltrack daily model process begin by considering HDD/CDD only as exogenous variables, much as with the monthly model. However, there are good a priori reasons to expect the daily data to exhibit calendar effects, i.e. for the usage to be dependent on things like the day of the week. If we see sufficient evidence for calendar effects, we may want to include categorical variables for the relevant effects in our model, since not doing so would result in non-stationary residuals.

The CEC has published reports suggesting the possibility of including dummy variables for factors like day of week and month of year (see Appendix A here). To see whether this might be worth considering for our residential usage data, I took the 100-home sample used in the monthly Caltrack beta test and did some basic aggregation.

I started by loading the temperature and electrical usage data for the 100 traces in the monthly Caltrack beta test. Note that while the model we were evaluating in the beta test was monthly, the data is actually daily AMI metered usage. Next, I normalized each trace, dividing all of the daily usage values by the trace mean. I then computed HDD and CDD using fixed balance points.
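
For concreteness, here is a minimal sketch of that normalization and degree-day step, assuming pandas Series of daily usage and mean daily temperature; the 60/70 degree balance points below are placeholders, not the balance points actually used in the analysis:

```python
import pandas as pd

def prepare_trace(usage, temps, hdd_balance=60.0, cdd_balance=70.0):
    """Normalize a daily usage trace and compute degree days at fixed balance points."""
    normalized = usage / usage.mean()            # divide by the trace mean
    hdd = (hdd_balance - temps).clip(lower=0.0)  # heating degree days
    cdd = (temps - cdd_balance).clip(lower=0.0)  # cooling degree days
    return pd.DataFrame({"usage": normalized, "HDD": hdd, "CDD": cdd})
```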

I then fit each trace using the Caltrack procedure, but without monthly averaging; i.e., I fit four models:

Usage = Intercept + e
Usage = Intercept + βc CDD + e
Usage = Intercept + βh HDD + e
Usage = Intercept + βc CDD + βh HDD + e

A model qualifies if all parameters are positive and significant (p < 0.1), and the qualified model with the best R² is selected.
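
A rough sketch of that fit-and-select step, using the statsmodels formula API; the qualification rule here (every degree-day coefficient positive with p < 0.1, with the intercept-only model always qualifying) is my reading of the procedure above, not reference code:

```python
import statsmodels.formula.api as smf

CANDIDATE_FORMULAS = [
    "usage ~ 1",
    "usage ~ CDD",
    "usage ~ HDD",
    "usage ~ CDD + HDD",
]

def select_model(df, alpha=0.1):
    """Fit the four candidate models and return the qualified one with the best R^2."""
    best = None
    for formula in CANDIDATE_FORMULAS:
        fit = smf.ols(formula, data=df).fit()
        slopes = fit.params.drop("Intercept", errors="ignore")
        pvals = fit.pvalues.drop("Intercept", errors="ignore")
        qualified = bool((slopes > 0).all() and (pvals < alpha).all())
        if qualified and (best is None or fit.rsquared > best.rsquared):
            best = fit
    return best
```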

So, I now have a time series of normalized residuals for each of the 100 traces. If our CDD/HDD model is sufficient, we expect the residuals to be stationary, and specifically for there to be no calendar effects like systematic overestimates or underestimates based on factors like temperature, month of year, or day of week.

Results

[Screenshots: results plots]

Conclusions

We can make more accurate predictions and get less-correlated errors by including factors for day-of-week, month-of-year, and holidays in the daily model. The month-of-year, holiday, and day-of-week corrections all look to be similar in magnitude. The R² values will be higher, and the out-of-sample predictions should be better (as long as we regularize to account for possible overfitting, if necessary).

However, the effects of including the categorical variables for day-of-week, month-of-year, etc. are not very dramatic when considering aggregate quantities computed over an entire year. The fact that we slightly underestimate weekend usage and overestimate weekday usage shouldn’t matter so long as we sum over a full year of weekdays and weekends. It’s only if we want to make predictions about Sunday versus Monday usage (which from previous discussions does not appear to be a use case we anticipate here) that these effects become important.

This should be intuitively obvious, but as a demonstration, I fit one year's worth of data from each of the 100 accounts using the model specified above; the model specified above plus day-of-week; and the model specified above plus month-of-year. When I predict the total usage out-of-sample for the subsequent nine days, the DOW-included model does slightly better: the difference between predicted and actual usage using the with-DOW model is slightly closer to zero. However, if I predict the total out-of-sample usage for the subsequent 365 days, the difference in prediction accuracy is consistent with zero. The same story holds for month-of-year: if I predict 180 days out, month-of-year adds predictive power for the aggregate sum, but if I predict 365 days out it does not.
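
As an illustration of how such calendar terms could be added to the model (the C(...) categorical terms are the patsy convention; the column names and DatetimeIndex are assumptions for the sketch, not the actual analysis code):

```python
import statsmodels.formula.api as smf

def fit_with_calendar_terms(df, include_dow=False, include_moy=False):
    """Fit the HDD/CDD model, optionally adding day-of-week / month-of-year dummies."""
    df = df.assign(dow=df.index.dayofweek, moy=df.index.month)
    terms = ["HDD", "CDD"]
    if include_dow:
        terms.append("C(dow)")   # seven day-of-week categories
    if include_moy:
        terms.append("C(moy)")   # twelve month-of-year categories
    return smf.ols("usage ~ " + " + ".join(terms), data=df).fit()
```

Predictions from each fitted model over a holdout window can then be summed and compared with the actual total, as in the 9-day versus 365-day comparison above.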

Given all this, I am inclined to agree with the proposition that calendar effects are not essential for the daily model, but I'd like to open it up for further discussion.

hshaban commented 6 years ago

Comment by jbackusm Wednesday Mar 15, 2017 at 18:09 GMT


Thanks for this analysis @tplagge ! I mostly agree with your conclusions, but I wanted to mention that overfitting could be a larger problem than you might expect. To the extent that we are interested in savings estimates normalized to a typical year, we run into the problem that we don't know what a typical year looks like in terms of these additional fixed effects.

In fact, we've seen that seasonality in the residuals can vary substantially from one year to the next, based on extreme weather patterns, economic effects, etc. This could explain why you see no benefit to using the month-of-year model when you evaluate residuals in the following 365 days.

Also, this is actually one of the primary reasons why we think it's important to use a comparison group to adjust the gross savings to account for exogenous effects that vary between the pre- and post-treatment periods--and we have seen that our adjusted residuals tend to be much more stationary than the unadjusted residuals. I understand that the comparison-group adjustment is out of scope for CalTRACK beta at the moment, but it seemed relevant to mention in this context, so forgive me for taking another opportunity to explain in more detail why we think it's so important.

I also think your approach is interesting and useful: we should be thinking about the dimensions along which we expect the residuals to be stationary, given our immediate use cases. Maybe that's a good way to frame the question of model specification?

hshaban commented 6 years ago

Comment by tplagge Wednesday Mar 15, 2017 at 19:31 GMT


I agree; in addition to looking at effects like the ones I checked, it would be useful to look at residual autocorrelations and possibly long-memory processes. I know there's been plenty of work on using ARMA-style models for daily energy consumption; not that I'm necessarily suggesting we go down that road, but at least worth thinking about.

hshaban commented 6 years ago

Comment by jfarland Tuesday Apr 11, 2017 at 22:19 GMT


I tend to agree with both of you. The more granular the data, the more signals we're going to be able to pick up. My experience with load forecasting applications has convinced me that calendar effects (Day of Week, Month of Year) as well as lagged dependent variables are often the most powerful predictors of energy demand after atmospheric conditions, especially at the hourly and sub-hourly levels. It's not surprising to pick up these signals at the daily level as well.

Another parallel from DNV GL's load forecasting applications that might be interesting here: if we really are limiting our ultimate concern to annual aggregations/predictions, and we want to "tighten" how our predictions account for calendar effects, we can simply estimate separate models for each month of the year to account for that specific frequency of seasonality.
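
As a minimal sketch of that separate-models-per-month idea (again assuming a DatetimeIndex and the same usage/HDD/CDD columns as before):

```python
import statsmodels.formula.api as smf

def fit_monthly_models(df):
    """Fit one HDD/CDD model per calendar month, absorbing month-of-year
    seasonality without adding dummies to a single pooled model."""
    return {
        month: smf.ols("usage ~ HDD + CDD", data=group).fit()
        for month, group in df.groupby(df.index.month)
    }
```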

Just like @tplagge 's ARMA-style model suggestion, this is not something I am necessarily suggesting for the Caltrack Beta Test.

hshaban commented 6 years ago

Comment by tplagge Wednesday Apr 26, 2017 at 18:47 GMT


Calendar effects analysis (with bonus robust regression results)

The Caltrack daily model specification currently includes only heating and cooling degree days as independent variables. While it is anticipated that including calendar effects (day-of-week, month-of-year, etc.) might improve the quality of the fits, previous explorations have suggested that the impact on aggregate quantities such as annualized savings is likely to be small. Since these are the only quantities of interest for the specific Caltrack use case, this post will focus on firming up that tentative conclusion.

We will work with the 1000-home electric data set, and will focus specifically on day-of-week and month-of-year as potential additions to the model. Holidays and interaction terms are also plausible additions, but if the two most plausibly significant additions do not move the needle, then it seems unlikely that these terms would either. Our key metrics will both be related to out-of-sample prediction, with the 2nd baseline year as the test period and the 1st baseline year as the training period. Our metrics are CV(RMSE) and NMBE (normalized mean bias error) of the out-of-sample predictions.
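
Assuming the conventional definitions (RMSE and mean bias, each normalized by mean observed usage, and not multiplied by 100, consistent with the note further down), the two metrics look like:

```python
import numpy as np

def cv_rmse(actual, predicted):
    """Coefficient of variation of the RMSE (not multiplied by 100)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((predicted - actual) ** 2)) / np.mean(actual)

def nmbe(actual, predicted):
    """Normalized mean bias error (not multiplied by 100)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(predicted - actual) / np.mean(actual)
```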

In order to inform our intuition, let’s take a quick look at the results from the model as currently specified. Here are the CV(RMSE) and NMBE histograms:

[Screenshots: CV(RMSE) and NMBE histograms]

The CV(RMSE) distribution looks chi-square-ish, and the NMBE distribution looks Cauchy-ish, as you'd expect. In fact, here's the Cauchy fit to the NMBE/fractional savings histogram:

[Screenshot: Cauchy fit to the NMBE/fractional savings histogram]

The Cauchy distribution here is peaked at 0.017, suggesting a small but detectable bias or population-level trend; the half width at half max is 0.095.
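
For reference, a sketch of how such a fit could be obtained with scipy; the input array here is synthetic placeholder data, not the actual NMBE values:

```python
import numpy as np
from scipy import stats

# Placeholder stand-in for the per-trace NMBE / fractional savings values.
nmbe_values = np.random.standard_cauchy(1000) * 0.1 + 0.02

loc, scale = stats.cauchy.fit(nmbe_values)
# For a Cauchy distribution the scale parameter is the half width at half maximum,
# so loc plays the role of the peak and scale the HWHM quoted above.
print(f"peak = {loc:.3f}, HWHM = {scale:.3f}")
```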

A model that makes better predictions than the one from the specs should have a lower CV(RMSE), a tighter distribution of NMBE/fractional savings, and -- modulo real population-level trends -- an NMBE/fractional savings peak closer to zero.

We will test the following models:

The first thing to note is that for a typical home, these models produce very similar results:

[Screenshot: model predictions for a typical home]

The day-of-week + month-of-year model, M3, appears a bit different and, at least by eye, overfitted. The rest of the models do not look like they should produce very different results, and indeed they do not.

Results

Below is a plot of the median CV(RMSE) versus median NMBE for each of the models. The error bars represent the 25th and 75th percentiles of each distribution. It can immediately be seen that including calendar effects does not dramatically improve the fit; in fact, in some cases it makes matters very slightly worse. Including month-of-year plus day-of-week, for example, increases the median CV(RMSE) and NMBE slightly, which is likely due to overfitting. (Note that I did not multiply either CV(RMSE) or NMBE by a factor of 100.)

[Screenshot: median CV(RMSE) versus median NMBE for each model]

Zoomed in:

[Screenshot: zoomed-in view of the median CV(RMSE) versus median NMBE plot]

In terms of CV(RMSE), the best-performing model is M1, an OLS regression including categorical variables for day-of-week. Simply distinguishing between weekdays and weekends gets you most of the way there. Increasing the balance point search range helps a tiny bit.

In terms of NMBE, however, the robust regressions are closer to zero than the OLS regressions, regardless of which calendar effects they include. Here’s what the actual distributions look like for the case of no calendar effects at all:

[Screenshot: NMBE distributions for OLS versus robust regression, no calendar effects]

This is likely because usage outliers are more often unusually low than unusually high (vacations, for example). Therefore, a regression less sensitive to outliers tends to produce higher usages. This is a strong argument in favor of robust regression. Note that this bias will be less apparent if we compare modeled quantities from the testing and training periods--then both quantities would be biased, likely similarly. Here we are comparing the training period model to the actual testing period usages.

However, as before, it does not look as if there is a strong motivation to include calendar effects in our daily model for the purposes of calculating annualized values.

This analysis can be taken further. One might suspect that a robust regression using day-of-week would be a promising candidate, and one might also suspect that a regularized regression including day-of-week and month-of-year could let the power of the elastic net shine a bit brighter.

However, given this group’s well-justified bias in favor of simplicity, my personal opinion is that robust regression with no calendar effects or weekend/weekday only are quite attractive options.

hshaban commented 6 years ago

Comment by tplagge Thursday Apr 27, 2017 at 20:40 GMT


On the call, it was brought up that electricity usage=0 is very likely to indicate some sort of problem. Fortunately, out of the 1000 projects, just 23 had more than one day of 0 usage. Here's the normalized distribution of fractional savings for those 23 projects, versus the other 977. While they do indeed appear to be disproportionately in the tails (particularly the negative tail, as you'd expect), it's not driving the entire bias; the median NMBE for the 977 projects with <=1 zero is 0.0251 versus 0.0246 for the whole sample.
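
A rough sketch of the flagging step described here (the column and variable names are hypothetical); the screenshot below shows the resulting distributions:

```python
import pandas as pd

def split_by_zero_days(daily_usage, nmbe_by_project):
    """Flag projects with more than one day of zero usage and compare median NMBE.

    daily_usage: DataFrame of daily rows with 'project_id' and 'usage' columns.
    nmbe_by_project: Series of NMBE values indexed by project_id.
    """
    zero_days = (daily_usage["usage"] == 0).groupby(daily_usage["project_id"]).sum()
    flagged = zero_days[zero_days > 1].index
    is_flagged = nmbe_by_project.index.isin(flagged)
    return {
        "flagged_median_nmbe": nmbe_by_project[is_flagged].median(),
        "other_median_nmbe": nmbe_by_project[~is_flagged].median(),
    }
```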

[Screenshot: normalized fractional savings distributions for the 23 flagged projects versus the other 977]

hshaban commented 6 years ago

Comment by mcgeeyoung Thursday May 11, 2017 at 04:28 GMT


It seems that we have reached consensus here. I'm going to let @matthewgee or @tplagge add this to the draft spec so I don't flub it. Issue can be closed once we agree on the language.

hshaban commented 6 years ago

Comment by tplagge Thursday May 11, 2017 at 21:43 GMT


Robust linear model benchmarking

I ran a subset of the electrical sample through my full fitting routine using both statsmodels rlm (robust linear model) and ols (ordinary least squares), and the robust model took a little under three times as long in both CPU and wall clock time (~4 seconds per home for robust, ~1.5 for OLS on my laptop). Note that the full fitting routine calls fit() many times as part of the CDD/HDD balance point grid search, and does a bunch of other stuff too.

I also ran several fits on a single baseline trace without any of the data prep routines--just calling time(smf.ols(data=data, formula=formula)) and time(smf.rlm(data=data, formula=formula))--and found that the difference was just over a factor of 3 (~23 ms for the robust fit, ~7 ms for the OLS fit on my laptop).
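
For reference, a self-contained timing sketch along the same lines, using synthetic data in place of an actual baseline trace:

```python
import time

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic single-trace data standing in for one baseline trace.
rng = np.random.default_rng(0)
data = pd.DataFrame({"HDD": rng.uniform(0, 30, 365), "CDD": rng.uniform(0, 30, 365)})
data["usage"] = 1.0 + 0.02 * data["HDD"] + 0.03 * data["CDD"] + rng.normal(0, 0.1, 365)
formula = "usage ~ HDD + CDD"

for name, fitter in [("ols", smf.ols), ("rlm", smf.rlm)]:
    start = time.perf_counter()
    for _ in range(100):
        fitter(formula, data=data).fit()
    per_fit_ms = 1000 * (time.perf_counter() - start) / 100
    print(f"{name}: {per_fit_ms:.1f} ms per fit")
```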

Conclusion: The time for the robust/OLS fit almost but not completely dominates the total running time, and the computational cost is approximately 3x for robust versus OLS. I'd say this is fairly significant.

Given that robust least squares has a firm theoretical and small-but-non-negligible empirical justification, and that the increase in computational cost is significant but from a low base, I'm very slightly in favor of using robust least squares. It would take little to convince me otherwise.

hshaban commented 6 years ago

Comment by houghb Friday May 12, 2017 at 17:14 GMT


John and I talked more in depth about using the robust linear model this morning. For the reasons outlined below I think we've come around to the conclusion that we're in favor of sticking with OLS as the default approach, but suggest that robust regression should be seriously considered as a future improvement (we're in favor of putting language to this effect in the specs). Here is a summary of our thinking:

hshaban commented 6 years ago

Comment by tplagge Friday May 12, 2017 at 18:50 GMT


You've nudged me over to your side of the fence. I agree with this conclusion: let's call robust regression out as a future improvement but stick with OLS for the specs.

hshaban commented 6 years ago

Comment by houghb Thursday May 18, 2017 at 15:55 GMT


I've added draft language to the analysis specs suggesting robust regression for a future improvement, so am closing this issue.