impactlab / caltrack

Shared repository for documentation and testing of CalTRACK methods
http://docs.caltrack.org

Proposal for outlier detection and removal #63

Closed houghb closed 7 years ago

houghb commented 7 years ago

On the call last week we acknowledged that there are significant outliers in the testing dataset and that these outliers are making our output statistics less useful.

We explored some different ways to identify outliers during the model selection process (where we train our models on one year of pre-treatment data, then test the model performance on a second year of pre-treatment data). The outlier detection approach we are proposing can also be used in the final specs to remove outliers from weather normalized savings estimates.

Here are the different approaches we considered, with some notes:

  1. An outlier is a premise where the annual usage changes >30% from the training pre-treatment year to the testing pre-treatment year

    • This works for identifying the outliers in the model selection process, but it is not readily extensible to the weather normalized savings, so we discarded it.
  2. An outlier is a premise where the absolute value of the fractional savings is greater than 0.75

    • Fractional savings is the same as in our output metrics: (predicted_daily_use.sum() - daily_use.sum()) / daily_use.sum() (see the sketch after this list).
    • In this case, premises whose annual usage differs by more than 75% from what the model predicts are thrown out as bad estimates. For the model selection process this makes sense because the only difference between the two years of data should be weather, and weather alone should not change usage by 75%. For the weather normalized savings, this cutoff value also makes sense because we don't expect to see real savings of 75% after a project.
    • With a cutoff value of 0.75 this drops less than 5% of premises from the electric or gas results. A cutoff value of 1.0 also works reasonably well and drops only 3%, but it leaves some significant outliers.
  3. An outlier is a premise with a fractional savings value in the top or bottom X percentile of the results

    • This approach works fine, but doesn't do quite as well as approach No. 2 at moving the medians and means closer together.
    • We would probably want to tune the percentile cutoff based on what the actual results look like for each different set of results.
    • X = 5% drops 10% of premises from the electric results
    • X = 2% drops 4% of premises from the electric results
    • X = 1% drops 2% of premises from the electric results
  4. An outlier is a premise with fractional savings more than X standard deviations away from the median

    • For X = 1.5 and X = 2 this only drops a single premise, so it doesn't meet our needs.
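
For concreteness, here is a minimal sketch in pandas of how the fractional-savings filters above (approaches No. 2, No. 3, and No. 4) might be applied. The `results` DataFrame and its column names are hypothetical toy data, not part of the CalTRACK spec:

```python
import pandas as pd

# Hypothetical per-premise results; real inputs would come from the
# trained models and observed usage.
results = pd.DataFrame({
    "premise_id": ["a", "b", "c", "d", "e"],
    "predicted_annual_use": [10500.0, 9800.0, 21000.0, 5200.0, 7400.0],
    "actual_annual_use": [10000.0, 10200.0, 10000.0, 5000.0, 8000.0],
})

# Fractional savings, as defined above:
# (predicted_daily_use.sum() - daily_use.sum()) / daily_use.sum()
results["fractional_savings"] = (
    results["predicted_annual_use"] - results["actual_annual_use"]
) / results["actual_annual_use"]

# Approach No. 2: fixed cutoff on the absolute fractional savings.
keep_cutoff = results[results["fractional_savings"].abs() <= 0.75]

# Approach No. 3: drop the top and bottom X percentile of the results.
x = 0.02  # X = 2%
lo, hi = results["fractional_savings"].quantile([x, 1 - x])
keep_percentile = results[results["fractional_savings"].between(lo, hi)]

# Approach No. 4: drop premises with fractional savings more than
# X standard deviations away from the median.
x_sd = 2.0
med = results["fractional_savings"].median()
sd = results["fractional_savings"].std()
keep_sd = results[(results["fractional_savings"] - med).abs() <= x_sd * sd]
```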

Recommendation

The two viable approaches to outlier detection that we explored (No. 2 and No. 3 above) each attempt to do a different thing:

Approach No. 3 requires a distribution of results to determine what the outliers are. In this scenario a premise can be dropped as an outlier when it is part of one subsample of the available premises, yet no longer be an outlier if the analysis is re-run with a larger, smaller, or different subset of premises.

In contrast, approach No. 2 can be applied at the premise level: it determines for each premise whether our estimate is "good" based on the logic that we should not see more than 75% savings. Something identified as a poor estimate under this approach will always be discarded, even if additional premises are run.

Essentially these two approaches do different things -- No. 3 identifies true outliers in some distribution, while No. 2 identifies bad estimates. We are recommending No. 2 because it kills two birds with one stone: it removes outliers and also filters out bad models before aggregation.

To make sure I'm clear, I am proposing that we add the following to the analysis spec: _"Remove premises for which the absolute value of the fractional savings is greater than 0.75, where fractional savings is defined as (total_annual_predicted_use - total_annual_actual_use) / total_annual_actual_use"_
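
As a sketch of the proposed rule, assuming `predicted_daily_use` and `daily_use` are the per-premise daily usage series referenced earlier (the toy values below are illustrative only):

```python
import pandas as pd

# Illustrative daily usage series for one premise; real inputs would be
# the model's predicted daily use and the observed daily use.
daily_use = pd.Series([28.0, 31.5, 30.2, 27.9])
predicted_daily_use = pd.Series([29.1, 30.8, 30.5, 28.3])

# Fractional savings per the proposed spec language.
fractional_savings = (
    predicted_daily_use.sum() - daily_use.sum()
) / daily_use.sum()

# Remove the premise from the results if this is True.
is_outlier = abs(fractional_savings) > 0.75
```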

Output comparison for electric (before and after removing outliers)

| ModelID | Model Description | Climate Zone | Number of Sites in Group | Mean daily use (training) | SD in daily use (training) | Mean daily use (testing) | SD in daily use (testing) | Mean heating balance point | Mean cooling balance point | Mean CVRMSE | Mean NMBE | Median CVRMSE | Median NMBE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Current spec with outliers removed | 16 | 2 | 33.36304658 | 20.45473949 | 32.44178836 | 20.48904894 | 59 | 69.5 | 29.66660401 | -9.393668691 | 29.66660401 | -9.393668691 |
| 1 | Current spec with outliers removed | 2 | 79 | 15.81895657 | 12.04616068 | 15.65663614 | 10.45600452 | 59.22727273 | 67.08108108 | 40.52787925 | -0.0624178 | 36.05715704 | 1.591909755 |
| 1 | Current spec with outliers removed | 3 | 186 | 14.37042834 | 9.424762735 | 14.17563148 | 9.413943469 | 60.28025478 | 67.11111111 | 44.18369412 | -0.117802197 | 36.85247073 | 0.309592477 |
| 1 | Current spec with outliers removed | 4 | 82 | 18.44725684 | 11.58791841 | 18.27547939 | 11.19745294 | 59.80882353 | 67.35087719 | 36.78670718 | -0.040641884 | 34.7236992 | 1.007863796 |
| 1 | Current spec with outliers removed | 5 | 16 | 17.73247106 | 8.286043853 | 17.0151938 | 8.461434828 | 61.84615385 | 68 | 26.76734401 | -4.350611209 | 24.07013201 | -3.00739748 |
| 1 | Current spec with outliers removed | 11 | 52 | 30.80566099 | 19.62299251 | 30.11455837 | 19.676095 | 60.07894737 | 69.36538462 | 36.31790551 | -1.684012074 | 33.02752803 | -3.808143451 |
| 1 | Current spec with outliers removed | 12 | 360 | 23.57038212 | 16.18791723 | 23.20346066 | 16.08652957 | 58.316 | 67.721875 | 42.11296887 | -2.193839622 | 37.36518505 | -1.667744641 |
| 1 | Current spec with outliers removed | 13 | 118 | 33.07237532 | 22.30305605 | 32.33749649 | 21.5743705 | 60.65217391 | 71.1440678 | 36.15120211 | -4.359162633 | 30.97849501 | -3.436827338 |

| ModelID | Model Description | Climate Zone | Number of Sites in Group | Mean daily use (training) | SD in daily use (training) | Mean daily use (testing) | SD in daily use (testing) | Mean heating balance point | Mean cooling balance point | Mean CVRMSE | Mean NMBE | Median CVRMSE | Median NMBE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Current spec | 16 | 2 | 33.36304658 | 20.45473949 | 32.44178836 | 20.48904894 | 59 | 69.5 | 29.66660401 | -9.393668691 | 29.66660401 | -9.393668691 |
| 1 | Current spec | 2 | 91 | 16.77561635 | 13.50253338 | 15.31266112 | 10.56994922 | 59.24 | 67.04878049 | 46.25520309 | -6.323619551 | 38.5635312 | -1.530476594 |
| 1 | Current spec | 3 | 216 | 14.5414926 | 10.11924684 | 13.49545309 | 9.541706383 | 60.27683616 | 66.92 | 166.5966534 | -123.4805673 | 38.42499052 | -1.225046386 |
| 1 | Current spec | 4 | 90 | 18.26733008 | 11.60227117 | 17.66383191 | 11.21901263 | 60.06756757 | 67.40983607 | 39.49459982 | -0.663687944 | 35.27508383 | 1.007863796 |
| 1 | Current spec | 5 | 17 | 17.4686972 | 8.197565129 | 16.70343116 | 8.416398823 | 62.07142857 | 69.16666667 | 26.76734401 | -4.350611209 | 24.07013201 | -3.00739748 |
| 1 | Current spec | 11 | 59 | 30.70871162 | 20.80840078 | 30.67601254 | 23.4510035 | 60.325 | 69.33333333 | 40.38655729 | -6.11977188 | 33.12688587 | -4.3478681 |
| 1 | Current spec | 12 | 389 | 23.49813289 | 16.10049864 | 22.68430358 | 16.10464327 | 58.3129771 | 67.64117647 | 46.43881721 | -6.711320706 | 37.64979899 | -1.8570027 |
| 1 | Current spec | 13 | 136 | 32.08717042 | 22.1922064 | 30.54500668 | 21.9339118 | 60.81372549 | 71.18796992 | 44.78832609 | -10.33473553 | 32.19374289 | -3.517148987 |
mcgeeyoung commented 7 years ago

@houghb This is a really interesting writeup. I wonder if you could briefly enumerate your proposed use cases for this approach (#2). Are you suggesting that we use it to cull our 1,000 meter dataset? Or are you suggesting a broader application of this technique?

houghb commented 7 years ago

I am proposing that we remove the premises identified above any time we use our models to make predictions. I don't think we should cull the 1,000 home dataset itself, but before reporting any summary or output statistics we should remove these premises from the set of results. We would do the same when generating weather normalized savings estimates, before we enter the aggregation steps.

mcgeeyoung commented 7 years ago

@houghb I would feel more comfortable putting this in as a recommendation under the Aggregation section. In both your proposal above and in our aggregation recommendations, we are providing guidance on how to deal with the effects of outliers, or non-standard distributions. However, we shouldn't decline to report (i.e., censor) the outputs; rather, we should bring attention to them and suggest good methods for handling them (as above).

mcgeeyoung commented 7 years ago

Closing this now that a final recommendation has been made on aggregation.