openeemeter / caltrack

Shared repository for documentation and testing of CalTRACK methods
http://docs.caltrack.org
Creative Commons Zero v1.0 Universal

FINAL CALL FOR COMMENTS: Monthly and Daily Methods Updates #82

Closed: hshaban closed this issue 6 years ago

hshaban commented 6 years ago

Site Model Selection Criteria #76

Background: In Caltrack 1.0, a number of candidate models were fit to data and the model with the maximum adjusted R-squared was selected as the best fit model for second-stage savings estimation. Only models with coefficients that were significant at the 10% significance level were used in model selection. Caltrack 2.0 tests indicated that the p-value screen did not result in markedly improved performance and may even be detrimental in some cases.
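Here is a minimal sketch of selection by maximum adjusted R-squared with no p-value screen, using plain NumPy least squares. The candidate set (intercept-only, HDD-only, CDD-only, HDD+CDD), the function names and the synthetic data are illustrative assumptions. This is not the reference CalTRACK implementation.

```python
import numpy as np


def adjusted_r2(y, y_hat, n_params):
    """Adjusted R-squared; n_params counts every coefficient, including the intercept."""
    n = len(y)
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)


def select_by_adjusted_r2(y, candidates):
    """Fit each candidate design matrix by OLS and return (name, adjusted R-squared) of the best.

    No p-value screen is applied: every candidate competes on adjusted R-squared alone.
    """
    best_name, best_score = None, -np.inf
    for name, X in candidates.items():
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        score = adjusted_r2(y, X @ beta, X.shape[1])
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical data: usage driven by HDD only, plus noise.
rng = np.random.default_rng(0)
n = 60
hdd = rng.uniform(0, 20, n)
cdd = rng.uniform(0, 10, n)
usage = 5.0 + 0.8 * hdd + rng.normal(0, 0.5, n)
ones = np.ones(n)
candidates = {
    "intercept_only": ones[:, None],
    "hdd_only": np.column_stack([ones, hdd]),
    "cdd_only": np.column_stack([ones, cdd]),
    "hdd_cdd": np.column_stack([ones, hdd, cdd]),
}
best_name, best_score = select_by_adjusted_r2(usage, candidates)
```

Adjusted R-squared penalizes extra coefficients, so a spurious CDD term is not automatically rewarded even without a significance test.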

Updates in Caltrack 2.0: The p-value qualification criterion will be removed from the monthly and daily methods in the specification. The following lines will be edited to remove the p-value criterion:

hshaban commented 6 years ago

Maximum Baseline Period Length #68

Background: Caltrack 1.0 included data sufficiency guidelines for the minimum baseline period length but did not specify a maximum baseline period length. Caltrack 2.0 tests demonstrated that predictions and savings estimates are unstable, varying with the baseline period length used for modeling.

Updates in Caltrack 2.0: A guideline will be added to the data preparation section of Caltrack to limit the baseline period length to 12 months. The most recent 12 months prior to an intervention are considered the most representative of a building’s energy usage for the purpose of calculating payable savings.
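Here is a minimal sketch of trimming a daily series to the baseline window. It assumes "12 months" is approximated as the 365 days immediately before the intervention date, with the intervention day itself excluded. A calendar-month definition would give slightly different bounds. The function names are hypothetical.

```python
from datetime import date, timedelta


def baseline_window(intervention_date, days=365):
    """Return (start, end) with end exclusive: the `days` days immediately before the intervention."""
    return intervention_date - timedelta(days=days), intervention_date


def trim_baseline(records, intervention_date, days=365):
    """Keep only (day, usage) records inside the baseline window and drop anything older."""
    start, end = baseline_window(intervention_date, days)
    return [(d, u) for d, u in records if start <= d < end]


# Usage: two years of hypothetical daily records before a 2018-01-01 intervention.
intervention = date(2018, 1, 1)
records = [(intervention - timedelta(days=i), 10.0) for i in range(1, 731)]
baseline = trim_baseline(records, intervention)
```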

hshaban commented 6 years ago

Variable Degree Days for Billing Period Models #69

Background: Caltrack 1.0 monthly models used fixed balance point temperatures (60 F for heating and 70 F for cooling). Caltrack 2.0 test results indicated that fixed balance points yielded more intercept-only models, suggesting that fixed balance points did not effectively approximate each site's optimal balance point. Additionally, variable balance point models showed better out-of-sample performance than fixed balance point models.

Updates in Caltrack 2.0: Caltrack 2.0 will employ a grid search procedure to identify the HDD and CDD balance points that give the best model fit for billing period methods, using the following ranges: HDD, 40-80 F; CDD, 50-90 F. The relevant section in the documentation will be updated.
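Here is a minimal sketch of the billing-period grid search over the stated ranges (HDD 40-80 F, CDD 50-90 F). It makes several assumptions: 1 F steps, average daily degree days per billing period, an ordinary least squares HDD+CDD model, R-squared as the fit criterion, and the constraint that the cooling balance point is not below the heating balance point. The function names and synthetic data are hypothetical.

```python
import itertools

import numpy as np


def avg_daily_degree_days(daily_temps, balance_point, kind):
    """Average daily HDD or CDD (F-days) over one billing period's daily mean temperatures."""
    t = np.asarray(daily_temps, dtype=float)
    if kind == "hdd":
        return float(np.maximum(balance_point - t, 0.0).mean())
    return float(np.maximum(t - balance_point, 0.0).mean())


def grid_search_balance_points(upd, periods, hdd_range=range(40, 81), cdd_range=range(50, 91)):
    """Return (hdd_bp, cdd_bp, r2) for the HDD+CDD model with the highest R-squared.

    upd: average usage per day for each billing period.
    periods: one sequence of daily mean temperatures (F) per billing period.
    """
    y = np.asarray(upd, dtype=float)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    best = None
    for hdd_bp, cdd_bp in itertools.product(hdd_range, cdd_range):
        if cdd_bp < hdd_bp:
            continue  # assumed constraint: cooling balance point not below heating balance point
        hdd = [avg_daily_degree_days(p, hdd_bp, "hdd") for p in periods]
        cdd = [avg_daily_degree_days(p, cdd_bp, "cdd") for p in periods]
        X = np.column_stack([np.ones_like(y), hdd, cdd])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r2 = 1.0 - float(np.sum((y - X @ beta) ** 2)) / ss_tot
        if best is None or r2 > best[2]:
            best = (hdd_bp, cdd_bp, r2)
    return best


# Hypothetical data: 24 billing periods of 30 days with a seasonal temperature cycle,
# and usage generated from balance points of 60 F (heating) and 70 F (cooling).
rng = np.random.default_rng(1)
periods = [65 + 25 * np.sin(2 * np.pi * k / 12) + rng.uniform(-8, 8, 30) for k in range(24)]
upd = [10 + 0.6 * avg_daily_degree_days(p, 60, "hdd") + 0.9 * avg_daily_degree_days(p, 70, "cdd")
       for p in periods]
hdd_bp, cdd_bp, r2 = grid_search_balance_points(upd, periods)
```

On noiseless synthetic data the search should land on, or very near, the generating balance points.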

hshaban commented 6 years ago

Expanded Balance Point Grid Search Range #72

Background: Default Caltrack 1.0 balance point grid search parameters (heating: 55-65 F, cooling: 66-75 F) may be restrictive in more extreme climates and for different building types. Different grid search ranges were compared empirically and no significant effect on model performance was identified. However, wider search ranges improve model interpretability.

Updates in Caltrack 2.0: The grid search parameters on degree day coefficients will be expanded to:

In addition, search increments of up to 3 F will be permitted. Relevant sections in the documentation will be updated.
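A small sketch of generating candidate balance points with a coarser increment. The expanded ranges themselves are not reproduced here, so the example reuses the Caltrack 1.0 heating range from the background (55-65 F). Three choices are assumptions: integer steps, a minimum step of 1 F, and always keeping the upper endpoint. The helper name is hypothetical.

```python
def balance_point_grid(low, high, step):
    """Candidate balance points (F) from low to high, keeping both endpoints.

    step is the integer search increment; values outside 1-3 F are rejected.
    """
    if not 1 <= step <= 3:
        raise ValueError("search increment must be between 1 and 3 F")
    grid = list(range(low, high + 1, step))
    if grid[-1] != high:
        grid.append(high)  # keep the upper bound when step does not divide the range evenly
    return grid


coarse = balance_point_grid(55, 65, 3)  # fewer candidate models to fit
fine = balance_point_grid(55, 65, 1)
```

A 3 F increment cuts the number of fits per balance point dimension roughly threefold, which matters more once the ranges are widened.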

hshaban commented 6 years ago

Method for Modelling Using Billing Data and Weighted Least Squares #67

Background: Modeling with billing data was underspecified in Caltrack 1.0.

Updates in Caltrack 2.0: The following method for using billing data will be added to the Caltrack specification.
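The specified method itself is not reproduced in this thread. As an illustration only, here is a minimal weighted least squares sketch. It assumes each billing period's usage per day is weighted by the number of days in the period, so longer periods count more. The function name and data are hypothetical.

```python
import numpy as np


def fit_wls(X, y, weights):
    """Weighted least squares: scale each row by sqrt(weight), then solve ordinary least squares."""
    w = np.sqrt(np.asarray(weights, dtype=float))
    beta, *_ = np.linalg.lstsq(X * w[:, None], np.asarray(y, dtype=float) * w, rcond=None)
    return beta


# Hypothetical billing data: period lengths (days), average daily HDD and usage per day (UPD).
days = np.array([29, 31, 30, 33, 28, 31])
hdd = np.array([20.0, 15.0, 8.0, 2.0, 0.0, 0.5])
upd = np.array([14.1, 11.4, 8.1, 4.9, 4.1, 4.2])
X = np.column_stack([np.ones_like(hdd), hdd])
beta_wls = fit_wls(X, upd, weights=days)  # [intercept, HDD coefficient]
```

Scaling rows by the square root of the weight minimizes the weighted sum of squared residuals. With equal weights this reduces to ordinary least squares.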

hshaban commented 6 years ago

Disambiguate the term “Month” #78

In the documentation, instances of the term “month” will be clarified as described below:

hshaban commented 6 years ago

Clearer Distinction Between Official Data Sufficiency Requirements and Test Data Preparation #79

In the documentation, some methods apply exclusively to test data. These should be clearly labeled as applying to test data only. General data sufficiency requirements will be introduced in a separate General Data Sufficiency section.

hshaban commented 6 years ago

Specificity in Data Sufficiency Requirements Documentation #66 #80

Data sufficiency requirements should clearly define the data they apply to. If a data sufficiency requirement applies to baseline data only, this should be clearly indicated. If a data sufficiency requirement applies to baseline and reporting period data, both of these applications should be stated.

In addition, data sufficiency for billing data will be clarified as follows for the baseline period: All qualifying sites must have 12 months of contiguous UPDm values, with up to 1 missing value, in the pre-intervention time series.

When referring to annualized savings for the reporting period, the following language will be used: All qualifying sites must have 12 months of contiguous UPDm values, with up to 1 missing value, in the post-intervention time series.
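Here is a minimal sketch of the 12-month, at-most-one-missing check applied to a baseline window and to a reporting window. It assumes the series holds one entry per consecutive month, oldest first, with `None` marking a missing month. The function name and values are hypothetical.

```python
from typing import Optional, Sequence


def sufficient_upd_window(values: Sequence[Optional[float]], required: int = 12, max_missing: int = 1) -> bool:
    """True if `values` spans exactly `required` consecutive months with at most `max_missing` gaps (None)."""
    if len(values) != required:
        return False
    return sum(v is None for v in values) <= max_missing


# Hypothetical monthly UPD series (oldest first); None marks a missing month.
pre_intervention = [3.1, 2.9, None, 2.5, 2.2, 2.0, 2.1, 2.4, 2.8, 3.0, 3.3, 3.4, 3.2, 3.0]
post_intervention = [2.7, None, 2.3, 2.0, None, 1.8, 2.0, 2.2, 2.5, 2.6, 2.9, 3.0]
baseline_ok = sufficient_upd_window(pre_intervention[-12:])   # last 12 months before intervention
reporting_ok = sufficient_upd_window(post_intervention[:12])  # first 12 months after intervention
```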

NCIHVAC commented 6 years ago

I apologize for lobbing in this comment as a mere casual observer rather than somebody who's been actively participating in this conversation. To support commercial uses in the daily methods, it may be quite helpful to provide for various groupings of days of the week prior to creating regressions; I've heard this referred to as "day-typing". You could do this several ways. One way would be to run test regressions with various day-type schemes implying different underlying building schedules, then select the scheme that produces the best overall results (another conversation) to build the final regression.

philngo commented 6 years ago

@NCIHVAC I believe this particular github issue (#82) is for final discussion regarding a set of already-tested changes to the methods. I've opened a new issue (#83) to capture the conversation about day-typing and the testing for that proposal. Don't want it to get lost here.

philngo commented 6 years ago

Weather station selection criteria should include guidance for site as well as zip code centroid and a fallback option in case the first weather station fails because of data sufficiency #65

Background: The building site location is needed to find a suitable weather station match. Previously, the methods recommended using ZIP codes.

Updates in CalTRACK 2.0: Determining a building site location. The site location should be specified by a geocoded latitude/longitude when the geocode is given with reasonable accuracy. When it is not available, the fallback is the latitude/longitude of the ZIP Code Tabulation Area (ZCTA) centroid. The ZCTA is determined by matching the ZIP code to a ZCTA ID, but such a match may not always be available.
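Here is a minimal sketch of the location fallback: first an accurate geocode, then a ZCTA centroid looked up by ZIP code, and otherwise no location. It assumes the caller supplies a ZIP-to-centroid table and a flag for geocode accuracy. The function name and the coordinates in the table are hypothetical placeholders, not real ZCTA centroids.

```python
from typing import Dict, Optional, Tuple

LatLng = Tuple[float, float]


def resolve_site_location(
    geocode: Optional[LatLng],
    geocode_is_accurate: bool,
    zip_code: Optional[str],
    zcta_centroids: Dict[str, LatLng],
) -> Optional[LatLng]:
    """Prefer an accurate geocode; otherwise fall back to the ZCTA centroid matched by ZIP code."""
    if geocode is not None and geocode_is_accurate:
        return geocode
    if zip_code is not None and zip_code in zcta_centroids:
        return zcta_centroids[zip_code]
    return None  # no usable location: neither an accurate geocode nor a ZCTA match


# Illustrative lookup table; the coordinates are placeholders, not real ZCTA centroids.
zcta_centroids = {"90001": (34.0, -118.2)}
```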

philngo commented 6 years ago

Determining an appropriate weather station for building site temperature models. #65

Background:

CalTRACK 1 did not test or recommend how to match sites to weather stations.

Updates in CalTRACK 2.0: Ground-truth testing in #65 indicated that CalTRACK should recommend restricting the search to the closest weather station within the site's climate zone (Method A) to find similar weather data. Falling back to the naively closest weather station (Method B) is acceptable if the first-choice weather station does not meet data sufficiency requirements. See #65 for additional descriptions of the methods.
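Here is a minimal sketch of Method A with a Method B fallback. It uses great-circle (haversine) distance on a spherical Earth, and the data sufficiency test is passed in as a callable. One reading is an assumption: if the closest in-zone station fails, the fallback takes the closest station overall that passes the sufficiency check. The `Station` type, climate zone labels and coordinates are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Station:
    station_id: str
    lat: float
    lng: float
    climate_zone: str


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def select_station(
    site_lat: float,
    site_lng: float,
    site_zone: str,
    stations: List[Station],
    is_sufficient: Callable[[Station], bool],
) -> Tuple[Optional[Station], Optional[str]]:
    """Method A (closest in climate zone) first; Method B (closest overall) if A's station is insufficient."""
    by_distance = sorted(stations, key=lambda s: haversine_km(site_lat, site_lng, s.lat, s.lng))
    in_zone = [s for s in by_distance if s.climate_zone == site_zone]
    if in_zone and is_sufficient(in_zone[0]):
        return in_zone[0], "A"
    for s in by_distance:  # fallback: nearest station that meets data sufficiency
        if is_sufficient(s):
            return s, "B"
    return None, None


stations = [
    Station("S1", 37.30, -122.00, "zone-3"),
    Station("S2", 37.05, -122.00, "zone-4"),
]
site = (37.00, -122.00, "zone-3")
choice_a = select_station(*site, stations, is_sufficient=lambda s: True)
choice_b = select_station(*site, stations, is_sufficient=lambda s: s.station_id != "S1")
```

In this example the out-of-zone station S2 is physically closer. It is chosen only when the in-zone station S1 fails the sufficiency check.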

philngo commented 6 years ago

Updated data sufficiency guidelines

Background:

CalTRACK 1 data sufficiency guidelines were mixed in with testing guidelines and did not address temperature data, so they needed clarification.

Updates in CalTRACK 2.0:

We propose adding the following language regarding data sufficiency:

The following criteria must be met for meter data and temperature data to be considered sufficient for use with CalTRACK daily and billing methods.

  1. For fitting baseline models, a data coverage period of 365 days immediately prior to, and not extending into, the intervention period is recommended.
  2. Meter data used to fit either a baseline or reporting model cannot come from meters with net metering or negative usage values.
  3. Total coverage is considered sufficient when meter and temperature roll-ups are sufficient for at least 90% of the days in the coverage period.
  4. A daily meter data period roll-up is considered sufficient under the following conditions:
     a. If summing to total period usage from higher-frequency interval data, no more than 50% of values should be missing, and missing values should be filled in with the average of non-missing values (e.g., for hourly data, 24 * average hourly usage). For electricity, but not gas data, values of 0 are considered missing.
     b. Although this is rare in interval data, if periods are estimated they should be combined with previous periods.
     c. If averaging from higher-frequency temperature data, no more than 50% of temperature values should be missing.
     d. If using daily temperature data, care should be taken to ensure daily periods match the meter daily periods (e.g., if meter data is given with a US/Pacific midnight, temperature data should use the same midnight, not a UTC midnight).
  5. A metered data billing period roll-up is considered sufficient under the following conditions:
     a. Estimated periods should be combined with the previous period.
     b. When averaging from higher-frequency temperature data, temperature data must cover 90% of each period.
     c. Off-cycle reads (typically spanning less than 25 days) should be combined with the previous period's reads. These reads typically occur due to meter reading problems or changes in occupancy.
  6. Data is considered missing if it is clearly marked by the data provider as NULL, NaN, or similar.
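Here is a minimal sketch of rules 3, 4a, 4c and 6 for interval data. It assumes hourly values arrive as Python lists, with `None` or NaN as explicitly missing values. It also assumes the daily roll-ups are aligned to the 365-day coverage period, with `None` marking an insufficient day. The function names are hypothetical, and billing-period roll-ups (rule 5) are not covered.

```python
import math
from typing import List, Optional


def _is_missing(v: Optional[float]) -> bool:
    """Rule 6: a value is missing if it is explicitly null or NaN."""
    return v is None or math.isnan(v)


def daily_usage_rollup(hourly: List[Optional[float]], is_electricity: bool) -> Optional[float]:
    """Rule 4a: daily total from interval values, or None if more than 50% are missing."""
    # Zeros count as missing for electricity only.
    present = [v for v in hourly if not _is_missing(v) and not (is_electricity and v == 0)]
    if not hourly or len(hourly) - len(present) > 0.5 * len(hourly):
        return None
    # Fill missing intervals with the mean of the present ones: total = mean * number of intervals.
    return sum(present) / len(present) * len(hourly)


def daily_temperature_rollup(hourly_temps: List[Optional[float]]) -> Optional[float]:
    """Rule 4c: daily mean temperature, or None if more than 50% of values are missing."""
    present = [t for t in hourly_temps if not _is_missing(t)]
    if not hourly_temps or len(hourly_temps) - len(present) > 0.5 * len(hourly_temps):
        return None
    return sum(present) / len(present)


def sufficient_total_coverage(daily_usage, daily_temps, threshold: float = 0.9) -> bool:
    """Rule 3: both roll-ups sufficient on at least 90% of the days in the coverage period."""
    good = sum(1 for u, t in zip(daily_usage, daily_temps) if u is not None and t is not None)
    return good >= threshold * len(daily_usage)


# Hypothetical hourly electricity usage with 6 missing hours (still within the 50% limit).
day_total = daily_usage_rollup([1.2] * 18 + [None] * 6, is_electricity=True)
```

Filling gaps with the mean of the present intervals is equivalent to scaling the mean by the full interval count, which is the "24 * average hourly usage" rule for hourly data.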