openeemeter / eemeter

An open source python package for implementing and developing standard methods for calculating normalized metered energy consumption and avoided energy use.
http://eemeter.openee.io/
Apache License 2.0

Extracting Confidence Interval of fitted regression models in "CalTRACK Hourly method" with a “one_month” setting #411

Closed: alihabibikhalaj closed this issue 3 years ago

alihabibikhalaj commented 4 years ago

I am using the OpenEE open source code to measure the energy efficiency of an intervention, and my client is asking for the confidence interval (CI) of the fitted regression model.

I am running eemeter with hourly meter and temperature data sets using the "CalTRACK Hourly method" with the "one_month" setting (one regression model per month, i.e. 12 models in total).
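A minimal, standard-library sketch of what the "one_month" setting means: every hourly timestamp is assigned to exactly one calendar-month segment, and a separate regression is fitted on each segment. The month labels and the helper `one_month_segment` are hypothetical; this is not eemeter's `segment_time_series`, whose output format may differ.

```python
from collections import Counter
from datetime import datetime, timedelta

# Illustrative segment labels (an assumption; eemeter's own segment names may differ)
MONTH_LABELS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]


def one_month_segment(ts: datetime) -> str:
    """Hypothetical helper: label of the single monthly segment an hour falls in."""
    return MONTH_LABELS[ts.month - 1]


# One non-leap year of hourly timestamps (2019-01-01 00:00 through 2019-12-31 23:00)
start = datetime(2019, 1, 1)
hours = [start + timedelta(hours=h) for h in range(365 * 24)]

# Count how many hourly observations each monthly model would be fitted on
hours_per_segment = Counter(one_month_segment(ts) for ts in hours)

print(len(hours_per_segment))    # 12 segments, so 12 regression models
print(hours_per_segment["feb"])  # 672 (28 days * 24 hours)
print(hours_per_segment["jan"])  # 744 (31 days * 24 hours)
```

Because each monthly model only sees roughly 700 to 750 strongly correlated hourly points, any per-model uncertainty estimate has to account for that dependence.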

Can you please show me how to extract the confidence interval for each of these models?

This is the core code I am using to run the "CalTRACK Hourly method" with the "one_month" setting:

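import pandas as pd
import eemeter

# meter_data (hourly usage DataFrame with a "value" column) and
# temperature_data (hourly temperature Series) are assumed to be loaded already

# Get meter data suitable for fitting a baseline model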
baseline_end_date_hr = min(meter_data.index) + pd.Timedelta(days=365)
baseline_meter_data_hr, warnings = eemeter.get_baseline_data(
    meter_data, end=baseline_end_date_hr, max_days=365
    )

# Create a design matrix for occupancy and segmentation
preliminary_design_matrix = (
    eemeter.create_caltrack_hourly_preliminary_design_matrix(
        baseline_meter_data_hr, temperature_data,
        )
    )

# Build 12 monthly models - each step from now on operates on each segment
segmentation = eemeter.segment_time_series(
    preliminary_design_matrix.index,
    'one_month',
    )

# Assign an occupancy status to each hour of the week (0-167)
occupancy_lookup = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix,
    segmentation=segmentation,
    )

# Assign temperatures to bins
temperature_bins = eemeter.fit_temperature_bins(
    preliminary_design_matrix,
    segmentation=segmentation,
    )

# Build a design matrix for each monthly segment
segmented_design_matrices = (
    eemeter.create_caltrack_hourly_segmented_design_matrices(
        preliminary_design_matrix,
        segmentation,
        occupancy_lookup,
        temperature_bins,
        )
    )

# BEGIN NEW CODE for fitting baseline model - example of using SegmentedModel
# directly with modified segment type. CalTRACKHourlyModel is a very thin wrapper
# around SegmentedModel, which is why this works
segment_models = [
    eemeter.fit_caltrack_hourly_model_segment(segment_name, segment_data)
    for segment_name, segment_data in segmented_design_matrices.items()
    ]

# Combine the 12 fitted segment models into a single CalTRACK hourly model
baseline_model_hr = eemeter.SegmentedModel(
    prediction_segment_type="one_month",
    prediction_segment_name_mapping=None,
    segment_models=segment_models,
    prediction_feature_processor=eemeter.caltrack_hourly_prediction_feature_processor,
    prediction_feature_processor_kwargs={
        "occupancy_lookup": occupancy_lookup,
        "temperature_bins": temperature_bins,
        },
    )

# END NEW CODE

# Get reporting period data (up to 455 days after the baseline period ends)
reporting_meter_data_hr, warnings_hr = eemeter.get_reporting_data(
    meter_data, start=baseline_end_date_hr, max_days=455
    )
print(warnings_hr)

# Compute metered savings for the reporting period selected above
metered_savings_hr, error_bands_hr = eemeter.metered_savings(
    baseline_model_hr, reporting_meter_data_hr,
    temperature_data, confidence_level=0.90, with_disaggregated=True
    )
print(error_bands_hr)
metered_savings_hr.metered_savings.plot()
philngo commented 4 years ago

Hi @alihabibikhalaj - CalTRACK does not document a method for computing confidence intervals for the hourly methods. I believe this would be addressed in future working group sessions; the open problem is how to account for the high autocorrelation between hourly time-series data points. I generally fall back on fitting a separate daily model to a daily version of the same data, for which CalTRACK documents (and the OpenEEmeter implements) a method for computing error bands. Note that the CalTRACK error bands for the daily model are not strictly confidence intervals, but they are a representation of the statistical uncertainty.
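To make the autocorrelation problem concrete, here is a generic, standard-library sketch; it is not the CalTRACK error-band method and does not call eemeter. It uses a common AR(1) approximation for the effective sample size, n' = n(1 - rho)/(1 + rho), on synthetic AR(1) "hourly residuals" (the data, the phi = 0.9 value and all helper names are hypothetical). Treating every hour as an independent observation makes a 90% interval too narrow; the sketch also sums the same series into daily totals to compare lag-1 autocorrelation before and after aggregation.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence."""
    m = fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def effective_sample_size(n, rho):
    """AR(1) approximation n' = n (1 - rho) / (1 + rho).

    Negative rho is treated as 0 here (a conservative assumption).
    """
    rho = max(rho, 0.0)
    if rho >= 1.0:
        raise ValueError("rho must be < 1")
    return n * (1.0 - rho) / (1.0 + rho)


def mean_ci_half_width(x, confidence_level=0.90, adjust_for_autocorrelation=True):
    """Half-width of a two-sided interval for the mean of x (normal approximation)."""
    n = len(x)
    if adjust_for_autocorrelation:
        n_used = effective_sample_size(n, lag1_autocorrelation(x))
    else:
        n_used = n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    return z * stdev(x) / sqrt(n_used)


def hourly_to_daily(x):
    """Sum consecutive 24-value blocks into daily totals (drops a partial last day)."""
    full_days = len(x) // 24
    return [sum(x[24 * d: 24 * (d + 1)]) for d in range(full_days)]


# Synthetic, strongly autocorrelated hourly residuals: AR(1) with phi = 0.9
rng = random.Random(42)
phi = 0.9
hourly = [0.0]
for _ in range(120 * 24 - 1):
    hourly.append(phi * hourly[-1] + rng.gauss(0.0, 1.0))

daily = hourly_to_daily(hourly)

rho_hourly = lag1_autocorrelation(hourly)
rho_daily = lag1_autocorrelation(daily)
naive = mean_ci_half_width(hourly, adjust_for_autocorrelation=False)
adjusted = mean_ci_half_width(hourly, adjust_for_autocorrelation=True)

print(f"lag-1 autocorrelation, hourly: {rho_hourly:.2f}, daily: {rho_daily:.2f}")
print(f"90% half-width, naive n: {naive:.3f}, effective n: {adjusted:.3f}")
```

In this synthetic example the daily totals are far less autocorrelated than the hourly values, which is one way to see why a daily model is a more tractable basis for error bands. Real meter data has daily and weekly cycles that an AR(1) model does not capture. Within eemeter, the daily path would go through functions such as `eemeter.create_caltrack_daily_design_matrix` and `eemeter.fit_caltrack_usage_per_day_model` on daily meter data.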

alihabibikhalaj commented 4 years ago

Many thanks, Phil, for your update and suggestion. It would be really great if you could document a method for computing confidence intervals for the hourly methods as soon as possible, as I need to provide it to my client.