openeemeter / eemeter

An open source python package for implementing and developing standard methods for calculating normalized metered energy consumption and avoided energy use.
http://eemeter.openee.io/
Apache License 2.0

Problem executing tutorial hourly example #415

Closed stvilla closed 3 years ago

stvilla commented 3 years ago

Hi,

Trying to reproduce the hourly example from the tutorial, I get the following error:

Traceback (most recent call last):
  File "test_hourly.py", line 64, in <module>
    metered_savings_dataframe, error_bands = eemeter.metered_savings(
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/derivatives.py", line 226, in metered_savings
    model_prediction = baseline_model.predict(
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/caltrack/hourly.py", line 191, in predict
    return self.model.predict(prediction_index, temperature_data, **kwargs)
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/segmentation.py", line 221, in predict
    prediction = segment_model.predict(segmented_data) * segmented_data.weight
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/segmentation.py", line 97, in predict
    prediction = design_matrix_granular.dot(parameters).rename(
TypeError: rename() got an unexpected keyword argument 'columns'

The problem seems to be that rename is called with the keyword argument "columns" on a pandas Series instead of a pandas DataFrame.
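A minimal sketch of that difference, independent of eemeter: the toy data and the "predicted_usage" mapping below are illustrative assumptions, not the library's actual values. On the pandas version reported in this issue, and on later releases, Series.rename has no "columns" keyword, while DataFrame.rename does; older releases may behave differently.

```python
import pandas as pd

# Toy objects (hypothetical data, not taken from eemeter).
frame = pd.DataFrame({0: [1.0, 2.0]})
series = pd.Series([1.0, 2.0])

# DataFrame.rename accepts a `columns` mapping.
renamed_frame = frame.rename(columns={0: "predicted_usage"})

# Series.rename does not accept `columns`, so this call raises TypeError.
try:
    series.rename(columns={0: "predicted_usage"})
    series_error = None
except TypeError as exc:
    series_error = str(exc)

print(list(renamed_frame.columns))
print(series_error)
```

The same keyword that works on a DataFrame fails on a Series, which matches the last frame of the traceback above.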

Report installed package versions

pandas==1.1.3 
eemeter==2.10.0

Minimal example

import eemeter

meter_data, temperature_data, sample_metadata = (
    eemeter.load_sample("il-electricity-cdd-hdd-hourly")
)

# the dates of an analysis "blackout" period during which a project was performed.
blackout_start_date = sample_metadata["blackout_start_date"]
blackout_end_date = sample_metadata["blackout_end_date"]

# get meter data suitable for fitting a baseline model
baseline_meter_data, warnings = eemeter.get_baseline_data(
    meter_data, end=blackout_start_date, max_days=365
)

# create a design matrix for occupancy and segmentation
preliminary_design_matrix = (
    eemeter.create_caltrack_hourly_preliminary_design_matrix(
        baseline_meter_data, temperature_data,
    )
)

# build 12 monthly models - each step from now on operates on each segment
segmentation = eemeter.segment_time_series(
    preliminary_design_matrix.index,
    'three_month_weighted'
)

# assign an occupancy status to each hour of the week (0-167)
occupancy_lookup = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix,
    segmentation=segmentation,
)

# assign temperatures to bins
temperature_bins = eemeter.fit_temperature_bins(
    preliminary_design_matrix,
    segmentation=segmentation,
)

# build a design matrix for each monthly segment
segmented_design_matrices = (
    eemeter.create_caltrack_hourly_segmented_design_matrices(
        preliminary_design_matrix,
        segmentation,
        occupancy_lookup,
        temperature_bins,
    )
)

# build a CalTRACK hourly model
baseline_model = eemeter.fit_caltrack_hourly_model(
    segmented_design_matrices,
    occupancy_lookup,
    temperature_bins,
)

# get a year of reporting period data
reporting_meter_data, warnings = eemeter.get_reporting_data(
    meter_data, start=blackout_end_date, max_days=365
)

# compute metered savings for the year of the reporting period we've selected
metered_savings_dataframe, error_bands = eemeter.metered_savings(
    baseline_model, reporting_meter_data,
    temperature_data, with_disaggregated=True
)

Thank you!

philngo commented 3 years ago

Thank you @stvilla. I think this must be related to https://github.com/openeemeter/eemeter/pull/408, having to do with pandas version API changes. I'll go ahead and close this issue once we've got that resolved.

stvilla commented 3 years ago

Perfect @philngo. Thank you

leviplj commented 3 years ago

pandas.DataFrame.dot

According to the pandas docs, if the argument to the .dot operation is a Series, it returns a Series:

If other is a Series, return the matrix product between self and other as a Series. If other is a DataFrame or a numpy.array, return the matrix product of self and other in a DataFrame of a np.array.
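The quoted behaviour can be checked with a small sketch. The column labels and coefficient values here are made up for illustration; only DataFrame.dot, Series.to_frame and the resulting types are the point.

```python
import pandas as pd

# Hypothetical design matrix and coefficients (not eemeter data).
design_matrix = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
series_params = pd.Series({"a": 0.5, "b": 2.0})

# The same coefficients as a one-column DataFrame.
frame_params = series_params.to_frame(name="predicted_usage")

# DataFrame.dot(Series) yields a Series.
as_series = design_matrix.dot(series_params)

# DataFrame.dot(DataFrame) yields a DataFrame whose columns come from `other`.
as_frame = design_matrix.dot(frame_params)

print(type(as_series).__name__, type(as_frame).__name__)
```

So whether a later rename(columns=...) can work depends on whether the coefficients were stored as a Series or as a DataFrame.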

This is the code where the rename method raises the error. https://github.com/openeemeter/eemeter/blob/ea42fe081f4677351293e49ae1307edf6f527b3b/eemeter/segmentation.py#L96-L99

And this is the line where the parameter is created. https://github.com/openeemeter/eemeter/blob/ea42fe081f4677351293e49ae1307edf6f527b3b/eemeter/segmentation.py#L75

The parameter is created as a Series, so the result of .dot will also be a Series. In that case rename should take a single positional argument, the new name. It should be something like this:

        # Step 3, predict
        prediction = design_matrix_granular.dot(parameters).rename("predicted_usage")

I made this change and my code worked. The only issue is that I'm not sure whether the result is what was intended back when the code used to work.
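As a rough, standalone illustration of this change (toy data; the variable names mirror the snippet above, but nothing here is eemeter's actual code): passing a scalar to Series.rename sets the Series name, and Series.to_frame turns it into a one-column DataFrame if downstream code expects a "predicted_usage" column. Whether eemeter needs the DataFrame form is an assumption, not something confirmed in this thread.

```python
import pandas as pd

# Hypothetical stand-ins for the objects used in segmentation.py.
design_matrix_granular = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
parameters = pd.Series({"a": 0.5, "b": 2.0})

# A scalar argument to Series.rename sets the name of the Series.
prediction = design_matrix_granular.dot(parameters).rename("predicted_usage")

# Optional: a one-column DataFrame, with the column named after the Series.
prediction_frame = prediction.to_frame()

print(prediction.name)
print(list(prediction_frame.columns))
```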

ssuffian commented 3 years ago

Letting you know @stvilla and @jpvelez that we merged in the PR and the newest eemeter version should now work with the new pandas!

stvilla commented 3 years ago

Thank you!