openeemeter / eemeter

An open-source Python package for implementing and developing standard methods for calculating normalized metered energy consumption and avoided energy use.
http://eemeter.openee.io/
Apache License 2.0

Segmentation with holidays #480

Open khosravym opened 7 months ago

khosravym commented 7 months ago

Energy usage in buildings on holidays typically differs from usage on weekends or in other weekday-hour brackets. Segmentation makes it easy to define a new map, like the following example, that separates holiday data from the rest. This should improve regression accuracy by giving more precise occupancy bins. One challenge, however, is that the holiday segment contains few data points, and a sufficient number is needed to prevent overfitting given the number of independent variables (such as 168 weekday-hours, temperature bins, etc.). I would recommend updating the _segmentweights... and segment_time_series functions in segmentation.py to include holidays.

"three_month_weighted": {
    "jan": "dec-jan-feb-weighted",
    "feb": "jan-feb-mar-weighted",
    "mar": "feb-mar-apr-weighted",
    "apr": "mar-apr-may-weighted",
    "may": "apr-may-jun-weighted",
    "jun": "may-jun-jul-weighted",
    "jul": "jun-jul-aug-weighted",
    "aug": "jul-aug-sep-weighted",
    "sep": "aug-sep-oct-weighted",
    "oct": "sep-oct-nov-weighted",
    "nov": "oct-nov-dec-weighted",
    "dec": "nov-dec-jan-weighted",
    "holiday": "holiday",
},
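To make the data-volume concern concrete, here is a minimal standalone sketch. It does not use eemeter. It rebuilds a mapping like the one above and counts how many hourly readings would land in each segment over one year. The helper names (`three_month_weighted_map`, `segment_name`) and the three-date holiday set are hypothetical and exist only for this illustration. As a simplification, each hour is assigned only to its own month's segment, with no weighting of neighboring months.

```python
from collections import Counter
from datetime import date, datetime, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def three_month_weighted_map():
    """Build a month -> segment-name mapping plus a 'holiday' entry (illustrative)."""
    mapping = {}
    for i, month in enumerate(MONTHS):
        prev_month = MONTHS[(i - 1) % 12]
        next_month = MONTHS[(i + 1) % 12]
        mapping[month] = f"{prev_month}-{month}-{next_month}-weighted"
    mapping["holiday"] = "holiday"
    return mapping


def segment_name(timestamp, holiday_dates, mapping):
    """Return the segment for one hourly reading; holidays override the month."""
    if timestamp.date() in holiday_dates:
        return mapping["holiday"]
    return mapping[MONTHS[timestamp.month - 1]]


# Hypothetical holiday set, assumed only for this example.
HOLIDAYS_2023 = {date(2023, 1, 1), date(2023, 7, 4), date(2023, 12, 25)}

mapping = three_month_weighted_map()
start = datetime(2023, 1, 1)
hours = [start + timedelta(hours=h) for h in range(365 * 24)]

# Count hourly readings per segment to see how sparse the holiday segment is.
counts = Counter(segment_name(ts, HOLIDAYS_2023, mapping) for ts in hours)
print(counts["holiday"], counts[mapping["feb"]])
```

With only three holidays, the holiday segment receives 72 hourly readings. That is fewer than the 168 weekday-hour indicators mentioned above, which is the overfitting risk in a nutshell.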

travis-recurve commented 6 months ago

The new hourly model does not work quite like this.

The general idea of using holidays is good, but it is also complicated because every country, and even regions within countries, has its own holidays. There are Python packages that help with this, and we are considering adding this to the new hourly model.
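The regional complication can be sketched as follows. A holiday lookup would need to be keyed by both country and region. The table below is hand-written and hypothetical, with only a few illustrative dates. In practice a maintained holiday package would presumably supply these dates. None of this reflects how the new hourly model works.

```python
from datetime import date

# Hypothetical lookup table: (country, region) -> holiday dates.
# A region of None holds the country-wide holidays.
REGIONAL_HOLIDAYS = {
    ("US", None): {date(2024, 7, 4), date(2024, 12, 25)},
    ("US", "TX"): {date(2024, 3, 2)},   # Texas Independence Day
    ("CA", None): {date(2024, 7, 1), date(2024, 12, 25)},
    ("CA", "QC"): {date(2024, 6, 24)},  # Quebec national holiday
}


def holidays_for(country, region=None):
    """Combine country-wide holidays with any region-specific ones."""
    dates = set(REGIONAL_HOLIDAYS.get((country, None), set()))
    if region is not None:
        dates |= REGIONAL_HOLIDAYS.get((country, region), set())
    return dates


def is_holiday(day, country, region=None):
    """True if `day` is a holiday for the given country and optional region."""
    return day in holidays_for(country, region)


print(is_holiday(date(2024, 6, 24), "CA", "QC"))
print(is_holiday(date(2024, 6, 24), "CA"))
```

The same calendar date can be a holiday for a meter in one region and an ordinary day for a meter elsewhere in the same country, so any holiday segment would need the meter's location as an input.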