opengridcc / opengrid-dev

Open source building monitoring, analysis and control
Apache License 2.0

Regression analysis #158

Closed JrtPec closed 7 years ago

JrtPec commented 7 years ago

Partyparty, regression demo time

Ryton commented 7 years ago

Looking good!

One remark though:

OpenGrid is quite unique in having sub-daily data, so we'd better make full use of it!

JrtPec commented 7 years ago

That's the point! This is an exact port of the code that is already implemented on EnergyID, so using it as-is wouldn't be adding much value. We should try and see if we can draw useful conclusions from weekly, daily, hourly data... try to use different independent variables etc.

Do you have a formula for degree hours? I'd like to implement that.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.3%) to 70.448% when pulling bb8f01335769be94236c5c92182326d8fec5e7bd on JrtPec:regression-analysis into 167982b28476eaceb5e3e809f60333fa4015fed5 on opengridcc:develop.

saroele commented 7 years ago

I think going to sub-daily regression is no free lunch. Dynamic effects come into play, and your simple linear model will not work anymore. A dynamic model (e.g. an RC analogy) is needed.

I think we can still do a lot on daily/weekly resolution before we move to dynamic models.
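To make the "RC analogy" above concrete, here is a minimal sketch of a first-order (1R1C) thermal model: one resistance between indoor and outdoor air and one lumped capacitance, integrated with a forward-Euler step. The function name `simulate_1r1c` and all parameter values (R, C, time step, start temperature) are invented for illustration and are not taken from OpenGrid or from any fitted building.

```python
def simulate_1r1c(t_out, q_heat, r=0.005, c=1.0e7, dt=3600.0, t_start=20.0):
    """Forward-Euler simulation of C * dT_in/dt = (T_out - T_in) / R + Q.

    t_out  : outdoor temperatures in degC, one per time step
    q_heat : heating power in W, one per time step
    r, c   : thermal resistance (K/W) and capacitance (J/K), made-up values
    dt     : time step in seconds (3600 s = hourly data)
    Returns the indoor temperature after each step.
    """
    t_in = t_start
    result = []
    for to, q in zip(t_out, q_heat):
        # Net heat flow into the thermal mass during this step
        t_in += dt / c * ((to - t_in) / r + q)
        result.append(t_in)
    return result


# Constant 0 degC outside and 2 kW of heating: the indoor temperature
# drifts slowly from 20 degC towards the steady state T_out + R * Q.
temps = simulate_1r1c([0.0] * 1000, [2000.0] * 1000)
```

With these values the time constant R*C is about 14 hours, so what happens in one hour depends strongly on the previous hours. That memory is exactly what a static linear regression cannot represent.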

Ryton commented 7 years ago

I constructively disagree, @saroele . ;-)

I do agree that, when naively going below the daily level (e.g. to hourly), new dynamics should be considered, but even now, at the daily level, you'll have unmodeled weekday vs weekend dynamics (in setpoints/comfort, domestic hot water usage, etc.).

But, by grouping by well-chosen, sub-daily periods instead, e.g.:

Weekly / daily could be interesting as an exercise and input towards EnergyID, but my personal interest/focus is on the smaller timescale.

But no worries, feel free to follow your heart & nose, we'll see where it leads us :-)

Ryton commented 7 years ago

There are a few papers on how to apply degree hours. An example of a degree-hour calculation, applied to residential energy usage, can be found at the URL below. The formula you want is on p. 3 of http://amet-me.mnsu.edu/UserFilesShared/SolarWall/Degree%20Day%20Analysis/An%20application%20of%20the%20degree-hours%20method%20to%20estimate%20the%20residential%20heating%20energy%20requirement%20and%20fuel%20consumption%20in%20Istanbul%20-%20Durmayaz,%20Kadioglu%20and%20Sen.pdf

It's basically the same as degree days (using historic mean temperature as a reference), but split up (averaged/summed) per hour instead of per day.
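As a rough sketch of the idea (not the exact formula from the paper), heating degree hours can be computed by summing the positive differences between a base temperature and each hourly outdoor temperature. The function names and the 16.5 °C base temperature below are assumptions for illustration; the right reference temperature depends on the building and the method you follow.

```python
def heating_degree_hours(hourly_temps_c, base_temp_c=16.5):
    """Sum of positive (base - outdoor) differences, in degC*h.

    base_temp_c is an assumed placeholder, not a value from the paper.
    """
    return sum(max(base_temp_c - t, 0.0) for t in hourly_temps_c)


def heating_degree_days_from_mean(hourly_temps_c, base_temp_c=16.5):
    """Classic degree days for one day, based on the daily mean temperature."""
    mean_temp = sum(hourly_temps_c) / len(hourly_temps_c)
    return max(base_temp_c - mean_temp, 0.0)


# Made-up day: 12 cold hours at 10 degC, 12 mild hours at 20 degC.
day = [10.0] * 12 + [20.0] * 12
dh = heating_degree_hours(day)            # 12 * 6.5 = 78.0 degC*h
dd = heating_degree_days_from_mean(day)   # 16.5 - 15.0 = 1.5 degC*day
```

Note that 1.5 degree days correspond to only 36 degree hours, while the hourly sum gives 78: the hourly version does not let the mild hours cancel out the cold ones.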

saroele commented 7 years ago

Let's not forget that an energy signature is a very simple, linear and static model, and the definition of degree days is specifically tuned to enhance the linearity. As you remark correctly, when we go to finer resolutions (below weekly, I would say), we'll have to select similar days in order to get a good regression. In my PhD, I selected only Tue-Fri and got sufficiently good correlations (for an office).
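A minimal sketch of that "select similar days, then regress" approach, in plain Python: keep only Tuesday to Friday and fit an ordinary least-squares line of daily consumption against degree days. The helper names, the record layout and the numbers are all made up for illustration; this is not the code from the PhD nor from this pull request.

```python
from datetime import date


def select_weekdays(records, weekdays=(1, 2, 3, 4)):
    """Keep (day, degree_days, consumption) records on the given weekdays.

    date.weekday() returns 0 for Monday, so 1..4 means Tuesday to Friday.
    """
    return [r for r in records if r[0].weekday() in weekdays]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic week (2017-01-02 is a Monday): (day, degree days, consumption)
records = [
    (date(2017, 1, 2), 11.0, 40.0),  # Monday, start-up after the weekend
    (date(2017, 1, 3), 10.0, 25.0),
    (date(2017, 1, 4), 12.0, 29.0),
    (date(2017, 1, 5), 8.0, 21.0),
    (date(2017, 1, 6), 14.0, 33.0),
    (date(2017, 1, 7), 9.0, 6.0),    # weekend, building unoccupied
    (date(2017, 1, 8), 13.0, 7.0),
]
similar = select_weekdays(records)
slope, intercept = fit_line([r[1] for r in similar], [r[2] for r in similar])
```

The slope is the consumption per degree day and the intercept is the base load, both only valid for the selected day type.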

If you go to blocks of 4-6h like you propose, you'll have to find good metrics to find similar blocks, and the dynamics come into play. The gas consumption during a 6h evening block depends on what happened before: was it cold, sunny, was the building occupied, was it heated, etc.? So if your model does not take these into account, you'll get a lot of scattering of the points, and you get a cloud instead of a line.

Do you think you can create a useful model? Useful in the sense that it can teach us something about the building or that we can use it to detect anomalies? I can only encourage you to try and prove me wrong :-)

Ryton commented 7 years ago

@saroele
I failed to prove you wrong and therefore, QED, you are proven right (until further notice)! :-P

Based on some quick hacks with available data & degree hours/days, the resulting point clouds indeed show quite a lot of variation, even when using 'just' daily averaged data.

Therefore, the weekly or half-weekly consumption of heat generation equipment is probably a safer basis for generating robust, unsupervised energy signatures.
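For reference, a small sketch of how daily values could be aggregated to weekly totals before fitting a signature, grouping by ISO year and week number. The function name and the input layout are hypothetical; with pandas the same thing would typically be a resample on a DatetimeIndex.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily):
    """Sum (day, value) pairs per ISO week.

    Returns a dict keyed by (iso_year, iso_week).
    """
    totals = defaultdict(float)
    for day, value in daily:
        iso_year, iso_week = day.isocalendar()[:2]
        totals[(iso_year, iso_week)] += value
    return dict(totals)


# Made-up data: 14 days starting on Monday 2017-01-02, 1.0 unit per day.
start = date(2017, 1, 2)
daily = [(start + timedelta(days=i), 1.0) for i in range(14)]
weeks = weekly_totals(daily)
```

The same aggregation has to be applied to the degree days, so that each weekly consumption total is regressed against the degree days of the same week.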

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.3%) to 70.23% when pulling 787750a10f05ac8b3aa43e97a40f0e58ca995026 on JrtPec:regression-analysis into 0b2a15ec15ed912a20870e89dfb7187336970186 on opengridcc:develop.
