opengridcc / opengrid-dev

Open source building monitoring, analysis and control
Apache License 2.0

Detect outliers in the standby consumption #148

Open JrtPec opened 7 years ago

JrtPec commented 7 years ago

We can easily develop an algorithm to detect these kinds of outliers (I'm thinking of a RANSAC regressor).

[Image: standby_vertical_a7d523e79909ad098bccc337c5da84a0]
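To make the RANSAC idea concrete, here is a minimal from-scratch sketch for a straight-line fit using only NumPy. It assumes a single x-variable and a linear relation. `ransac_line`, its `threshold` default and the synthetic data are hypothetical; they only illustrate the sample / fit / count / keep-best loop that scikit-learn's `RANSACRegressor` automates.

```python
import numpy as np


def ransac_line(x, y, n_iter=200, threshold=1.0, seed=0):
    """Robustly fit y = a*x + b with a minimal RANSAC loop.

    Returns (a, b, inlier_mask). `threshold` is the maximum absolute
    residual for a point to count as an inlier (hypothetical default).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(x), dtype=bool)

    for _ in range(n_iter):
        # 1. Draw the minimal sample for a line: two distinct points
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical pair, cannot define y = a*x + b
        # 2. Fit a candidate line through those two points
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        # 3. Find every point that agrees with the candidate line
        mask = np.abs(y - (a * x + b)) <= threshold
        # 4. Keep the candidate with the largest consensus set
        if mask.sum() > best_mask.sum():
            best_mask = mask

    # 5. Refit with ordinary least squares on the consensus set only
    a, b = np.polyfit(x[best_mask], y[best_mask], deg=1)
    return a, b, best_mask


# Synthetic example: a clean line with two injected outliers
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[[5, 12]] += 30.0
a, b, inliers = ransac_line(x, y)
print(np.flatnonzero(~inliers))  # expected to flag indices 5 and 12
```

The final least-squares refit on the inliers only is what makes the result robust: the outliers never enter the line estimate.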

GMathyssen commented 7 years ago

Can I use the scikit-learn library for it?

JrtPec commented 7 years ago

Yes! I was thinking of a RANSAC algorithm.

I have implemented it before, so I might have some code snippets that could be used to start.

JrtPec commented 7 years ago

This should get you started!

```python
import numpy as np
from sklearn import linear_model

x = []  # list with x-values
y = []  # list with y-values
# make sure that x and y have the same length!

# scikit-learn expects a 2-D feature array, so reshape x into one column
X = np.asarray(x).reshape(-1, 1)
y = np.asarray(y)

model_ransac = linear_model.RANSACRegressor(linear_model.LinearRegression())
model_ransac.fit(X, y)

# inlier_mask_ is a boolean array: True for inliers, False for outliers
inliers = model_ransac.inlier_mask_
outliers = np.logical_not(inliers)
```

You can also use model_ransac.predict() to calculate expected values!

I expect that if you apply this method to the data in the image posted above, those two outlying red dots will be detected pretty easily.
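A sketch of the `predict()` suggestion, run on synthetic data rather than the real standby measurements: a linear trend with two injected outliers stands in for the two red dots. The data values and the choice to report residuals are assumptions for illustration; `random_state` is set only to make the run reproducible.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical example data: a linear trend with two injected outliers
x = np.arange(30, dtype=float)
y = 0.5 * x + 10.0
y[[7, 21]] += 15.0

X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
model_ransac = linear_model.RANSACRegressor(
    linear_model.LinearRegression(), random_state=0
)
model_ransac.fit(X, y)

# Expected value for every point according to the robust fit
y_expected = model_ransac.predict(X)
# Residual: how far each measurement lies above or below its expected value
residuals = y - y_expected

for idx in np.flatnonzero(~model_ransac.inlier_mask_):
    print(f"point {idx}: measured {y[idx]:.1f}, expected {y_expected[idx]:.1f}")
```

By default `RANSACRegressor` uses the median absolute deviation of `y` as its inlier threshold; for real standby data it may be worth setting `residual_threshold` explicitly (in watts) so the cut-off is interpretable.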

Ryton commented 7 years ago

@saroele JrtPec: Maybe also kick out FL03001550 from the analysis? It seriously skews the analysis/mean. I'm convinced this is not a residential building (an office?), as it has a night/weekend consumption of 5 kW and a weekday consumption of 15-20 kW.

https://opengrid.be/sensor/565de0a7dc64d8370aa321491217b85f
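The reasoning above (a high weekday load against a flat night/weekend baseline) can be turned into a simple indicator. This is a minimal sketch assuming an hourly `pandas` series with a `DatetimeIndex`; the function name and the 08:00-18:00 "office hours" window are hypothetical choices, not an opengrid definition, and the profile below is synthetic.

```python
import numpy as np
import pandas as pd


def weekday_to_offhours_ratio(power: pd.Series) -> float:
    """Ratio of mean power in weekday office hours to mean power otherwise.

    `power` must have a DatetimeIndex. The 08:00-18:00 window is a
    hypothetical choice for illustration.
    """
    idx = power.index
    is_weekday = idx.dayofweek < 5  # Monday=0 ... Friday=4
    is_daytime = (idx.hour >= 8) & (idx.hour < 18)
    office_hours = is_weekday & is_daytime
    return power[office_hours].mean() / power[~office_hours].mean()


# Synthetic hourly week (starting on a Monday) shaped like the profile above:
# 5 kW at night and in the weekend, 15-20 kW during weekday office hours
index = pd.date_range("2016-11-28", periods=7 * 24, freq="h")
power_kw = pd.Series(5.0, index=index)
weekday_day = (index.dayofweek < 5) & (index.hour >= 8) & (index.hour < 18)
power_kw[weekday_day] = np.linspace(15.0, 20.0, weekday_day.sum())

ratio = weekday_to_offhours_ratio(power_kw)
print(f"weekday/off-hours ratio: {ratio:.1f}")  # 3.5 for this synthetic week
```

A dwelling would typically show a much weaker weekday-daytime peak, so a large ratio is one hint that a site is not residential; any cut-off value would need to be chosen from real data.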

saroele commented 7 years ago

yes, this is an office :-)

With a basic categorisation approach, we should be able to select only dwellings or only offices for a specific analysis. That's the idea behind the SAREF or Haystack implementations (see also #116).
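Once each site carries a category, selecting dwellings or offices is a plain filter. A minimal `pandas` sketch, assuming a hypothetical `building_type` column and made-up standby values (only the ID FL03001550 comes from this thread):

```python
import pandas as pd

# Hypothetical site metadata; "building_type" is an assumed column,
# not an existing opengrid field, and the standby values are made up.
sites = pd.DataFrame(
    {
        "device_id": ["site_a", "site_b", "FL03001550"],
        "building_type": ["dwelling", "dwelling", "office"],
        "standby_w": [120.0, 95.0, 5000.0],
    }
)

dwellings = sites[sites["building_type"] == "dwelling"]
offices = sites[sites["building_type"] == "office"]

# Standby statistics for dwellings are no longer skewed by the office
print(dwellings["standby_w"].mean())
```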

(Reply sent by email to J. Ver.'s comment of Fri, Dec 9, 2016, 9:06 AM.)

JrtPec commented 7 years ago

It wouldn't be too hard to add a "building type" argument to the sites in the houseprint, both in the GDocs file and in the code and parser. I would like to see what naming conventions SAREF uses, but other than that it's pretty straightforward.
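A rough sketch of what the parser side could look like. Everything here is hypothetical: `BuildingType`, the stand-in `Site` dataclass (not the actual houseprint class), the column names `Key` and `Building type`, and the vocabulary, which would still need to be aligned with SAREF naming.

```python
from dataclasses import dataclass
from enum import Enum


class BuildingType(Enum):
    # Hypothetical vocabulary; SAREF/Haystack naming is not checked here
    DWELLING = "dwelling"
    OFFICE = "office"
    UNKNOWN = "unknown"


@dataclass
class Site:
    # Stand-in for a houseprint site, not the real opengrid class
    key: str
    building_type: BuildingType = BuildingType.UNKNOWN


def parse_site(row: dict) -> Site:
    """Build a Site from one spreadsheet row (hypothetical column names)."""
    raw = str(row.get("Building type", "")).strip().lower()
    try:
        building_type = BuildingType(raw)
    except ValueError:
        # Empty or unrecognised cells fall back to UNKNOWN instead of failing
        building_type = BuildingType.UNKNOWN
    return Site(key=str(row["Key"]), building_type=building_type)


print(parse_site({"Key": "1", "Building type": "Office"}).building_type)
```

Falling back to `UNKNOWN` keeps existing GDocs rows without the new column parseable, so the field can be filled in gradually.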