Open JrtPec opened 8 years ago
Can I use the scikit-learn library for it?
Yes! I was thinking about a RANSAC algorithm.
I have implemented it before, so I might have some code snippets that could be used to start.
This should get you started!
import numpy as np
from sklearn import linear_model
x = []  # list with x-values
y = []  # list with y-values
# make sure that x and y have the same size!
# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
X = np.asarray(x, dtype=float).reshape(-1, 1)
model_ransac = linear_model.RANSACRegressor(linear_model.LinearRegression())
model_ransac.fit(X, y)
# use inlier_mask_ to get a boolean array of the inliers
inliers = model_ransac.inlier_mask_
outliers = np.logical_not(inliers)
You can also use model_ransac.predict() to calculate expected values!
My expectation is that if you applied this method to the image posted above, those two outlying red dots would be detected pretty easily.
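To make that expectation a bit more concrete, here is a minimal sketch on synthetic data. The line `y = 2x + 1`, the noise level, and the two injected outliers are all made-up values for illustration, not data from the image above. It fits the same `RANSACRegressor`, reads `inlier_mask_`, and uses `predict()` to get expected values and residuals.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)

# Synthetic data (assumption): a straight line with a little noise
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Inject two gross outliers at arbitrary positions
outlier_idx = [5, 12]
y[outlier_idx] += 30.0

# scikit-learn wants a 2D feature array: (n_samples, n_features)
X = x.reshape(-1, 1)

model_ransac = linear_model.RANSACRegressor(
    linear_model.LinearRegression(), random_state=0
)
model_ransac.fit(X, y)

inliers = model_ransac.inlier_mask_        # boolean array, True for inliers
outliers = np.logical_not(inliers)
detected = np.flatnonzero(outliers)        # indices flagged as outliers

# Expected values from the model fitted on the inliers only
expected = model_ransac.predict(X)
residuals = y - expected

print("flagged indices:", detected)
print("slope:", model_ransac.estimator_.coef_[0])
```

Note that the default `residual_threshold` is the median absolute deviation of `y`, so on real consumption data it may be worth setting it explicitly.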
@saroele JrtPec: Maybe also kick out FL03001550 from the analysis? This seriously skews the analysis/mean. I'm convinced this is not a residential building (an office?) as it has a night/weekend consumption of 5kW and week-day consumption of 15-20kW.
yes, this is an office :-)
With a basic categorisation approach we should be able to select only dwellings or only offices for a specific analysis. That's the idea of the SAREF or Haystack implementations (see also #116).
On Fri, Dec 9, 2016 at 9:06 AM, J. Ver. notifications@github.com wrote:
https://opengrid.be/sensor/565de0a7dc64d8370aa321491217b85f
It wouldn't be too hard to add a "building type" argument to the sites in the houseprint, both in the GDocs file and in the code + parser. I would like to see what naming conventions SAREF uses, but other than that it's pretty straightforward.
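As a rough sketch of what that could look like: the `Site` class, the `building_type` attribute, the `select_sites` helper, and the labels `"dwelling"` / `"office"` below are all hypothetical stand-ins, not the actual houseprint classes or SAREF terms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """Hypothetical stand-in for a houseprint site (not the real OpenGrid class)."""
    key: str
    building_type: Optional[str] = None  # made-up labels, e.g. "dwelling" or "office"


def select_sites(sites: List[Site], building_type: str) -> List[Site]:
    """Return only the sites with the requested building type."""
    return [s for s in sites if s.building_type == building_type]


# Illustrative site keys only
sites = [
    Site("site_a", "dwelling"),
    Site("site_b", "office"),
    Site("site_c", "dwelling"),
]

dwellings = select_sites(sites, "dwelling")
print([s.key for s in dwellings])
```

An analysis could then run on `dwellings` only, so that an office does not skew the mean.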
We can easily develop an algorithm to detect this kind of outlier (I'm thinking of a RANSAC regressor).