@subutai This has been on my mind for a while, I'd love to hear your thoughts on it! :question:
NAB already includes a few artificial datasets, some of which fall into your classes above. I think it is fine to create some more elsewhere (i.e. another repo) that are NAB compatible, but I don't really want to add them into the formal benchmark. I want to focus NAB mostly on real world data and would ideally like to even get rid of the existing artificial datasets. There are lots of other anomaly benchmarks with artificial data.
> I want to focus NAB mostly on real world data and would ideally like to even get rid of the existing artificial datasets.
Revisiting this. Your decision sounds fair, I'll set up a NAB.synthetic.
👍
Define, create and include synthetic datasets for different kinds of anomalies. This is important for regression testing, as simple data can stress (at different difficulty levels) specific properties of HTM. It will also help to define concrete advantages and weak spots of HTM.
One class of anomalies: the value is A for a long run, then number 121000 is B instead of A. We want to detect all of these as anomalies, but we may also want to differentiate among them. An example is the ECG MIT-BIH data, where there are _V_entricular anomalies (easy) and about 4 more types. This somewhat combines anomaly detection with classification of sequences.
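A minimal sketch of what such a dataset could look like, written in the `timestamp,value` CSV layout the existing NAB data files use: a repeating pattern A that is replaced by a different pattern B at a known record, so the switch point can be labeled as the anomaly. The file name, sampling interval and pattern shapes below are made up for illustration.

```python
# Hypothetical NAB-compatible synthetic dataset: pattern "A" (a sine period)
# switches to pattern "B" (a clipped, larger shape) at a known record.
import csv
import math
from datetime import datetime, timedelta

N = 4000              # total number of records
SWITCH_AT = 3000      # index where pattern A is replaced by pattern B
PERIOD = 50           # samples per repetition of the base pattern

def pattern_a(i):
    return math.sin(2 * math.pi * (i % PERIOD) / PERIOD)

def pattern_b(i):
    # same period, but an asymmetric, rectified shape -> "B instead of A"
    return max(0.0, 2.0 * math.sin(2 * math.pi * (i % PERIOD) / PERIOD))

start = datetime(2015, 1, 1)
with open("synthetic_pattern_switch.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "value"])
    for i in range(N):
        ts = start + timedelta(minutes=5 * i)   # 5-minute sampling, as in many NAB files
        value = pattern_a(i) if i < SWITCH_AT else pattern_b(i)
        writer.writerow([ts.strftime("%Y-%m-%d %H:%M:%S"), round(value, 4)])

# The ground-truth anomaly window (around record SWITCH_AT) would then be
# recorded in the benchmark's labels so detectors can be scored on it.
```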
Another issue is aggregation scale. Take temperature: measured every morning at 7am, I get a relatively stable, slowly changing pattern; measured every hour, I get a stable pattern with significant changes; measured every 7 hours, it looks like random data.
So the question is: how can HTM "decide" on the optimal aggregation, the scale to focus on? Another example is GPS position reported every second, how do you choose the scale?
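To make the scale effect concrete, here is an illustrative snippet (not part of NAB) that takes one synthetic hourly temperature series and views it at three sampling scales; the series and numbers are invented, only the resampling idea matters.

```python
# The same hourly temperature series viewed at three aggregation scales,
# showing how the choice of scale changes what the model sees.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
idx = pd.date_range("2015-01-01", periods=24 * 60, freq="h")         # 60 days, hourly
daily_cycle = 10 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24)       # warm afternoons
slow_trend = np.linspace(0, 5, len(idx))                              # slow seasonal drift
temps = pd.Series(15 + daily_cycle + slow_trend + rng.normal(0, 1, len(idx)), index=idx)

every_7am = temps.at_time("07:00")        # slow, stable pattern
hourly = temps                            # clear daily structure
every_7h = temps.resample("7h").first()   # aliases the daily cycle -> looks noisy

for name, series in [("7am only", every_7am), ("hourly", hourly), ("every 7h", every_7h)]:
    print(f"{name:10s} n={len(series):5d} std={series.std():.2f}")
```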
Should all of these be part of one HTM/anomaly model? Or run as an ensemble of specific models?
Yes, it's an oxymoron, but everybody wants it! :icecream: I think this is a core problem; my ideas include running a combination of HTM models with different scales ...
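A very rough sketch of that combination idea, with a trivial stand-in detector instead of an actual HTM model (the real ensemble members would be HTM models configured for different aggregation intervals); the window sizes and the max-combination rule are only assumptions to make the idea concrete.

```python
# Ensemble-of-scales sketch: one placeholder detector per scale, scores
# combined by max so any scale's strong alarm survives.
from dataclasses import dataclass, field
from statistics import mean, stdev

@dataclass
class SimpleDetector:
    """Placeholder detector: flags values far from the running mean."""
    window: int
    history: list = field(default_factory=list)

    def score(self, value: float) -> float:
        self.history.append(value)
        recent = self.history[-self.window:]
        if len(recent) < 3:
            return 0.0
        mu, sd = mean(recent), stdev(recent) or 1e-6
        return min(1.0, abs(value - mu) / (3 * sd))

def ensemble_score(value: float, detectors: list) -> float:
    # max keeps the strongest alarm from any scale
    return max(d.score(value) for d in detectors)

detectors = [SimpleDetector(window=w) for w in (12, 96, 672)]  # ~hour/day/week at 5-min data
stream = [1.0] * 50 + [5.0] + [1.0] * 10
for i, v in enumerate(stream):
    s = ensemble_score(v, detectors)
    if s > 0.5:
        print(f"record {i}: anomaly score {s:.2f}")
```

Whether max, mean, or some learned weighting is the right way to combine the per-scale scores is exactly the open part of the question.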