twitter / AnomalyDetection

Anomaly Detection with R
GNU General Public License v3.0

detect_anoms erroneously reports at least one anomaly, regardless of data #24

Closed by andrewclegg 9 years ago

andrewclegg commented 9 years ago

I think I've found a bug in detect_anoms.

Before the main loop, num_anoms is initialized to 0.

At the end of each iteration, you update num_anoms if R is greater than lambda.

Then after the loop, you return R_idx[1L:num_anoms].

So if no elements made R exceed lambda, the return value works out to R_idx[1L:0L]. But 1L:0L counts down to c(1, 0), and R silently drops the zero index, so this range subscript gives you the first element, not an empty vector:

> foo = c(4,5,6,7)
> foo[1:0]
[1] 4
>

So won't it always report the most extreme value as an outlier, no matter what data you give it? (Of course the user won't see this if they've set a threshold in AnomalyDetection, but they might not do that...)
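For anyone hitting the same pitfall elsewhere, here is a minimal sketch of why the subscript misbehaves and two base R ways to get an empty result instead. The variable names are illustrative only and are not taken from the package.

```r
foo <- c(4, 5, 6, 7)

# 1:0 counts down, producing c(1, 0). A zero index is silently dropped,
# so the subscript keeps the first element.
backwards <- 1:0
buggy <- foo[backwards]

# seq_len(n) returns an empty integer vector when n is 0,
# so the subscript selects nothing.
safe <- foo[seq_len(0)]

# head() with n = 0 also returns an empty vector.
also_safe <- head(foo, 0)

print(buggy)  # [1] 4
print(safe)   # numeric(0)
```

Using seq_len(n) in place of 1:n is a common defensive habit in R whenever n might be zero.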

jhochenbaum commented 9 years ago

Sounds like a bug to me! Owen and I are looking into this tonight and we'll issue a patch. Thanks again, and nice detective work.

owenvallis commented 9 years ago

Hi Andrew,

Thanks for the heads up. We now set R_idx to NULL if there are 0 anoms detected.
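The approach described here could look roughly like the sketch below. This is not the actual patch: select_anoms is a hypothetical helper written for illustration, and only the names R_idx and num_anoms come from this thread.

```r
# Hypothetical helper, not the package's actual code: return the indices of
# the detected anomalies, or NULL when none were detected.
select_anoms <- function(R_idx, num_anoms) {
  if (num_anoms > 0L) {
    # seq_len() avoids the 1:0 countdown even if the guard is removed later.
    R_idx[seq_len(num_anoms)]
  } else {
    NULL
  }
}

# Example: three candidate indices, with zero and then two confirmed anomalies.
candidates <- c(10L, 3L, 7L)
none_found <- select_anoms(candidates, 0L)
two_found <- select_anoms(candidates, 2L)
```

Returning NULL rather than an empty vector means callers can check the result with is.null() before using it.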

andrewclegg commented 9 years ago

Thanks for the quick response!

jhochenbaum commented 9 years ago

Cheers!