twitter / AnomalyDetection

Anomaly Detection with R
GNU General Public License v3.0
3.55k stars 776 forks

Error Messages from AnomalyDetection: R_idx <- data[[1]][temp_max_id] and if(R > lam) #18

Open colinglaes opened 9 years ago

colinglaes commented 9 years ago

Hey guys,

I've run some numbers through the library and hit a few issues.

When I was running a large number of datasets through the package, I kept getting the error message below, which references line 89 of "detect_anoms.R":

Error in R_idx[i] <- data[[1]][temp_max_idx] : replacement has length zero

I noticed that the datasets which tripped the error were near-constant time series with a low number of unique values (e.g. constant 0's for 1 month, then constant 1's for 2 months), so I set a minimum number of unique values for a dataset to get around it (I started at one and went up to nine). This allowed more datasets to get through, but with the minimum set at 9 I eventually ran into the error message below, which references line 101:

Error in if (R > lam) num_anoms <- i : missing value where TRUE/FALSE needed

I successfully ran all my datasets after setting the minimum number of unique values to 10. However, I would like to know whether it is possible to run the package without this unique-value threshold.
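For readers who want to reproduce the workaround described above, here is a minimal sketch of a unique-value pre-filter. The helper name `has_enough_unique` and the sample vectors are hypothetical and only illustrate the idea; the threshold of 10 is the value reported in this comment, not a documented limit of the package.

```r
# Hypothetical helper (not part of AnomalyDetection): TRUE when the value
# column has at least `min_unique` distinct non-missing values.
has_enough_unique <- function(values, min_unique = 10L) {
  length(unique(values[!is.na(values)])) >= min_unique
}

# Illustrative series: one constant, one with 13 distinct values.
constant_series <- rep(2, 100)
varied_series   <- c(rep(0, 30), rep(1, 60), 1:12)

keep_constant <- has_enough_unique(constant_series)  # filtered out
keep_varied   <- has_enough_unique(varied_series)    # passed on to the detector
```

A filter like this only skips the problematic inputs; it does not explain why they fail, so the threshold may need adjusting for other data.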

Thanks!

jhochenbaum commented 9 years ago

@colinglaes Do you have a dataset you could share with us to test with? It would be useful to see the data you're working with to make sure we're robust against it...

Also, you might be interested in a complementary package called Breakout which can detect mean-shift and ramp-up.

https://github.com/twitter/BreakoutDetection https://blog.twitter.com/2014/breakout-detection-in-the-wild

colinglaes commented 9 years ago

@jhochenbaum here's a link to the file: http://www.filedropper.com/sampledata. Let me know whether or not you're able to retrieve it.

colinglaes commented 9 years ago

Hey, not trying to be a pest, just wondering if you've made any progress on this issue, or if you possibly need more data sets with more than just one unique data value.

akejariwal commented 9 years ago

@colinglaes The second column of the sampledata contains only '2's, so there are no anomalies. Can you please provide an alternative data set?

cc @jhochenbaum @owenvallis

colinglaes commented 9 years ago

Yeah, I'm aware of that. It's just that I run through a fairly large number of datasets and that's bound to happen for some, so I guess I'll just use exception handling to get around it. Below is a link to a few more data sets. The number at the end of each CSV file name is the number of unique data values it contains.

http://s000.tinyupload.com/?file_id=36312528396867860629

Thanks!

@akejariwal @jhochenbaum @owenvallis
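The exception-handling workaround mentioned above could look roughly like the sketch below. `safe_detect` and `toy_detector` are hypothetical names introduced for illustration; `toy_detector` merely stands in for the real detection call so the example runs without the package installed.

```r
# Hypothetical wrapper: run `detect_fn` on one dataset and return NULL
# (after printing a message) instead of stopping when it raises an error.
safe_detect <- function(df, detect_fn, ...) {
  tryCatch(
    detect_fn(df, ...),
    error = function(e) {
      message("skipping dataset: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in detector for this illustration only: it fails on a constant
# series, similar to the behaviour reported in this issue.
toy_detector <- function(df) {
  if (length(unique(df[[2]])) < 2) stop("replacement has length zero")
  "ok"
}

datasets <- list(
  constant = data.frame(t = 1:5, count = rep(2, 5)),
  varied   = data.frame(t = 1:5, count = c(1, 2, 3, 2, 1))
)

# Failed datasets come back as NULL; the rest keep their results.
results <- lapply(datasets, safe_detect, detect_fn = toy_detector)
```

In real use the package's own detection function would be passed as `detect_fn`, and the `NULL` entries would mark the datasets that were skipped.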

owenvallis commented 9 years ago

I think we could certainly add exception handling to the code to avoid the errors. Line 83 might be a good spot. If we don't have any potential anoms stored in R, then we should return before running the Grubbs test. However, if there are at least a few unique values, then line 101 should return a truth value, so we'll have to look into that.
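As a rough illustration of the early return suggested here, the sketch below guards the test statistic against a zero scale estimate. It assumes the scale is the median absolute deviation (`mad`), which is one plausible cause of the second error and has not been confirmed against the package source in this thread; `max_scaled_deviation` is a hypothetical helper, not the package's code.

```r
# Hypothetical helper: return the largest scaled deviation and its index,
# or NULL when the scale estimate is zero or not finite.
max_scaled_deviation <- function(values) {
  scale <- mad(values)
  if (!is.finite(scale) || scale == 0) {
    return(NULL)  # no spread, so nothing can be ranked as anomalous
  }
  ares <- abs(values - median(values)) / scale
  idx <- which.max(ares)
  list(R = ares[idx], index = idx)
}

# More than half of these values are identical, so mad() is 0 even though
# the series has four unique values.
mostly_constant <- c(rep(0, 20), 1, 2, 3)
spread_out      <- c(1, 2, 3, 4, 5, 6, 50)

# Without the guard, 0 / 0 gives NaN, the maximum is NaN, and a comparison
# such as `if (R > lam)` fails with "missing value where TRUE/FALSE needed".
unguarded   <- abs(mostly_constant - median(mostly_constant)) / mad(mostly_constant)
R_unguarded <- max(unguarded)

guarded_constant <- max_scaled_deviation(mostly_constant)  # returns NULL
guarded_spread   <- max_scaled_deviation(spread_out)       # flags the value 50
```

Under this assumption, the number of unique values matters less than how concentrated they are: a series can pass a unique-value threshold and still have a zero scale estimate.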

colinglaes commented 9 years ago

awesome thanks!