twitter / AnomalyDetection

Anomaly Detection with R
GNU General Public License v3.0

Error in if (data_sigma == 0) break : #58

Open sarojhange opened 9 years ago

sarojhange commented 9 years ago

Below is a sample of weekly data for a metric. I want to detect anomalies in this time series. The error I encounter while running the code is:

    Error in if (data_sigma == 0) break : 
      missing value where TRUE/FALSE needed

The data:

     1 2013-01-01 59.94
     2 2013-01-08 59.65
     3 2013-01-15 61.56
     4 2013-01-22 58.37
     5 2013-01-29 58.07
     6 2013-02-05 57.31
     7 2013-02-12 58.53
     8 2013-02-19 63.22
     9 2013-02-26 60.21
    10 2013-03-05 59.09
    11 2013-03-12 57.19
    12 2013-03-19 55.97
    13 2013-03-26 59.96
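The message itself is what base R reports whenever the condition of an `if` evaluates to NA. The sketch below reproduces that outside the package. `check_sigma` and `safe_is_zero` are hypothetical helpers, and using `mad` as the spread function is an assumption made for illustration. Why the value ends up missing for this particular data is not established by the report above.

```r
# Hypothetical stand-in for the guard quoted in the error message.
# `mad` is only an assumed choice of spread function for this sketch.
check_sigma <- function(values, spread = mad) {
  data_sigma <- spread(values)
  if (data_sigma == 0) "constant" else "ok"
}

# A constant series has zero spread, so the guard fires as intended.
constant_result <- check_sigma(rep(5, 10))

# An ordinary series passes the guard.
normal_result <- check_sigma(c(59.94, 59.65, 61.56, 58.37))

# With no observations, the spread is NA and `if` cannot decide.
msg <- tryCatch(
  check_sigma(numeric(0)),
  error = function(e) conditionMessage(e)
)
print(msg)

# An NA-safe version of the same comparison: NA is treated as "not zero".
safe_is_zero <- function(x) isTRUE(x == 0)
```

Wrapping the comparison in `isTRUE()` only silences the error. Whether stopping or continuing is the right behaviour when the spread is missing is a separate design question.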

datafeelings commented 9 years ago

Wasn't this test made to protect against constant time series? It seems that the function is very sensitive to the percentage of anomalies it is asked to find (max_anoms). That is fine, but it should be explained better in the help documentation.

Here is a sample from my time series which is not constant:

     1 2015-06-01 00:00:00  1932
     2 2015-06-01 01:30:00   857
     3 2015-06-01 03:00:00   870
     4 2015-06-01 04:30:00  3836
     5 2015-06-01 06:00:00  8409
     6 2015-06-01 07:30:00 12514
     7 2015-06-01 09:00:00 11554
     8 2015-06-01 10:30:00 11175
     9 2015-06-01 12:00:00  9953
    10 2015-06-01 13:30:00 11678
    11 2015-06-01 15:00:00 12869
    12 2015-06-01 16:30:00 14965
    13 2015-06-01 18:00:00 14939
    14 2015-06-01 19:30:00  8255
    15 2015-06-01 21:00:00  5584
    16 2015-06-01 22:30:00  4661

I got the same error as @sarojhange when I called AnomalyDetectionTs() with max_anoms set to 0.1:

    ans = AnomalyDetectionTs(x = data,
                             max_anoms = 0.1,
                             direction = "both",
                             alpha = 0.05,
                             longterm = FALSE,
                             plot = TRUE)
    Error in if (data_sigma == 0) break : 
      missing value where TRUE/FALSE needed

But with max_anoms set to 0.01 the function worked fine, and in the plot I could see a correctly identified single anomaly in the data.

    ans = AnomalyDetectionTs(x = data,
                             max_anoms = 0.01,
                             direction = "both",
                             alpha = 0.05,
                             longterm = FALSE,
                             plot = TRUE)

    ans$anoms
                timestamp anoms
    1 2015-06-06 16:30:00 29473

sustakat commented 9 years ago

I am also having issues with this particular code. It seems to run when max_anoms is below 10% but breaks hard otherwise.

randakar commented 8 years ago

Heh, just ran into the same problem, applying the same fix in the process. This definitely needs a better solution.

carolemieux commented 7 years ago

I had this issue. The problem in my case was that my timestamps did not have an assigned time zone. This caused an issue in the lines

    R_idx[i] <- data[[1L]][temp_max_idx] 
    data <- data[-which(data[[1L]] == R_idx[i]), ]

of detect_anoms.R, where the comparison data[[1L]] == R_idx[i] failed to match any row. Because which() then returns an empty index, the negative subscript appears to set data to an empty data frame. Nothing matched because, when the timestamps have no assigned time zone, an earlier assignment to R_idx sets its time zone to UTC, and somehow all the data was a bit off because of that. Using data[data[[1L]] != R_idx[i], ] instead of the which() call might fix this problem, but having timestamps in a set time zone is definitely necessary for correctness.
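The indexing behaviour described here can be checked in base R without the package. The following is a minimal sketch on a toy data frame: the column names, the values, and the deliberately shifted `target` timestamp are all made up for illustration, and it does not reproduce the time zone conversion inside detect_anoms.R.

```r
# Toy data with timestamps parsed in an explicit time zone (assumed: UTC).
data <- data.frame(
  timestamp = as.POSIXct(
    c("2015-06-01 00:00:00", "2015-06-01 01:30:00", "2015-06-01 03:00:00"),
    tz = "UTC"
  ),
  count = c(1932, 857, 870)
)

# A timestamp that matches no row, shifted by one hour on purpose
# to imitate a comparison that is "a bit off".
target <- data$timestamp[2] + 3600

# which() finds nothing and returns integer(0). Negating it is still
# integer(0), which selects zero rows instead of dropping zero rows.
dropped_with_which <- data[-which(data[[1L]] == target), ]
print(nrow(dropped_with_which))    # 0

# Logical indexing keeps every row when nothing matches.
dropped_with_logical <- data[data[[1L]] != target, ]
print(nrow(dropped_with_logical))  # 3

# When the timestamp does match, both forms drop the same single row.
hit <- data$timestamp[2]
same_when_matched <- identical(
  data[-which(data[[1L]] == hit), ],
  data[data[[1L]] != hit, ]
)
```

The two forms agree whenever at least one row matches, so the difference only shows up in the no-match case that the comment describes.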