twitter / AnomalyDetection

Anomaly Detection with R
GNU General Public License v3.0

data.frame Column Error #42

Open stevebanik opened 9 years ago

stevebanik commented 9 years ago

I created a data.frame called foo and tried to format it exactly like raw_data, but when I assign the result to res, I get an error.

My data.frame:

head(foo)
            timestamp   count
1 2015-05-11 13:54:00 42748.0
2 2015-05-11 13:55:00 44152.0
3 2015-05-11 13:56:00 43642.0
4 2015-05-11 13:57:00 42544.0
5 2015-05-11 13:58:00 41627.0
6 2015-05-11 13:59:00 42138.0

Setting res, getting an error:

res = AnomalyDetectionTs(foo, max_anoms=0.02, direction='both', plot=TRUE)
Error in AnomalyDetectionTs(foo, max_anoms = 0.02, direction = "both", :
  data must be a 2 column data.frame, with the first column being a set of timestamps, and the second coloumn being numeric values.

raw_data looks quite like foo:

head(raw_data)
            timestamp   count
1 1980-09-25 14:01:00 182.478
2 1980-09-25 14:02:00 176.231
3 1980-09-25 14:03:00 183.917
4 1980-09-25 14:04:00 177.798
5 1980-09-25 14:05:00 165.469
6 1980-09-25 14:06:00 181.878

Any idea what I'm doing wrong?

Thanks,

Steve

stevebanik commented 9 years ago

UPDATE: The value column should likely be num, not chr:

str(foo)
'data.frame':   1439 obs. of  2 variables:
 $ date : POSIXct, format: "2015-05-11 14:20:00" "2015-05-11 14:21:00" ...
 $ value: chr  "36185.0" "38591.0" "36313.0" "34467.0" ...

I used transform to change that:

D <- transform(foo, value = as.numeric(value))
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion

And now it's num:

str(D)
'data.frame':   1439 obs. of  2 variables:
 $ date : POSIXct, format: "2015-05-11 14:20:00" "2015-05-11 14:21:00" ...
 $ value: num  36185 38591 36313 34467 35717 ...

But now I get a different error, "Anom detection needs at least 2 periods worth of data":

anomalyDetectionResult <- AnomalyDetectionTs(D, max_anoms=0.2, threshold = "None", direction='both', plot=TRUE, only_last = "day", e_value = TRUE)
Error in detect_anoms(all_data[[i]], k = max_anoms, alpha = alpha, num_obs_per_period = period, :
  Anom detection needs at least 2 periods worth of data

I seem to recall reading another issue about that, so I'll look for it again.
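A minimal base-R sketch of how the two symptoms above could be checked, offered as a guess rather than a confirmed diagnosis. The `value` vector is a hypothetical stand-in for the real column, and the period of 1440 assumes the package treats minute-level data as having a daily period, which this thread does not confirm. The "NAs introduced by coercion" warning means at least one string could not be parsed as a number, so it is worth listing those entries before dropping them. Separately, 1439 one-minute observations would fall short of two daily periods.

```r
# Hypothetical stand-in for foo$value; the real column is only partly shown above.
value <- c("36185.0", "38591.0", "", "n/a", "34467.0")

# Convert quietly so the entries that fail to parse can be inspected.
num <- suppressWarnings(as.numeric(value))

# Entries that were present as strings but became NA did not parse as numbers.
bad <- which(is.na(num) & !is.na(value))
print(value[bad])

# Keep only the values that converted cleanly.
clean <- num[!is.na(num)]

# Assumption: minute-level data is given a period of 1440 observations (one day),
# so at least 2 * 1440 rows would be needed to pass the "2 periods" check.
n_obs <- 1439
assumed_period <- 1440
enough_data <- n_obs >= 2 * assumed_period
print(enough_data)
```

If the second check is the cause, supplying at least two full days of minute-level data would be the thing to try.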

nullbuddy1243 commented 9 years ago

@stevebanik How did you get your timestamps to be in that format? I've tried doing

foo_timestamp <- as.POSIXct(parse_iso_8601(doc$fields$`@timestamp`))

But my timestamps are now stored as num:

str(foo_dataframe)
'data.frame':   100 obs. of  2 variables:
 $ timestamp_list: num  1.44e+09 1.44e+09 1.44e+09 1.44e+09 1.44e+09 ...
 $ in_bytes_list : num  977 1965 973 986 977 ...

And when I run the anomaly detector

AnomalyDetectionVec(foo_dataframe, period=100, plot=TRUE)
Error in AnomalyDetectionVec(es_out2, period = 100, plot = TRUE) : 
  data must be a single data frame, list, or vector that holds numeric values.

My data frame looks like this:

head(foo_dataframe)
  timestamp_list in_bytes_list
1     1437617401           977
2     1437617401          1965
3     1437617401           973
4     1437617401           986
5     1437617401           977
6     1437617391           605

Any help would be greatly appreciated!
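One possible approach, sketched in base R with hypothetical data that mirrors the head(foo_dataframe) output above. It assumes the num timestamps are seconds since the Unix epoch in UTC. The error text from AnomalyDetectionVec asks for a single series of numeric values, so the sketch passes only the value column. That call is left as a comment because the right period depends on the data.

```r
# Hypothetical data copied from the head(foo_dataframe) output above.
foo_dataframe <- data.frame(
  timestamp_list = c(1437617401, 1437617401, 1437617391),
  in_bytes_list  = c(977, 1965, 605)
)

# Assumption: the numbers are seconds since 1970-01-01 00:00:00 UTC.
foo_dataframe$timestamp_list <- as.POSIXct(
  foo_dataframe$timestamp_list,
  origin = "1970-01-01",
  tz = "UTC"
)
str(foo_dataframe)

# AnomalyDetectionVec reports that it wants one numeric series, so pass the
# value column alone instead of the two-column data frame (not run here):
# AnomalyDetectionVec(foo_dataframe$in_bytes_list, period = <your period>, plot = TRUE)
```

Note that the earlier comment in this thread hit "Anom detection needs at least 2 periods worth of data", so a period of 100 with only 100 observations may fail the same check.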

QuantScientist3 commented 8 years ago

Same here. Were you able to resolve this?