twitter / AnomalyDetection

Anomaly Detection with R
GNU General Public License v3.0

Issue - period length for time series decomposition #77

Open sonnylaskar opened 7 years ago

sonnylaskar commented 7 years ago

Hello team,

I started exploring this package and I am stuck. I have a data.frame which contains some parameter values captured every 15 minutes, hence 96 records for one day. I have data for 27 days.

I get the following error when I run:

> names(a)
[1] "DTime"          "Paramter"

> unique(as.Date(a$DTime))
 [1] "2016-06-27" "2016-06-28" "2016-06-29" "2016-06-30" "2016-06-09"
 [6] "2016-06-10" "2016-06-11" "2016-06-12" "2016-06-13" "2016-06-14"
[11] "2016-06-15" "2016-06-16" "2016-06-17" "2016-06-18" "2016-06-19"
[16] "2016-06-20" "2016-06-21" "2016-06-22" "2016-06-23" "2016-06-24"
[21] "2016-06-25" "2016-06-26" "2016-07-01" "2016-07-02" "2016-07-03"
[26] "2016-07-04" "2016-06-08"

> head(a)
                DTime Paramter
1 2016-06-27 00:00:00    13.03
2 2016-06-27 00:15:00     1.58
3 2016-06-27 00:30:00     1.39
4 2016-06-27 00:45:00     1.61
5 2016-06-27 01:00:00     6.99
6 2016-06-27 01:15:00     1.71

> AnomalyDetectionTs(a,   max_anoms = 0.01)
Error in detect_anoms(all_data[[i]], k = max_anoms, alpha = alpha, num_obs_per_period = period,  :
  must supply period length for time series decomposition

I tried longterm = T but it didn't help. Please let me know how to solve this.
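
A possible workaround, sketched here under the assumption that the seasonality of interest is daily: AnomalyDetectionVec takes the seasonal period explicitly, so the granularity detection that fails above never runs. With 15-minute data there are 96 observations per day:

library(AnomalyDetection)

# Sketch: bypass granularity detection by passing the period directly.
# 15-minute data => 96 observations per seasonal period (one day).
res <- AnomalyDetectionVec(a$Paramter, max_anoms = 0.01,
                           period = 96, direction = "both", plot = TRUE)
res$anoms  # data.frame of the detected anomalies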

mvadu commented 7 years ago

I ran into the same error. The problem is the missing sec case in the period calculation:

 period = switch(gran,
                  min = 1440,
                  hr = 24,
                  # if the data is daily, then we need to bump the period to weekly to get multiple examples
                  day = 7)

sonnylaskar commented 7 years ago

Thanks @mvadu for your reply. So what should I do to resolve this issue? I tried both POSIXlt and POSIXct classes for the timestamp. Please assist.

Gypsying commented 5 years ago

I ran into the same error...

my dataset is:

            timestamp       count
1 2018-08-10 10:00:00 36936541213
2 2018-08-10 10:05:00 36660137371
3 2018-08-10 10:10:00 35549664089
4 2018-08-10 10:15:00 34855280888
5 2018-08-10 10:20:00 35208862183
6 2018-08-10 10:25:00 34907127930
...

Any help would be appreciated
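
The same explicit-period sketch would apply to this dataset: 5-minute data means 288 observations per day. Here df stands in for the data.frame printed above:

# Sketch: 5-minute data => 288 observations per day.
res <- AnomalyDetectionVec(df$count, max_anoms = 0.01,
                           period = 288, direction = "both", plot = TRUE)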

nikhiljay commented 5 years ago

@Gypsying did you get it working?

mvadu commented 5 years ago

The library currently tries to convert second-level granularity into minutes by aggregating the data, but it does not update the granularity afterwards. A few folks have tried to fix that in their own repos (see the linked PRs above). But I think the right thing is not to touch the actual data set, since summing the input values may not make sense in all cases (e.g. temperature, speed, or stock price). The right fix is to remove the aggregation logic at L164 and add a sec case to the switch above with a value of 86400.
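
A sketch of that patch (the sec = 86400 value comes from this comment; the rest mirrors the switch quoted earlier):

period = switch(gran,
                sec = 86400, # one day of second-level observations
                min = 1440,  # one day of minute-level observations
                hr = 24,     # one day of hourly observations
                # if the data is daily, bump the period to weekly
                # to get multiple seasonal cycles
                day = 7)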

On the other hand, neither the time series in the OP nor the one posted above by @Gypsying has points separated by seconds. So maybe we need to check why the library decides to use sec as the granularity.
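
One way to check, as a diagnostic sketch: inspect the spacing between consecutive timestamps before calling AnomalyDetectionTs.

# Sketch: look at the timestamp gaps the granularity detection will see.
a$DTime <- as.POSIXct(a$DTime)    # ensure a proper timestamp class
table(diff(as.numeric(a$DTime)))  # gaps in seconds; 900 means 15 minutes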