twitter / AnomalyDetection

Anomaly Detection with R
GNU General Public License v3.0

Can't get AnomalyDetectionVec or AnomalyDetectionTs to work. #90

Closed kiefersmith closed 3 years ago

kiefersmith commented 6 years ago

I am trying to detect anomalies in a dataset with about 300 entries (about 10 per day). However I slice the data, I can't get either of the functions mentioned to return anything. I've followed the documentation in the repo, but no luck. The output I get is the following:

$anoms
data frame with 0 columns and 0 rows

$plot
NULL

Has anyone been able to use this package successfully?

Thanks in advance.

N1h1l1sT commented 6 years ago

Yeah, I have tried some demos with this package and it works. You can try downloading this dataset: https://www.dropbox.com/s/fojf1ll7w128si9/CleanedDataset.RDS?dl=1

Then run the following code:

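# Packages used by the pipeline and the detection call below
library(dplyr)
library(AnomalyDetection)

# Replace the placeholder with the path to the downloaded .RDS file
Data <- readRDS(PATH_TO_DOWNLOADED_RDS_FILE_HERE)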

# Truncate each timestamp to the hour, then count rows per hour
DataForAnomalyAn <- Data %>%
  mutate(date = as.POSIXct(format(date, '%Y-%m-%d %H:00'))) %>%
  group_by(date) %>%
  summarise(count = n())
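For anyone without the dataset, the hourly bucketing in the pipeline above can be sketched in base R on made-up timestamps. The `events` data frame and its values are assumptions for illustration only; they are not taken from the real data.

```r
# Hypothetical event log: one row per request (timestamps are made up).
events <- data.frame(
  date = as.POSIXct(
    c("2018-03-01 10:05:00", "2018-03-01 10:40:00",
      "2018-03-01 11:15:00", "2018-03-01 13:59:00",
      "2018-03-01 13:01:00"),
    tz = "UTC"
  )
)

# Truncate each timestamp to the start of its hour, like the mutate() step.
events$hour <- as.POSIXct(
  format(events$date, "%Y-%m-%d %H:00"),
  format = "%Y-%m-%d %H:%M",
  tz = "UTC"
)

# Count rows per hour, like group_by(date) followed by summarise(count = n()).
hourly <- aggregate(
  list(count = rep(1L, nrow(events))),
  by = list(date = events$hour),
  FUN = sum
)

print(hourly)
```

The result is a two-column data frame (timestamp, then count), which is the same shape as the `DataForAnomalyAn[, c("date", "count")]` input passed to `AnomalyDetectionTs` below.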

# Apply anomaly detection
data_anomaly <- AnomalyDetectionTs(
  DataForAnomalyAn[, c("date", "count")],
  max_anoms = 0.02,
  direction = 'both',
  plot = TRUE,
  e_value = TRUE
)

# Plot the original data with the anomaly points
data_anomaly$plot

# List the detected anomalies
data_anomaly$anoms
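If the call finds nothing, the result looks like the output in the original question: `$anoms` is an empty data frame and `$plot` is `NULL`. A minimal sketch of checking for that before using the result follows. `has_anomalies` is a hypothetical helper, and `empty_result` is a hand-built stand-in that mirrors the output quoted above, not real package output.

```r
# Hypothetical helper: TRUE only when the result holds at least one anomaly row.
has_anomalies <- function(res) {
  is.data.frame(res$anoms) && nrow(res$anoms) > 0
}

# Hand-built stand-in mirroring the empty output from the original question.
empty_result <- list(anoms = data.frame(), plot = NULL)

if (has_anomalies(empty_result)) {
  print(empty_result$anoms)
} else {
  message("No anomalies were returned.")
}
```

With a real result you would pass `data_anomaly` in place of `empty_result`.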

Instead of downloading the dataset I provided, you can generate it yourself using this code:

# Installing packages
install.packages("janitor")
install.packages("hms")
install.packages("RcppRoll")
install.packages("devtools")
devtools::install_github("exploratory-io/exploratory_func")
devtools::install_github("gitronald/IPtoCountry")
devtools::install_github("twitter/AnomalyDetection")
install.packages("readr")

# Loading libraries
library(janitor)
library(lubridate)
library(hms)
library(tidyr)
library(stringr)
library(readr)
library(forcats)
library(RcppRoll)
library(dplyr)
library(IPtoCountry)
library(exploratory)
library(AnomalyDetection)

# Read the access log, keep the columns of interest, and keep one row per session
Data <- exploratory::read_log_file("C:/Users/GiannisM/Documents/Visual Studio 2017/Projects/Anomaly Detection/Anomaly Detection/access.log", skip = 0, col_names = FALSE) %>%
  exploratory::clean_data_frame() %>%
  rename(ip_address = X1, date = X4, url_request = X5) %>%
  select(-X2, -X3, -X6, -X7, -X9, -X10, -X8) %>%
  mutate(country = IP_country(ip_address), date = dmy_hms(date)) %>%
  filter(!is.na(country)) %>%
  select(-ip_address) %>%
  mutate(session_id = url_param(url_request, "JSESSIONID")) %>%
  distinct(session_id, .keep_all = TRUE)

Then proceed with the code above, starting from the DataForAnomalyAn step.
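The last step of the log pipeline, `distinct(session_id, .keep_all = TRUE)`, keeps the first row for each session ID, so each session is counted once. A base R sketch of the same idea on made-up rows (the `sessions` data frame and its values are assumptions for illustration):

```r
# Hypothetical parsed log rows; session IDs and countries are made up.
sessions <- data.frame(
  session_id = c("A1", "B2", "A1", "C3", "B2"),
  country = c("Greece", "France", "Greece", "Spain", "France"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each session_id along with its other columns.
unique_sessions <- sessions[!duplicated(sessions$session_id), ]

print(unique_sessions)
```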