Closed kiefersmith closed 3 years ago
Yeah, I have tried some demos with this and it works. You can try downloading this dataset: https://www.dropbox.com/s/fojf1ll7w128si9/CleanedDataset.RDS?dl=1
Then run the following code:
library(dplyr)
library(AnomalyDetection)

Data <- readRDS(PATH_TO_DOWNLOADED_RDS_FILE_HERE)

# Truncate each timestamp to the hour and count the rows per hour
DataForAnomalyAn <- Data %>%
  mutate(date = as.POSIXct(format(date, '%Y-%m-%d %H:00'))) %>%
  group_by(date) %>%
  summarise(count = n())

# Apply anomaly detection
data_anomaly <- AnomalyDetectionTs(DataForAnomalyAn[, c("date", "count")],
                                   max_anoms = 0.02, direction = 'both',
                                   plot = TRUE, e_value = TRUE)

# Plot original data + anomaly points
data_anomaly$plot
data_anomaly$anoms
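If you want to try the aggregation step without downloading anything, here is a minimal base-R sketch. The timestamps are synthetic (made up for illustration, not the dataset from this thread), and the `events` and `hourly` names are just placeholders. It truncates each timestamp to the start of its hour and counts events per hour, which should give the same two-column shape (`date`, `count`) that the `AnomalyDetectionTs` call above is given.

```r
# Synthetic event timestamps (hypothetical data, not the thread's dataset).
set.seed(1)
start <- as.POSIXct("2017-01-01 00:00:00", tz = "UTC")
events <- data.frame(date = start + sort(runif(500, 0, 3 * 24 * 3600)))

# Truncate each timestamp to the start of its hour.
# trunc() returns POSIXlt, so convert back to POSIXct for use as a column.
events$hour <- as.POSIXct(trunc(events$date, units = "hours"))

# Count events per hour: one row per distinct hour, plus its event count.
hourly <- aggregate(count ~ hour,
                    data = transform(events, count = 1L),
                    FUN = sum)
names(hourly) <- c("date", "count")

head(hourly)
```

Hours with no events simply do not appear in `hourly`; whether that matters depends on how regular the detector expects the series to be.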
Instead of downloading the dataset I provided, you can generate it yourself using this code:
# Installing packages
install.packages("janitor")
install.packages("hms")
install.packages("RcppRoll")
install.packages("devtools")
devtools::install_github("exploratory-io/exploratory_func")
devtools::install_github("gitronald/IPtoCountry")
devtools::install_github("twitter/AnomalyDetection")
install.packages("readr")

# Loading libraries
library(janitor)
library(lubridate)
library(hms)
library(tidyr)
library(stringr)
library(readr)
library(forcats)
library(RcppRoll)
library(dplyr)
library(IPtoCountry)
library(exploratory)
library(AnomalyDetection)
# Read the access log, keep the useful columns, and keep one row per session
Data <- exploratory::read_log_file("C:/Users/GiannisM/Documents/Visual Studio 2017/Projects/Anomaly Detection/Anomaly Detection/access.log", skip = 0, col_names = FALSE) %>%
  exploratory::clean_data_frame() %>%
  rename(ip_address = X1, date = X4, url_request = X5) %>%
  select(-X2, -X3, -X6, -X7, -X9, -X10, -X8) %>%
  mutate(country = IP_country(ip_address), date = dmy_hms(date)) %>%
  filter(!is.na(country)) %>%
  select(-ip_address) %>%
  mutate(session_id = url_param(url_request, "JSESSIONID")) %>%
  distinct(session_id, .keep_all = TRUE)
Then proceed with the code above as usual.
I am trying to detect anomalies on a dataset with about 300 entries (about 10 per day). No matter how I slice the data, I can't seem to get either of the mentioned functions to return anything. I've followed the documentation on the repo, but no luck. The output I get is the following:
Has anyone been able to use this package successfully?
Thanks in advance.
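For a dataset this small, it may be worth checking the shape of the input before calling the detector. Below is a rough base-R sketch under a few assumptions: `check_ts_input` is a hypothetical helper (not part of AnomalyDetection), the data is synthetic, and the "at least two full periods" rule reflects my understanding of what the seasonal decomposition needs, not something confirmed in this thread. It aggregates about 300 events into daily counts and flags the most common input problems.

```r
# Hypothetical helper (not part of AnomalyDetection): basic sanity checks on a
# two-column (timestamp, value) data frame before running a seasonal detector.
check_ts_input <- function(df, period) {
  stopifnot(is.data.frame(df), ncol(df) == 2)
  problems <- character(0)
  if (!inherits(df[[1]], "POSIXct")) {
    problems <- c(problems, "first column is not POSIXct")
  }
  if (!is.numeric(df[[2]])) {
    problems <- c(problems, "second column is not numeric")
  }
  if (anyNA(df[[2]])) {
    problems <- c(problems, "second column contains NA values")
  }
  # Assumption: the decomposition needs at least two full periods of data.
  if (nrow(df) < 2 * period) {
    problems <- c(problems,
                  sprintf("only %d rows, fewer than two periods (%d)",
                          nrow(df), 2 * period))
  }
  problems
}

# About 300 synthetic events over 30 days, aggregated to daily counts.
set.seed(42)
start <- as.POSIXct("2017-01-01", tz = "UTC")
events <- start + sort(runif(300, 0, 30 * 24 * 3600))
day <- as.POSIXct(trunc(events, units = "days"))
daily <- aggregate(data.frame(count = rep(1L, length(day))),
                   by = list(date = day), FUN = sum)

# Weekly seasonality (period = 7) is an assumption for daily data.
check_ts_input(daily, period = 7)
```

An empty result means none of these checks failed. It does not guarantee that the detector will find anomalies, only that the input has the expected shape.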