nathanielc / morgoth

Metric anomaly detection
http://docs.morgoth.io
Apache License 2.0

No data points processed #44

Open: divgwd opened this issue 7 years ago

divgwd commented 7 years ago

Hi, I am using Morgoth for anomaly detection. Kapacitor (listening on port 9096) runs on one server and InfluxDB runs on a different server. After enabling the task, I noticed that Kapacitor is not receiving any streamed data. Here is the related information:

kapacitor version
Kapacitor 1.2.0 (git: master 5408057e5a3493d3b5bd38d5d535ea45b587f8ff)

kapacitor.conf file:

# Hostname of this Kapacitor node; the sample config notes it must be resolvable by every configured InfluxDB host.
hostname = "localhost"
data_dir = "/var/lib/kapacitor"
skip-config-overrides = false
default-retention-policy = ""
[http]
  bind-address = ":9096"
  auth-enabled = false
  log-enabled = true
  write-tracing = false
  pprof-enabled = false
  https-enabled = false
  https-certificate = "/etc/ssl/kapacitor.pem"
[config-override]
  enabled = true
[logging]
  file = "/var/log/kapacitor/kapacitor.log"
  level = "INFO"
[replay]
  dir = "/var/lib/kapacitor/replay"
[task]
  dir = "/var/lib/kapacitor/tasks"
  snapshot-interval = "60s"
[storage]
  boltdb = "/var/lib/kapacitor/kapacitor.db"
[deadman]
  global = false
  threshold = 0.0
  interval = "10s"
  id = "node 'NODE_NAME' in task '{{ .TaskName }}'"
  message = "{{ .ID }} is {{ if eq .Level \"OK\" }}alive{{ else }}dead{{ end }}: {{ index .Fields \"collected\" | printf \"%0.3f\" }} points/INTERVAL."
[[influxdb]]
  enabled = true
  default = true
  name = "server1"
  urls = ["http://172.16.23.20:8086"]
  username = ""
  password = ""
  timeout = 0
  insecure-skip-verify = false
  startup-timeout = "5m"
  disable-subscriptions = false
  subscription-protocol = "http"
  subscriptions-sync-interval = "1m0s"
  # Empty: falls back to the top-level hostname.
  kapacitor-hostname = ""
  # 0: falls back to the port in [http] bind-address.
  http-port = 0
  udp-bind = ""
  udp-buffer = 1000
  udp-read-buffer = 0
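Because InfluxDB (172.16.23.20) and Kapacitor run on different machines here, one setting worth checking is the address Kapacitor registers with InfluxDB for its subscriptions. The comments in Kapacitor's sample kapacitor.conf say that an empty `kapacitor-hostname` in `[[influxdb]]` falls back to the top-level `hostname`, and that `http-port = 0` falls back to the port in `[http] bind-address`. The Python sketch below only mirrors that documented fallback; `advertised_host_port` is a hypothetical helper, not part of Kapacitor.

```python
# Hypothetical helper, not part of Kapacitor: it mirrors the fallback rules
# described in the comments of the sample kapacitor.conf.
def advertised_host_port(hostname, bind_address, kapacitor_hostname="", http_port=0):
    """Return the (host, port) Kapacitor would put in its subscription URL."""
    # An empty kapacitor-hostname falls back to the top-level hostname.
    host = kapacitor_hostname or hostname
    # http-port = 0 falls back to the port in [http] bind-address,
    # e.g. ":9096" or "0.0.0.0:9096" -> 9096.
    port = http_port or int(bind_address.rsplit(":", 1)[1])
    return host, port


# Values copied from the kapacitor.conf above.
host, port = advertised_host_port(hostname="localhost", bind_address=":9096")
print(f"http://{host}:{port}")  # http://localhost:9096
```

Under these assumptions the result matches the `http://localhost:9096` destination that `show subscriptions` reports further down in this issue. Seen from the InfluxDB server at 172.16.23.20, `localhost` is that server itself, which is why `hostname` (or `kapacitor-hostname` in `[[influxdb]]`) is the setting the sample config's comments point to for a split setup.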

The kapacitor.log

[root@localhost Morgoth_Tick_scripts]# tail -f /var/log/kapacitor/kapacitor.log
[task_master:main] 2017/03/22 17:17:20 I! Started task: cpu_idle
[cpu_idle:morgoth3] 2017/03/22 17:17:20 I!P 2017/03/22 17:17:20 I! Starting agent using STDIN/STDOUT
[httpd] ::1 - - [22/Mar/2017:17:17:20 +0530] "PATCH /kapacitor/v1/tasks/cpu_idle HTTP/1.1" 200 1094 "-" "KapacitorClient" 4e59cb20-0ef5-11e7-8052-000000000000 62470
[httpd] ::1 - - [22/Mar/2017:17:17:27 +0530] "GET /kapacitor/v1/tasks?dot-view=attributes&fields=link&limit=100&offset=0&pattern=cpu_idle&replay-id=&script-format=formatted HTTP/1.1" 200 123 "-" "KapacitorClient" 52b842c7-0ef5-11e7-8053-000000000000 1324
[httpd] ::1 - - [22/Mar/2017:17:17:27 +0530] "PATCH /kapacitor/v1/tasks/cpu_idle HTTP/1.1" 200 1103 "-" "KapacitorClient" 52b8aa97-0ef5-11e7-8054-000000000000 42493
[httpd] ::1 - - [22/Mar/2017:17:17:36 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 584398f4-0ef5-11e7-8055-000000000000 10828
[httpd] ::1 - - [22/Mar/2017:17:17:38 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 5959083b-0ef5-11e7-8056-000000000000 11890
[httpd] ::1 - - [22/Mar/2017:17:17:39 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 59bd8eac-0ef5-11e7-8057-000000000000 16112
[httpd] ::1 - - [22/Mar/2017:17:17:51 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 60cf6482-0ef5-11e7-8059-000000000000 11066
[httpd] ::1 - - [22/Mar/2017:17:54:09 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 732ca24d-0efa-11e7-805a-000000000000 11702
[httpd] ::1 - - [22/Mar/2017:17:57:57 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" fb09e4b9-0efa-11e7-805b-000000000000 10574
[httpd] ::1 - - [22/Mar/2017:17:59:28 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 313fd8ae-0efb-11e7-805c-000000000000 11217

task output:

kapacitor -url http://localhost:9096 show cpu_idle
ID: cpu_idle
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 22 Mar 17 16:14 IST
Modified: 22 Mar 17 17:17 IST
LastEnabled: 22 Mar 17 17:17 IST
Databases Retention Policies: ["morgoth"."autogen"]
TICKscript:
// The measurement to analyze
var measurement = 'system'

// Optional group by dimensions
var groups = [*]

// Optional where filter
var whereFilter = lambda: "METRIC" == 'system.cpu.idle'

// The amount of data to window at once
var window = 10m

// The field to process
var field = 'VALUE'

// The name for the anomaly score field
var scoreField = 'anomalyScore'

// The minimum support
var minSupport = 0.05

// The error tolerance
var errorTolerance = 0.01

// var errorTolerance = 0.005

// The consensus
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.3

stream
    // Select the data we want
    |from()
        .database('morgoth')
        .measurement(measurement)
        .groupBy(groups)
        .where(whereFilter)
    // Window the data for a certain amount of time
    |window()
        .period(window)
        .every(window)
        .align()
    // Send each window to Morgoth
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |influxDBOut()
        .database('morgoth')
        .retentionPolicy('autogen')
        .measurement('cpu_idle_anomaly')

DOT:
digraph cpu_idle {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" ];
stream0 -> from1 [processed="0"];

from1 [avg_exec_time_ns="0s" ];
from1 -> window2 [processed="0"];

window2 [avg_exec_time_ns="0s" ];
window2 -> morgoth3 [processed="0"];

morgoth3 [avg_exec_time_ns="0s" ];
morgoth3 -> influxdb_out4 [processed="0"];

influxdb_out4 [avg_exec_time_ns="0s" points_written="0" write_errors="0" ];
}
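In the DOT graph every edge reports `processed="0"`, including the very first one, `stream0 -> from1`. That suggests the points never reached the task at all, rather than being discarded by the `where` filter, the window or Morgoth. A rough Python sketch for finding the first edge with nothing processed; `edge_counts` and `first_idle_edge` are hypothetical helpers that parse only edge lines of the form shown above.

```python
import re

# Matches edge lines such as: stream0 -> from1 [processed="0"];
EDGE_RE = re.compile(r'(\w+)\s*->\s*(\w+)\s*\[processed="(\d+)"\]')


def edge_counts(dot_text):
    """Return (source, target, processed) tuples in the order they appear."""
    return [(src, dst, int(n)) for src, dst, n in EDGE_RE.findall(dot_text)]


def first_idle_edge(dot_text):
    """Return the first (source, target) edge that processed no points, or None."""
    for src, dst, n in edge_counts(dot_text):
        if n == 0:
            return src, dst
    return None


dot = '''
stream0 -> from1 [processed="0"];
from1 -> window2 [processed="0"];
window2 -> morgoth3 [processed="0"];
morgoth3 -> influxdb_out4 [processed="0"];
'''
print(first_idle_edge(dot))  # ('stream0', 'from1')
```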

kapacitor stats

ClusterID:                    59ac060a-2a62-4acd-9308-4af668fc42d2
ServerID:                     e3d490e8-3035-4823-884c-ec973bf81e8b
Host:                         localhost
Tasks:                        15
Enabled Tasks:                15
Subscriptions:                3
Version:                      1.2.0
kapacitor -url http://localhost:9096 stats ingress
Database   Retention Policy Measurement Points Received
_kapacitor autogen          edges                 59505
_kapacitor autogen          ingress                4642
_kapacitor autogen          kapacitor               970
_kapacitor autogen          nodes                 57565
_kapacitor autogen          runtime                 970 
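The `stats ingress` table above lists only the `_kapacitor` database (Kapacitor's own internal statistics) and has no row for `morgoth`/`autogen`, the database and retention policy the task reads from. A minimal Python sketch that sums the received points for one database and retention policy, assuming the four-column layout shown above; `points_received` is a hypothetical helper.

```python
def points_received(table_text, database, retention_policy):
    """Sum 'Points Received' for rows matching a database and retention policy.

    Assumes the whitespace-separated layout printed by `kapacitor stats ingress`:
    Database, Retention Policy, Measurement, Points Received.
    """
    total = 0
    for line in table_text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore anything that is not a data row
        db, rp, _measurement, count = parts
        if db == database and rp == retention_policy:
            total += int(count)
    return total


ingress = """Database   Retention Policy Measurement Points Received
_kapacitor autogen          edges                 59505
_kapacitor autogen          ingress                4642
_kapacitor autogen          kapacitor               970
_kapacitor autogen          nodes                 57565
_kapacitor autogen          runtime                 970
"""
print(points_received(ingress, "morgoth", "autogen"))  # 0
```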

InfluxDB subscriptions:

Connected to http://localhost:8086 version 1.2.0
InfluxDB shell version: 1.2.0
> show subscriptions
name: _internal
retention_policy name                                           mode destinations

monitor          kapacitor-80f85136-b89b-4a20-ab98-7b1476707e38 ANY  [http://localhost:9092]
monitor          kapacitor-59ac060a-2a62-4acd-9308-4af668fc42d2 ANY  [http://localhost:9096]

name: load_testing
retention_policy name                                           mode destinations

autogen          kapacitor-80f85136-b89b-4a20-ab98-7b1476707e38 ANY  [http://localhost:9092]
autogen          kapacitor-59ac060a-2a62-4acd-9308-4af668fc42d2 ANY  [http://localhost:9096]

name: morgoth
retention_policy name                                           mode destinations

autogen          kapacitor-80f85136-b89b-4a20-ab98-7b1476707e38 ANY  [http://localhost:9092]
autogen          kapacitor-59ac060a-2a62-4acd-9308-4af668fc42d2 ANY  [http://localhost:9096]
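InfluxDB is the process that resolves these subscription destinations, so a destination such as `http://localhost:9096` (the subscription whose name contains this Kapacitor's ClusterID, `59ac060a-2a62-4acd-9308-4af668fc42d2`) points at the InfluxDB machine itself, which in this setup is not where Kapacitor runs. A Python sketch that scans `show subscriptions` output for loopback destinations; `loopback_destinations` is a hypothetical helper and assumes the `name: <db>` / `[url, ...]` layout printed by the influx shell.

```python
import re
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def loopback_destinations(show_subscriptions_text):
    """Return (database, destination) pairs whose host is a loopback name."""
    found = []
    database = None
    for line in show_subscriptions_text.splitlines():
        if line.startswith("name: "):
            # A 'name: <db>' line starts each database block.
            database = line[len("name: "):].strip()
            continue
        # Destinations are printed at the end of the row as [url, url, ...].
        match = re.search(r"\[(.*)\]\s*$", line)
        if not match:
            continue
        for url in match.group(1).split(","):
            url = url.strip()
            if urlsplit(url).hostname in LOOPBACK_HOSTS:
                found.append((database, url))
    return found


output = """name: morgoth
retention_policy name                                           mode destinations

autogen          kapacitor-80f85136-b89b-4a20-ab98-7b1476707e38 ANY  [http://localhost:9092]
autogen          kapacitor-59ac060a-2a62-4acd-9308-4af668fc42d2 ANY  [http://localhost:9096]
"""
for db, url in loopback_destinations(output):
    print(db, url)
```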

Commands used to define and enable the task:

kapacitor -url http://localhost:9096 define cpu_idle -type stream -dbrp morgoth.autogen -tick ./cpu_morgoth.tick
kapacitor -url http://localhost:9096 enable cpu_idle
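The `KapacitorClient` requests in the log show that the `kapacitor` CLI talks to Kapacitor's HTTP API. For reference, the define-and-enable step can also be expressed as a single `POST /kapacitor/v1/tasks` request with `id`, `type`, `dbrps`, `script` and `status` fields. This Python sketch only builds the request and does not send it; `build_define_task_request` is a hypothetical helper and the script body is a placeholder for the contents of `./cpu_morgoth.tick`.

```python
import json
import urllib.request


def build_define_task_request(base_url, task_id, script, db, rp):
    """Build (but do not send) a request that defines and enables a stream task."""
    body = {
        "id": task_id,
        "type": "stream",
        "dbrps": [{"db": db, "rp": rp}],
        "script": script,
        "status": "enabled",  # define and enable in one call
    }
    return urllib.request.Request(
        url=f"{base_url}/kapacitor/v1/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder script; in practice the contents of ./cpu_morgoth.tick would go here.
req = build_define_task_request(
    "http://localhost:9096", "cpu_idle", "stream\n    |from()\n", "morgoth", "autogen"
)
print(req.get_method(), req.full_url)  # POST http://localhost:9096/kapacitor/v1/tasks
# urllib.request.urlopen(req) would actually send it.
```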

InfluxDB data:

select * from system where METRIC='system.cpu.idle' limit 10;
name: system
time                CUSTOMER DETAIL1 DETAIL2 GROUP_NAME IP_ADDRESS   LOCATION METRIC          NODE_NAME PRODUCT_NAME VALUE      VENDOR_NAME

1489567587790458512 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6139237 Teledna
1489567647790523682 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6620568 Teledna
1489567707790599926 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6098692 Teledna
1489567767790326233 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.674531  Teledna
1489567827790470782 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6243747 Teledna
1489567887790607264 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6723692 Teledna
1489567947790470804 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.5994077 Teledna
1489568007790460163 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6786397 Teledna
1489568067790455453 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6035798 Teledna
1489568127790466002 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.584484  Teledna

itsmesuniljacob commented 7 years ago

@divgwd May I know how you defined your database? I am trying to get anomaly data for my own system.

divgwd commented 7 years ago

@xylene1980 Except for the "VALUE" column, everything else is a tag.
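Since every column except `VALUE` is a tag, each row of the `system` measurement shown earlier corresponds to one InfluxDB line-protocol point with `VALUE` as its only field. A Python sketch that formats the first row that way, sorting tags by key and escaping commas, spaces and equals signs where line protocol requires it; `to_line_protocol` and its escaping helpers are hypothetical, and the sketch assumes `VALUE` is stored as a float.

```python
def _escape_key_or_tag(text):
    """Escape commas, equals signs and spaces (special in tag keys/values and field keys)."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(text):
    """Measurement names only need commas and spaces escaped."""
    return text.replace(",", r"\,").replace(" ", r"\ ")


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as InfluxDB 1.x line protocol with float fields."""
    tag_part = "".join(
        f",{_escape_key_or_tag(k)}={_escape_key_or_tag(v)}"
        for k, v in sorted(tags.items())
        if v != ""  # skip empty tag values, which line protocol cannot represent
    )
    field_part = ",".join(
        f"{_escape_key_or_tag(k)}={float(v)!r}" for k, v in fields.items()
    )
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


row_tags = {
    "CUSTOMER": "teledna", "DETAIL1": "cpu", "DETAIL2": "idle",
    "GROUP_NAME": "Testing", "IP_ADDRESS": "172.16.23.28", "LOCATION": "Banglore",
    "METRIC": "system.cpu.idle", "NODE_NAME": "SIGTRAN", "PRODUCT_NAME": "smsc",
    "VENDOR_NAME": "Teledna",
}
print(to_line_protocol("system", row_tags, {"VALUE": 99.6139237}, 1489567587790458512))
```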

itsmesuniljacob commented 7 years ago

Thanks @divgwd. So you created a measurement called 'cpu_idle_anomaly' in the morgoth database, I believe? It would be great if you could explain how to create a new database and measurement, since I am new to InfluxDB.

divgwd commented 7 years ago

@xylene1980 Yes. It stores only the points that Morgoth considers anomalous. We then use those points to plot a Grafana time-series graph, as shown in the screenshot: the raw data and the anomalous points are plotted together on the same graph.

[Screenshot: morgoth]

itsmesuniljacob commented 7 years ago

Thanks @divgwd

itsmesuniljacob commented 7 years ago

@divgwd Could you please share the output of a SELECT statement on the 'cpu_idle_anomaly' measurement?

divgwd commented 7 years ago

[Screenshot: grafana query]

That is a snapshot of the Grafana query; Grafana is used as the front-end visualisation tool.

itsmesuniljacob commented 7 years ago

@divgwd Sorry for the trivial questions. Are the tags and fields in morgoth.system the same as the fields in morgoth.cpu_idle_anomaly?

Can you show me the tag keys and field keys of both measurements?

divgwd commented 7 years ago

Yes, it's the same. Except for the "VALUE" column, everything else is a tag. https://docs.influxdata.com/kapacitor/v1.3//nodes/alert_node/

itsmesuniljacob commented 7 years ago

@divgwd Thanks, I get it now. One more question: where does morgoth.system get its data from?

divgwd commented 7 years ago

Have you worked with Kapacitor's TICKscript before? If not, see https://github.com/influxdata/kapacitor/ to get a better idea.

itsmesuniljacob commented 7 years ago

@divgwd I am new to Kapacitor and still learning. I believe you have a TICKscript that works like a continuous query. My question is whether the measurement 'system' is a user-created table; correct me if I am wrong. I would also like to know how the measurement 'system' is populated with values. Are you inserting the data through some job?
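On how `system` gets populated: the thread does not say which collector writes it. In InfluxDB 1.x a measurement does not need to be created in advance; it appears on the first write, while the database itself must already exist (for example via `CREATE DATABASE morgoth`). One common write path is the HTTP `/write` endpoint with `db` and `precision` query parameters and a line-protocol body, and writes that InfluxDB accepts are what it forwards to subscribed Kapacitor instances. A Python sketch that only builds such a request; `build_write_request` is a hypothetical helper, and the point is a trimmed version of the first `system` row from the issue.

```python
import urllib.parse
import urllib.request


def build_write_request(influx_url, database, lines, precision="ns"):
    """Build (but do not send) an InfluxDB 1.x /write request for line-protocol points."""
    query = urllib.parse.urlencode({"db": database, "precision": precision})
    body = "\n".join(lines).encode("utf-8")  # one point per line
    return urllib.request.Request(f"{influx_url}/write?{query}", data=body, method="POST")


# Trimmed version of the first row of morgoth.system (only two of its tags kept).
point = ("system,METRIC=system.cpu.idle,NODE_NAME=SIGTRAN "
         "VALUE=99.6139237 1489567587790458512")
req = build_write_request("http://172.16.23.20:8086", "morgoth", [point])
print(req.full_url)  # http://172.16.23.20:8086/write?db=morgoth&precision=ns
# urllib.request.urlopen(req) would actually send it.
```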

itsmesuniljacob commented 7 years ago

@divgwd I figured it out, thanks.

jaybor68 commented 6 years ago

@xylene1980, I have a similar problem. What was your solution?