nathanielc / morgoth

Metric anomaly detection
http://docs.morgoth.io
Apache License 2.0

Morgoth closing a closed channel [GoLang] #53

Open buro1983 opened 7 years ago

buro1983 commented 7 years ago

I was trying the CPU monitoring example, but Morgoth suddenly stops working. I am using the latest InfluxDB and the latest Morgoth binary, which I compiled with Go 1.8.3. The only output I have is the log below from the kapacitor.log file. Does anyone have any idea why Morgoth is closing an already-closed channel?

2017/08/24 09:52:00 I!P 2017/08/24 09:52:00 I! Stopping
2017/08/24 09:52:00 I!P panic: close of closed channel
2017/08/24 09:52:00 I!P
2017/08/24 09:52:00 I!P goroutine 1 [running]:
2017/08/24 09:52:00 I!P main.(*Handler).Stop(0xc42008ca50)
2017/08/24 09:52:00 I!P    /prog/src/github.com/nathanielc/morgoth/cmd/morgoth/main.go:498 +0x33
2017/08/24 09:52:00 I!P main.main()
2017/08/24 09:52:00 I!P    /prog/src/github.com/nathanielc/morgoth/cmd/morgoth/main.go:149 +0x728
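For reference, "close of closed channel" is the panic the Go runtime raises when close() is called a second time on the same channel, so Stop() at main.go:498 is apparently being reached twice during shutdown. Below is a minimal, self-contained sketch of the failure mode and the usual sync.Once guard; the handler type and field names are illustrative, not morgoth's actual code.

package main

import "sync"

// handler is an illustrative stand-in for the Handler in cmd/morgoth/main.go.
type handler struct {
    done     chan struct{}
    stopOnce sync.Once
}

// stopUnsafe panics if it runs twice, e.g. once from a signal handler
// and once from main's cleanup path: "panic: close of closed channel".
func (h *handler) stopUnsafe() {
    close(h.done)
}

// Stop guarded with sync.Once closes the channel at most once,
// no matter how many times or from how many goroutines it is called.
func (h *handler) Stop() {
    h.stopOnce.Do(func() {
        close(h.done)
    })
}

func main() {
    h := &handler{done: make(chan struct{})}
    h.Stop()
    h.Stop() // safe: the second close is skipped
}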
buro1983 commented 7 years ago

This is my TICKscript. One more observation: I am asking Morgoth to run at a 1-minute interval, but the debug log shows something different.

If stats are logged at 11:01:00, the next entry comes at 11:02:00, which is fine, but the entry after that comes at 12:02:00, which I don't understand. Does that mean the algorithm is not doing anything for an hour?

// The measurement to analyze
var measurement = 'cpu'

// Optional group by dimensions
var groups = []

// Optional where filter
var whereFilter = lambda: TRUE

// The amount of data to window at once
var window = 1h
var interval = 1m

// The field to process
var field = 'usage_idle'

// The name for the anomaly score field  
var scoreField = 'anomalyScore'

// The minimum support
var minSupport = 0.95

// The error tolerance
var errorTolerance = 0.01

// The consensus
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.5

var details = 'This mail is getting generated when %change of {{.ID}} in last 1hour'
var message = '[{{ .Level }}] Terago {{.ID}} , Difference: {{ index .Fields "value" }} , {{.Time}}'

var idTag = 'alertID'

var levelTag = 'level'

var messageField = 'message'

var durationField = 'duration'

var rp = 'autogen'
var db = 'telegraf'
var name = 'cpu alert'
var triggerType = 'relative'
var idVar = name + ':{{.Group}}'
var outputDB = 'alertsDB'
var outputMeasurement = 'cpu_1h'

var trigger = stream
    // Select the data we want
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groups)
        .where(whereFilter)
    // Window the data for a certain amount of time
    |window()
        .period(window)
        .every(interval)
        .align()
    // Send each window to Morgoth
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |alert()
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .details(details)
        .crit(lambda: TRUE)
        .log('/tmp/cpu_1h.log')

trigger
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(rp)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')
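For reference, here is a minimal sketch using the InfluxDB Go client (client/v2) to check whether any anomalous windows actually reach the alertsDB / cpu_1h output defined above before the UDF dies; the server address is a placeholder.

package main

import (
    "fmt"
    "log"

    client "github.com/influxdata/influxdb/client/v2"
)

func main() {
    // Placeholder address; point this at your InfluxDB instance.
    c, err := client.NewHTTPClient(client.HTTPConfig{Addr: "http://localhost:8086"})
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()

    // Database and measurement names match outputDB and outputMeasurement
    // in the TICKscript above.
    q := client.NewQuery("SELECT * FROM cpu_1h ORDER BY time DESC LIMIT 10", "alertsDB", "")
    resp, err := c.Query(q)
    if err != nil {
        log.Fatal(err)
    }
    if resp.Error() != nil {
        log.Fatal(resp.Error())
    }
    for _, result := range resp.Results {
        for _, series := range result.Series {
            fmt.Println(series.Name, series.Columns)
            for _, row := range series.Values {
                fmt.Println(row)
            }
        }
    }
}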
negbie commented 6 years ago

Same here. Today's latest TICK stack.

alextodicescu commented 6 years ago

So, has anybody found the answer to this? I initially had this problem, fixed it at some point, and now have the same issue again (but I forgot what I did to fix it, haha).