nathanielc / morgoth

Metric anomaly detection
http://docs.morgoth.io
Apache License 2.0

Showing summary of the anomaly in alert message #67

Closed spesalvi closed 5 years ago

spesalvi commented 5 years ago

Hello,

I have created a TICKscript to detect anomalies in the number of requests hitting our servers. The script works fine.

The input to Morgoth is the number of requests per one-minute interval over the last hour.

In the alert message, I would also like to include the total number of requests the server received in this window, but I cannot find a way to do it.


  batch
      |query('''
          SELECT count(timetaken) as count
          FROM "telegraf".two_months.responsetimes
          WHERE "responsecode" != '10018' AND "responsecode" != '10097' AND "responsecode" != '10181' AND "responsecode" != '10256' AND "responsecode" != '10285' AND "merchant" != 'Sevasys'
      ''')
          .period(window)
          .groupBy(time(1m), 'qcinstance', 'txntype', 'merchant')
          .every(1m)
          .align()
          .fill(0)
      @morgoth()
          .field(field)
          .scoreField(scoreField)
          .anomalousField('anomalous')
          .minSupport(minSupport)
          .errorTolerance(errorTolerance)
          .sigma(sigmas)
      // Morgoth returns any anomalous windows
      |eval(lambda: strReplace("txntype", ' ', '%20', -1), lambda: strReplace("merchant", ' ', '%20', -1), lambda: int(unixNano(now())/1000000), lambda: int((unixNano(now())-two_hours)/1000000))
          .as('txntype2', 'merchant2', 'now', 'two_hours_ago')
          .keep('anomalous', 'txntype2', 'txntype', 'merchant2', 'merchant', 'now', 'two_hours_ago', 'count')
      |alert()
          .message(message)
          .details('')
          .crit(lambda: "anomalous")
          .slack()
          .channel('#softwarealerts')

In the alert message, I want to include the sum of the count field.

I tried adding a sum node after Morgoth:


      @morgoth()
          .field(field)
          .scoreField(scoreField)
          .anomalousField('anomalous')
          .minSupport(minSupport)
          .errorTolerance(errorTolerance)
          .sigma(sigmas)
      |sum('count')
      // Morgoth returns any anomalous windows

However, after this step the anomalous field is missing from the output of sum.

I also tried computing the total with a separate query:

  batch
      |query('''
          SELECT count(timetaken) as sum
          FROM "telegraf".two_months.responsetimes
          WHERE "responsecode" != '10018' AND "responsecode" != '10097' AND "responsecode" != '10181' AND "responsecode" != '10256' AND "responsecode" != '10285' AND "merchant" != 'Sevasys'
      ''')
          .period(window)
          .groupBy('qcinstance', 'txntype', 'merchant')
          .every(1m)
          .align()
          .fill(0)

However, the result of the above query does not join with the original series, so the sum value is not available at alert time.
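
For reference, a join of this kind would be wired roughly as in the sketch below. This is only an illustration under assumptions: the variable names `anomalies` and `totals`, the `anomaly`/`total` prefixes and the one-minute tolerance are hypothetical and not taken from the original script, and the WHERE clause is left out for brevity. Kapacitor's join node pairs points by timestamp within each group, so a total computed without `time(1m)` in the groupBy may simply have no timestamp that lines up with the per-minute points coming out of Morgoth.

```
// Hypothetical sketch: names, prefixes and tolerance are illustrative assumptions.
var anomalies = batch
    |query('''
        SELECT count(timetaken) as count
        FROM "telegraf".two_months.responsetimes
    ''')
        .period(window)
        .every(1m)
        .groupBy(time(1m), 'qcinstance', 'txntype', 'merchant')
        .align()
        .fill(0)
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .anomalousField('anomalous')
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .sigma(sigmas)

var totals = batch
    |query('''
        SELECT count(timetaken) as sum
        FROM "telegraf".two_months.responsetimes
    ''')
        .period(window)
        .every(1m)
        .groupBy('qcinstance', 'txntype', 'merchant')
        .align()

anomalies
    |join(totals)
        // Joined fields are prefixed, e.g. "anomaly.anomalous" and "total.sum".
        .as('anomaly', 'total')
        // Points are paired only if their timestamps are within this tolerance.
        .tolerance(1m)
    |alert()
        .crit(lambda: "anomaly.anomalous")
        .message(message)
```

If such a join did emit points, the total should be reachable in the message template as `{{ index .Fields "total.sum" }}`.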

Can you please suggest a way to do this?