influxdata / kapacitor

Open source framework for processing, monitoring, and alerting on time series data
MIT License

Different expected values with Graphite as input #1186

Open sbengo opened 7 years ago

sbengo commented 7 years ago

Hi,

I just started testing Kapacitor with the [[graphite]] InfluxDB service. After creating the alert, I realized that the value reported by Kapacitor seems to be the Graphite value divided by 10!

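So, some questions:

Do you have any idea why this is happening?

Is there some way to see what value Kapacitor is evaluating?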

Case:

I have the following test configuration:

Kapacitor Graphite section conf

[[graphite]]
  enabled = true
  database = "monitoring"
  bind-address = ":2006"
  retention-policy = "autogen"
  protocol = "tcp"
  consistency-level = "one"
  separator = "_"

  templates = [

  #CPU
  "*.*.*.*.*.system.cpu.* system.scenario.center.site.hostname.measurement.measurement.field",
  ]
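
To make the template above easier to reason about, here is a minimal Python sketch of how a Graphite template could map a dotted metric name onto a measurement, tags, and a field. It is a simplified illustration, not the InfluxDB plugin code: it ignores filters, wildcards such as `measurement*`, and default tags. The function name `apply_template` and the sample metric name are hypothetical; the tag names follow the tags shown in the alert log output.

```python
def apply_template(metric, template, separator="_"):
    """Map a dotted Graphite metric name onto (measurement, tags, field).

    Simplified sketch: every template part is either the keyword
    'measurement', the keyword 'field', or a tag key.
    """
    parts = metric.split(".")
    keys = template.split(".")
    if len(parts) != len(keys):
        raise ValueError("metric and template have different lengths")

    measurement_parts, tags, field = [], {}, None
    for key, value in zip(keys, parts):
        if key == "measurement":
            # Repeated 'measurement' parts are joined with the separator.
            measurement_parts.append(value)
        elif key == "field":
            field = value
        elif key:
            tags[key] = value
    return separator.join(measurement_parts), tags, field


TEMPLATE = "system.scenario.center.site.hostname.measurement.measurement.field"
# Hypothetical metric name, chosen to match the template layout.
METRIC = "Y.devel.X.Z.hostAA.system.cpu.percent-active"

measurement, tags, field = apply_template(METRIC, TEMPLATE)
print(measurement, tags, field)
```

With separator = "_", the two `measurement` parts are joined into `system_cpu`, which is the measurement name the alert selects.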

I'm duplicating the data with carbon-c-relay to localhost:2006, so that Kapacitor can receive it.

I created an alert called test_cpu with the following configuration. Note that it is set to fire all the time, just for testing:

Test alert to eval CPU percent-active field

stream
    // Select just the cpu measurement from our example database.
    |from()
        .measurement('system_cpu')
        .groupBy(*)
    |default()
        .field('percent-active',0.0)
    |window()
        .period(1m)
        .every(1m)
    |mean('percent-active')
    |alert()
        .warn(lambda: "mean" > 0.01)
        .crit(lambda: "mean" > 90)
        // Whenever we get an alert write it to a file.
        .id('kapacitor/{{ index .Tags "service" }}')
        .message('{{ .ID }} is {{ .Level }} value:{{ index .Fields "mean" }}')
        .log('/tmp/alerts.log')
        .details('''
<h1>{{ .ID }}</h1>
<b>{{ .Message }}</b>
Value: {{ index .Fields "value" }}
''')
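
For reference when comparing numbers, the following Python sketch shows roughly what the `window()` plus `mean()` steps compute: one average per one-minute bucket of received points, rather than the latest raw point. It is a simplification under stated assumptions: buckets here are aligned to minute boundaries, which Kapacitor does not necessarily do, and the function name `windowed_mean` and the sample points are hypothetical.

```python
from collections import defaultdict


def windowed_mean(points, period_s=60):
    """Return {window_start: mean} for (unix_seconds, value) points.

    Assumption: tumbling windows aligned to multiples of period_s.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period_s].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical sample points: (timestamp in seconds, percent-active).
sample = [(0, 4.0), (20, 6.0), (40, 5.0), (60, 10.0)]
means = windowed_mean(sample)
print(means)
```

The value in the alert message is this kind of per-window average, so it should be compared against the same one-minute average in Graphite, not against a single data point.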

Result eval:

As I expected, it is processing percent-active and firing the warn alert correctly. The problem is that the value seems wrong when I compare it directly with Graphite: the value in Kapacitor appears to be the Graphite value divided by 10!

[cpu_alert:alert5] 2017/02/09 15:08:11 D! WARNING alert triggered id:kapacitor/ msg:kapacitor/ is WARNING value:0.5250772 data:&{system_cpu map[center:X scenario:devel hostname:hostAA system:Y site:Z] [time mean] [[2017-02-09 15:08:10 +0100 CET 0.5250772]]}

[cpu_alert:alert5] 2017/02/09 15:07:24 D! WARNING alert triggered id:kapacitor/ msg:kapacitor/ is WARNING value:0.3852104 data:&{system_cpu map[center:X, scenario:devel hostname:hostBB system:Y site:Z] [time mean] [[2017-02-09 15:07:23 +0100 CET 0.3852104]]}
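
To compare these values with Graphite host by host, the log lines can be reduced to (hostname, value) pairs. The following is a small Python sketch that does this for lines in the format shown above; the regular expression and the function name `extract` are my own assumptions about the format, not something Kapacitor provides.

```python
import re

# Assumes the message contains "value:<number>" followed later by "hostname:<name>".
LINE_RE = re.compile(r"value:(?P<value>[0-9.]+) .*?hostname:(?P<host>[^\s\]]+)")


def extract(lines):
    """Return a list of (hostname, value) pairs found in alert log lines."""
    pairs = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            pairs.append((match.group("host"), float(match.group("value"))))
    return pairs


LOG_LINES = [
    "[cpu_alert:alert5] 2017/02/09 15:08:11 D! WARNING alert triggered id:kapacitor/ "
    "msg:kapacitor/ is WARNING value:0.5250772 data:&{system_cpu map[center:X "
    "scenario:devel hostname:hostAA system:Y site:Z] [time mean] "
    "[[2017-02-09 15:08:10 +0100 CET 0.5250772]]}",
    "[cpu_alert:alert5] 2017/02/09 15:07:24 D! WARNING alert triggered id:kapacitor/ "
    "msg:kapacitor/ is WARNING value:0.3852104 data:&{system_cpu map[center:X, "
    "scenario:devel hostname:hostBB system:Y site:Z] [time mean] "
    "[[2017-02-09 15:07:23 +0100 CET 0.3852104]]}",
]

pairs = extract(LOG_LINES)
print(pairs)
```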

image

Running Kapacitor 1.2.0

Thanks for everything. Regards.

rossmcdonald commented 7 years ago

@sbengo Regarding your questions:

Do you have any idea why this is happening?

That's very strange. We haven't heard similar reports of this issue. It doesn't look like you're using InfluxDB, but I'd be curious if you see the same results when ingesting the equivalent data in InfluxDB. Kapacitor re-uses the InfluxDB Graphite plugin, so, if there is an issue, it will most likely be there.

Is there some way to see what value Kapacitor is evaluating?

Yes, you can add log nodes to your task to have Kapacitor dump the raw data to the logs. For example:

stream
    // Select just the cpu measurement from our example database.
    |from()
        .measurement('system_cpu')
        .groupBy(*)
    |log()
    |default()
        .field('percent-active',0.0)
    |window()
    ...

Please try redefining your task with the log nodes included, and then provide the log output. If that is still showing incorrect values, then we can try tracking this down in the Graphite code in InfluxDB.