influxdata / kapacitor

Open source framework for processing, monitoring, and alerting on time series data
MIT License

panic: invalid timer state #1705

Open · wolfsoft opened this issue 7 years ago

wolfsoft commented 7 years ago

Hello, I've installed Kapacitor from the official repo and set up a Telegraf/InfluxDB/Kapacitor stack. Kapacitor crashes every day :(

OS: CentOS release 6.9 (Final)
Kapacitor: kapacitor.x86_64 1.3.3-1 @influxdb

cat /var/log/kapacitor/kapacitord.err

2017/12/01 08:33:07 Using configuration at: /etc/kapacitor/kapacitor.conf
panic: invalid timer state

goroutine 98 [running]:
panic(0x1bb4de0, 0xc4230d3780)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/influxdata/kapacitor/timer.(*timer).Start(0xc42006eea0)
    /root/go/src/github.com/influxdata/kapacitor/timer/timer.go:68 +0xfd
github.com/influxdata/kapacitor.(*InfluxQLNode).runBatchInfluxQL(0xc42046cc00, 0x1c72b20, 0xc42030a2b0)
    /root/go/src/github.com/influxdata/kapacitor/influxql.go:176 +0x170
github.com/influxdata/kapacitor.(*InfluxQLNode).runInfluxQLs(0xc42046cc00, 0x0, 0x0, 0x0, 0xc42049cf58, 0xc42049cf68)
    /root/go/src/github.com/influxdata/kapacitor/influxql.go:45 +0x4c
github.com/influxdata/kapacitor.(*InfluxQLNode).(github.com/influxdata/kapacitor.runInfluxQLs)-fm(0x0, 0x0, 0x0, 0xc42049cf80, 0x1)
    /root/go/src/github.com/influxdata/kapacitor/influxql.go:36 +0x48
github.com/influxdata/kapacitor.(*node).start.func1(0xc42046cc00, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/kapacitor/node.go:140 +0x8e
created by github.com/influxdata/kapacitor.(*node).start
    /root/go/src/github.com/influxdata/kapacitor/node.go:141 +0x5d

tail -n 10 /var/log/kapacitor/kapacitor.log

[httpd] 127.0.0.1 - - [01/Dec/2017:12:13:00 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" f9677af1-d690-11e7-a93b-000000000000 984
[httpd] 127.0.0.1 - - [01/Dec/2017:12:13:01 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" fa1e4784-d690-11e7-a93c-000000000000 707
[httpd] 127.0.0.1 - - [01/Dec/2017:12:13:01 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" fa2e1e7c-d690-11e7-a93d-000000000000 872
[httpd] 127.0.0.1 - - [01/Dec/2017:12:13:02 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" fa527ca6-d690-11e7-a93e-000000000000 660
[httpd] 127.0.0.1 - - [01/Dec/2017:12:13:02 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" fa52b5ff-d690-11e7-a93f-000000000000 546
[httpd] 127.0.0.1 - - [01/Dec/2017:12:13:03 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" fafce855-d690-11e7-a940-000000000000 1122
[httpd] 127.0.0.1 - - [01/Dec/2017:12:13:10 +0000] "POST /write?consistency=&db=_internal&precision=ns&rp=monitor HTTP/1.1" 204 0 "-" "InfluxDBClient" ff01d44d-d690-11e7-a941-000000000000 890
[httpd] 127.0.0.1 - - [01/Dec/2017:12:13:10 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" ff2fb8f1-d690-11e7-a942-000000000000 580
[cpu_load:percentile5] 2017/12/01 12:13:10 E! failed to emit batch: edged aborted
[httpd] 127.0.0.1 - - [01/Dec/2017:12:13:10 +0000] "POST /write?consistency=&db=telegraf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "InfluxDBClient" ff5d729d-d690-11e7-a943-000000000000 969

wolfsoft commented 6 years ago

The TICKscript that triggers this issue:

stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('cpu')
    // create a new field called 'used' which inverts the idle cpu.
    |eval(lambda: 100.0 - "usage_idle")
        .as('used')
    |groupBy('host')
    |window()
        .period(10m)
        .every(10m)
    // calculate the 95th percentile of the used cpu.
    |percentile('used', 95.0)
    |eval(lambda: sigma("percentile"))
        .as('sigma')
        .keep('percentile', 'sigma')
    |alert()
        .message('...')
        .details('...')
        // Compare values to running mean and standard deviation
        .warn(lambda: "sigma" > 2.5)
        .stateChangesOnly()
        .email()
    |alert()
        .message('...')
        .details('...')
        // Compare values to running mean and standard deviation
        .crit(lambda: "sigma" > 3.0)
        .stateChangesOnly()
        .email()
        .telegram()
nathanielc commented 6 years ago

@wolfsoft We believe this has been fixed in the current 1.4.0-rc2 release of Kapacitor. Can you confirm?

wolfsoft commented 6 years ago

@nathanielc, I don't see this version in the repository https://repos.influxdata.com/centos/ yet. I'll test it as soon as it appears there. Thank you anyway!