influxdata / kapacitor

Open source framework for processing, monitoring, and alerting on time series data
MIT License

Kapacitor fails to start when subscription already exists #1479

Open kshcherban opened 7 years ago

kshcherban commented 7 years ago

Relates to issue #679. I have the following configuration: all services run in AWS, there are 2 InfluxDB servers, and Kapacitor runs in an ECS container with the following environment variables: KAPACITOR_INFLUXDB_0_URLS_0=http://influx-server1:8086 and KAPACITOR_INFLUXDB_0_URLS_1=http://influx-server2:8086. When one of the servers dies, Kapacitor stops sending alerts. Here's the error:

Jul 17 06:40:39 ip-172-16-71-104 docker/kapacitor[499]: [system.nodata:query1] 2017/07/17 04:40:39 E! Post http://influx-server1:8086/query?db=&q=SELECT+mean%28usage_user%29+AS+stat+FROM+telegraf.autogen.cpu+WHERE+host+%21~+%2F%28-jenkins-%7Cecs-apps%29%2F+AND+time+%3E%3D+%272017-07-17T04%3A38%3A39.572177535Z%27+AND+time+%3C+%272017-07-17T04%3A40%3A39.572177535Z%27+GROUP+BY+time%281m%2C+0s%29%2C+host: dial tcp 172.16.71.160:8086: getsockopt: connection refused

It probably connects only to the first server and ignores the second (the failing request above goes to influx-server1). I'm not sure how the container reads its configuration, because I didn't find any documentation on how the environment variables are interpreted.
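
One way to read the two variables, assuming the convention that numeric segments are list indices, is as two entries in the `urls` list of the first `[[influxdb]]` section of kapacitor.conf. The sketch below illustrates that reading only; `parse_env_key` and `build_config` are hypothetical helpers, not Kapacitor's actual code, and they do not handle keys that themselves contain underscores.

```python
def parse_env_key(name, prefix="KAPACITOR_"):
    """Split a variable name into a config path (hypothetical helper).

    KAPACITOR_INFLUXDB_0_URLS_1 -> ['influxdb', 0, 'urls', 1]
    Numeric segments become list indices; other segments become lower-case
    keys. Keys that themselves contain underscores are NOT handled here.
    """
    if not name.startswith(prefix):
        raise ValueError(f"not a {prefix}* variable: {name}")
    return [int(p) if p.isdigit() else p.lower()
            for p in name[len(prefix):].split("_")]


def _pad(seq, index):
    """Extend a list with None so that seq[index] is valid."""
    while len(seq) <= index:
        seq.append(None)


def build_config(env, prefix="KAPACITOR_"):
    """Fold KAPACITOR_* variables into nested dicts/lists (hypothetical)."""
    root = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated variables such as PATH
        path = parse_env_key(name, prefix)
        node = root
        # Walk or create intermediate containers; the type of each child is
        # decided by the *next* path segment (int -> list, str -> dict).
        for key, nxt in zip(path, path[1:]):
            make = list if isinstance(nxt, int) else dict
            if isinstance(node, list):
                _pad(node, key)
                if node[key] is None:
                    node[key] = make()
            else:
                node.setdefault(key, make())
            node = node[key]
        last = path[-1]
        if isinstance(node, list):
            _pad(node, last)
        node[last] = value
    return root
```

Read this way, both servers end up in a single `urls` list of one InfluxDB entry; whether Kapacitor falls back to the second URL when the first refuses connections depends on its InfluxDB client, which this sketch does not model.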

So I put InfluxDB behind an ALB and set KAPACITOR_INFLUXDB_0_URLS_0 to the ALB's DNS name, but Kapacitor refused to start:

Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: 2017/07/17 09:13:45 Using configuration at: /etc/kapacitor/kapacitor.conf
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: [run] 2017/07/17 09:13:45 I! Kapacitor starting, version 1.3.1, branch master, commit 3b5512f7276483326577907803167e4bb213c613
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: [run] 2017/07/17 09:13:45 I! Go version go1.7.5
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: [srv] 2017/07/17 09:13:45 I! Kapacitor hostname: kapacitor.example.com
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: [srv] 2017/07/17 09:13:45 I! ClusterID: 77f9dd12-a1e4-482e-90ea-7e57619e7091 ServerID: 06b5e43b-2d14-4752-ac7e-cde685de57e5
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: [task_master:main] 2017/07/17 09:13:45 I! opened
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: [httpd] 2017/07/17 09:13:45 I! Closed HTTP service
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: [httpd] 2017/07/17 09:13:45 I! Closed HTTP service
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: [task_master:main] 2017/07/17 09:13:45 I! closed
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: [run] 2017/07/17 09:13:45 E! open server: open service *influxdb.Service: failed to link subscription on startup: creating sub kapacitor-77f9dd12-a1e4-482e-90ea-7e57619e7091 for db "telegraf" and rp "autogen": subscription already exists
Jul 17 11:13:45 ip-172-16-71-104 docker/kapacitor[499]: run: open server: open service *influxdb.Service: failed to link subscription on startup: creating sub kapacitor-77f9dd12-a1e4-482e-90ea-7e57619e7091 for db "telegraf" and rp "autogen": subscription already exists
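
The subscription name in the error is `kapacitor-` followed by the ClusterID printed earlier in the same startup log. One manual workaround to try (an assumption, not confirmed in this thread) is to run `SHOW SUBSCRIPTIONS` against each InfluxDB server directly, not through the ALB, since a single request through the load balancer reaches only one backend, and drop the stale subscription so Kapacitor can recreate it. The hypothetical helpers below only build the InfluxQL `DROP SUBSCRIPTION` statement from the values in the log; they do not contact InfluxDB.

```python
def subscription_name(cluster_id):
    """Name pattern observed in the log: 'kapacitor-' + ClusterID."""
    return f"kapacitor-{cluster_id}"


def quote_ident(ident):
    """Double-quote an InfluxQL identifier, escaping embedded double quotes."""
    return '"' + ident.replace('"', '\\"') + '"'


def drop_subscription_stmt(cluster_id, db, rp):
    """Build: DROP SUBSCRIPTION "<name>" ON "<db>"."<rp>" (hypothetical helper)."""
    return "DROP SUBSCRIPTION {} ON {}.{}".format(
        quote_ident(subscription_name(cluster_id)),
        quote_ident(db),
        quote_ident(rp),
    )
```

For the log above this yields `DROP SUBSCRIPTION "kapacitor-77f9dd12-a1e4-482e-90ea-7e57619e7091" ON "telegraf"."autogen"`. Check the `SHOW SUBSCRIPTIONS` output on every server first, because the subscription may exist on some servers and not others.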

Please advise whether this is an issue with my configuration, a limitation (Kapacitor can't work with a load-balanced InfluxDB), or a bug in Kapacitor.
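
One plausible reading of the ALB failure, not confirmed in this thread: the log says Kapacitor was "creating sub" on startup, which suggests it did not see the subscription when it checked. If the check and the create land on different InfluxDB servers behind the load balancer, and the two servers keep independent subscription lists, the create can hit a server that already has it. The toy model below (hypothetical `Backend`, `RoundRobin`, and `link_subscription`; not Kapacitor's code) shows how a check-then-create step that is safe against one server can fail behind a round-robin balancer.

```python
import itertools


class Backend:
    """Toy InfluxDB server that only tracks subscription names."""

    def __init__(self, subs=()):
        self.subs = set(subs)


class RoundRobin:
    """Toy load balancer: each request goes to the next backend in turn."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def show_subscriptions(self):
        return set(next(self._cycle).subs)

    def create_subscription(self, name):
        node = next(self._cycle)
        if name in node.subs:
            raise RuntimeError("subscription already exists")
        node.subs.add(name)


def link_subscription(server, name):
    """Check-then-create: create the subscription only if it is not listed."""
    if name not in server.show_subscriptions():
        server.create_subscription(name)
```

With server A lacking the subscription and server B already holding it, the listing goes to A and the create goes to B, which raises "subscription already exists"; pointed at B alone, the same function sees the subscription and does nothing.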

cmattoon commented 6 years ago

Getting the same error with a TICK stack in Kubernetes (two InfluxDB + Relay pairs). Everything was working fine until I restarted the Kapacitor Pod.

ts=2018-05-24T14:05:53.535Z lvl=error msg="encountered error" service=run err="open server: open service *influxdb.Service: failed to link subscription on startup: creating sub kapacitor-188228cb-aa37-425d-9042-8ea82a10d816 for db \"_internal\" and rp \"monitor\": subscription already exists"
run: open server: open service *influxdb.Service: failed to link subscription on startup: creating sub kapacitor-188228cb-aa37-425d-9042-8ea82a10d816 for db "_internal" and rp "monitor": subscription already exists