influxdata / chronograf

Open source monitoring and visualization UI for the TICK stack
https://www.influxdata.com/time-series-platform/chronograf/

Updating a "deadman" alert rule makes certain variables not work and other things #1635

Closed luca-moser closed 6 years ago

luca-moser commented 7 years ago

Chronograf version: 1.3.2.1-alpine docker image

I've updated an alert rule that checks whether the usage_percent of the CPU of Docker containers couldn't be measured for over a minute (to detect whether a container went offline).

Updating said rule again makes the {{ index .Tags "container_name" }} variable inside the Telegram message blank. It also doesn't retrigger correctly for the number of containers that went offline: I tested by stopping a docker compose service with 3 containers, but received only one message.

problems:

Message:

{{.Level}} - {{.Time}}:
no measurement data received from container "{{ index .Tags "container_name" }}" during 1 minute. did the container crash?
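
For reference, Kapacitor alert messages are Go templates, so the blank value can be reproduced outside Kapacitor with the standard `text/template` package. The following is only a minimal sketch: the `alertData` struct, the `render` helper, and the sample level and time values are hypothetical stand-ins, not Kapacitor's actual types or data. It shows that `index` on a tag map without a `container_name` key renders an empty string, which matches the symptom described above; it does not explain why the tag would be missing after the rule is updated.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// alertData is a hypothetical stand-in for the data an alert
// message template receives; it is not Kapacitor's real type.
type alertData struct {
	Level string
	Time  string
	Tags  map[string]string
}

// message mirrors the template from this issue, shortened to one line.
const message = `{{.Level}} - {{.Time}}: no measurement data received from container "{{ index .Tags "container_name" }}" during 1 minute.`

// render executes the message template against the given data.
func render(d alertData) string {
	tmpl := template.Must(template.New("msg").Parse(message))
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, d); err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

func main() {
	// Tag present: the container name appears in the message.
	withTag := alertData{
		Level: "CRITICAL",
		Time:  "sample-time", // placeholder value
		Tags:  map[string]string{"container_name": "dmsbeta_server_1"},
	}
	// Tag absent (assumed scenario, e.g. the data is no longer grouped
	// by container_name): index yields the zero value, an empty string.
	withoutTag := alertData{
		Level: "CRITICAL",
		Time:  "sample-time",
		Tags:  map[string]string{},
	}
	fmt.Println(render(withTag))
	fmt.Println(render(withoutTag))
}
```

If the rendered message has empty quotes where the container name should be, the tag was most likely not present in the data point that triggered the alert.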

The query (when you can actually see it by rebuilding it):

SELECT "usage_percent" FROM "telegraf"."autogen"."docker_container_cpu" 
WHERE time > now() - 15m AND ("container_name"='dmsbeta_server_1' OR "container_name"='dmsbeta_session_store_1' OR "container_name"='dmsbeta_storage_1') 
GROUP BY "container_name"


lukevmorris commented 7 years ago

Let's retest this after we cut 1.3.5

jaredscheib commented 7 years ago

Hi @luca-moser, would you mind telling us if this is still an issue you are experiencing after updating to Chronograf version 1.3.5.0? Some changes were pushed to Kapacitor rules in Chronograf 1.3.5.0 just a few days ago that may have affected this issue.

In the meantime, a few questions:

  1. have you tried creating this directly through Kapacitor? and if so, do you get the behavior you expected there?
  2. are all of your containers still listed in the Query Builder the next time you load it (when your query is not appearing)? I ask because I wonder if somehow the Chronograf server is becoming unaware of the containers (dmsbeta_storage_1, dmsbeta_session_store_1, dmsbeta_server_1), and that this may be why the query is not building, and also why the value is blank in the message. Could you use the influx CLI to verify that these Tags still exist?
  3. could you also try running a query against those three container_name tag values, using copy/paste, to ensure that those are the correct values and that they work via influx directly?

russorat commented 6 years ago

closing due to lack of activity