Status: Closed (m4ce closed this issue 3 years ago)
It seems that when the current level is WARNING and the warnReset condition is not met, Kapacitor ignores the warn condition (even when it is not met) and triggers a new warning. If I remove warnReset or add stateChangesOnly, it stops sending a new warning when the condition is not met, which is correct. But then I have no way of defining the warnReset. Is this a bug? I would expect the following behaviour:
Current level: OK
  warn condition is met -> send warning
  warnReset condition is NOT met -> do nothing

Current level: WARNING
  warn condition is NOT met -> do nothing
  warnReset condition is NOT met -> do nothing (this instead seems to trigger a new warning!)
Mind that I don't want to use stateChangesOnly. I want to be able to send a warning if a value is INCREASING between intervals and only reset when it's been stable for the past 5m. If the value does not increase, no warning should be sent out. However, kapacitor still sends it.
@nathanielc - would you be able to confirm the above?
@m4ce, this is working as intended.
> the warn condition is quite clear. It should only send a warning if more than one restart has occurred in the last minute. Recover after 5m of no restarts.
> Seems like the alert is generated even when restarts_last_min == 0.
That is not how it works. .warn creates a condition that puts the alert in the warning state, and it keeps sending warnings for as long as it stays in that state. .warnReset clears the state. If you do not specify a .warnReset or a different state change, it stops sending warnings when the .warn lambda is no longer true.
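The explanation above can be modelled as a small state machine. The following is a minimal Python sketch of that description, not Kapacitor's actual code: the function names, the `quiet_minutes` field, and the rule "emit an event for every non-OK point plus the recovery" are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

Point = Dict[str, float]              # hypothetical shape of one evaluated point
Predicate = Callable[[Point], bool]   # stands in for a .warn / .warnReset lambda


def next_level(current: str, point: Point, warn: Predicate,
               warn_reset: Optional[Predicate] = None) -> str:
    """Return the level after evaluating one point."""
    # While in WARNING, an unmet reset condition holds the level,
    # regardless of whether the warn condition is still true.
    if current == "WARNING" and warn_reset is not None and not warn_reset(point):
        return "WARNING"
    # Otherwise the level simply follows the warn condition.
    return "WARNING" if warn(point) else "OK"


def run(points: List[Point], warn: Predicate,
        warn_reset: Optional[Predicate] = None) -> List[str]:
    """Return the level of every event that would be sent (no stateChangesOnly)."""
    level = "OK"
    events = []
    for point in points:
        new = next_level(level, point, warn, warn_reset)
        # Assumption: every non-OK point produces an event, as does a recovery.
        if new != "OK" or new != level:
            events.append(new)
        level = new
    return events


points = [
    {"restarts_last_min": 2, "quiet_minutes": 0},
    {"restarts_last_min": 0, "quiet_minutes": 1},
    {"restarts_last_min": 0, "quiet_minutes": 5},
]
warn = lambda p: p["restarts_last_min"] > 1
warn_reset = lambda p: p["quiet_minutes"] >= 5

with_reset = run(points, warn, warn_reset)
without_reset = run(points, warn)
```

Under this model the second point, which has zero restarts, still yields a WARNING event when a reset condition is defined, because the unmet reset holds the state. Without the reset condition the same point yields the recovery instead.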
> Mind that I don't want to use stateChangesOnly. I want to be able to send a warning if a value is INCREASING between intervals and only reset when it's been stable for the past 5m. If the value does not increase, no warning should be sent out. However, kapacitor still sends it.
I suggest using .nonNegativeDerivative to find if it is increasing and make it alert on that.
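The thread does not include a script for this suggestion, so here is a plain-Python sketch of what a non-negative derivative computes over a counter. It is an illustration, not Kapacitor's implementation: the function name, the `(timestamp_seconds, value)` sample format, and the choice to drop negative results (for example after a counter reset) are assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def non_negative_derivative(samples: List[Sample], unit: float = 60.0) -> List[Sample]:
    """Rate of change per `unit` seconds between consecutive samples.

    Negative rates are skipped, so a counter that resets to zero
    does not show up as a large negative spike.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # ignore duplicate or out-of-order timestamps
        rate = (v1 - v0) / (elapsed / unit)
        if rate < 0:
            continue  # skip negative results
        rates.append((t1, rate))
    return rates


# Restart counter sampled once a minute; it resets to 0 at t=180.
samples = [(0, 3), (60, 3), (120, 5), (180, 0), (240, 1)]
rates = non_negative_derivative(samples)
increasing = [t for t, rate in rates if rate > 0]
```

A warn condition on `rate > 0` would then fire only for the intervals in which the counter actually increased.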
The warn condition is quite clear: it should only send a warning if more than one restart has occurred in the last minute, and recover after 5m of no restarts.
Seems like the alert is generated even when restarts_last_min == 0:
{"id":"Infra-Kubernetes-ContainerRestarting1","message":"*[ec1|monitoring|testing-exit-code-139-5b998b8484-mvk54|testing-exit-code-139]* - Container has restarted 0 time(s) in the last minute","details":"{\u0026#34;Name\u0026#34;:\u0026#34;kubernetes_pod_container_status\u0026#34;,\u0026#34;TaskName\u0026#34;:\u0026#34;Infra-Kubernetes-ContainerRestarting1\u0026#34;,\u0026#34;Group\u0026#34;:\u0026#34;cluster_name=ec1,container_name=testing-exit-code-139,namespace=monitoring,pod_name=testing-exit-code-139-5b998b8484-mvk54\u0026#34;,\u0026#34;Tags\u0026#34;:{\u0026#34;cluster_name\u0026#34;:\u0026#34;ec1\u0026#34;,\u0026#34;container_name\u0026#34;:\u0026#34;testing-exit-code-139\u0026#34;,\u0026#34;namespace\u0026#34;:\u0026#34;monitoring\u0026#34;,\u0026#34;pod_name\u0026#34;:\u0026#34;testing-exit-code-139-5b998b8484-mvk54\u0026#34;},\u0026#34;ServerInfo\u0026#34;:{\u0026#34;Hostname\u0026#34;:\u0026#34;kapacitor\u0026#34;,\u0026#34;ClusterID\u0026#34;:\u0026#34;78c68c2a-9227-4cef-be5b-90eb514b4749\u0026#34;,\u0026#34;ServerID\u0026#34;:\u0026#34;26537c4f-1381-436e-b681-89e596ada339\u0026#34;},\u0026#34;ID\u0026#34;:\u0026#34;Infra-Kubernetes-ContainerRestarting1\u0026#34;,\u0026#34;Fields\u0026#34;:{\u0026#34;restarts_last_min\u0026#34;:0,\u0026#34;warnResetCount\u0026#34;:2},\u0026#34;Level\u0026#34;:\u0026#34;WARNING\u0026#34;,\u0026#34;Time\u0026#34;:\u0026#34;2020-11-13T12:59:40Z\u0026#34;,\u0026#34;Duration\u0026#34;:799000000000,\u0026#34;Message\u0026#34;:\u0026#34;*[ec1|monitoring|testing-exit-code-139-5b998b8484-mvk54|testing-exit-code-139]* - Container has restarted 0 time(s) in the last minute\u0026#34;}\n","time":"2020-11-13T12:59:40Z","duration":799000000000,"level":"WARNING","data":{"series":[{"name":"kubernetes_pod_container_status","tags":{"cluster_name":"ec1","container_name":"testing-exit-code-139","namespace":"monitoring","pod_name":"testing-exit-code-139-5b998b8484-mvk54"},"columns":["time","restarts_last_min","warnResetCount"],"values":[["2020-11-13T12:59:40Z",0,2]]}]},"previousLevel":"WARNING","recoverable":true}
Check the columns - restarts_last_min = 0 but the level is WARNING.
Not sure how this is possible?
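To check the columns programmatically rather than by eye, the payload can be parsed and the first data row paired with its column names. The sketch below uses a trimmed copy of the alert above (only the fields needed for the check are kept); the helper name `first_row` is hypothetical.

```python
import json

# Trimmed copy of the alert payload from this thread.
raw = """
{"id": "Infra-Kubernetes-ContainerRestarting1",
 "level": "WARNING",
 "previousLevel": "WARNING",
 "data": {"series": [{"name": "kubernetes_pod_container_status",
                      "columns": ["time", "restarts_last_min", "warnResetCount"],
                      "values": [["2020-11-13T12:59:40Z", 0, 2]]}]}}
"""


def first_row(alert: dict) -> dict:
    """Pair the column names of the first series with its first row of values."""
    series = alert["data"]["series"][0]
    return dict(zip(series["columns"], series["values"][0]))


alert = json.loads(raw)
row = first_row(alert)

# The situation reported here: a WARNING event whose warn field is zero.
held_warning = (
    alert["level"] == "WARNING"
    and alert["previousLevel"] == "WARNING"
    and row["restarts_last_min"] == 0
)
print(row, held_warning)
```

Since previousLevel is also WARNING, the event is not a new transition into the warning state: the alert was already in WARNING when this point was evaluated.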