prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

time_interval definition is not taken to be in UTC #2646

Closed: zakaf closed this issue 3 years ago

zakaf commented 3 years ago

What did you do?

What did you expect to see?

What did you see instead? Under which circumstances?

Environment

templates:

# The root node of the routing tree.
route:
  group_by: [ 'severity', 'alertname', 'job' ]
  group_wait: 30s
  receiver: 'slack.alert'
  routes:

# A list of notification receivers.
receivers:

# A list of mute time intervals for muting routes.
mute_time_intervals:

// set mute_time_interval as 13:00~24:00
Jul 12 12:08:00 host.address.com alertmanager[6123]: level=debug ts=2021-07-12T03:08:00.149Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=AboveAverageConnectionCnt[ff9598c][active]
Jul 12 12:08:00 host.address.com alertmanager[6123]: level=debug ts=2021-07-12T03:08:00.149Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=AboveAverageConnectionCnt[ed30b21][active]
Jul 12 12:08:00 host.address.com alertmanager[6123]: level=debug ts=2021-07-12T03:08:00.149Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=AboveAverageConnectionCnt[4639a49][active]
Jul 12 12:08:00 host.address.com alertmanager[6123]: level=debug ts=2021-07-12T03:08:00.150Z caller=dispatch.go:475 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{alertname=\"AboveAverageConnectionCnt\", job=\"k8s\", severity=\"warning\"}" msg=flushing alerts="[AboveAverageConnectionCnt[ff9598c][active] AboveAverageConnectionCnt[ed30b21][active] AboveAverageConnectionCnt[4639a49][active]]"
Jul 12 12:08:00 host.address.com alertmanager[6123]: level=debug ts=2021-07-12T03:08:00.915Z caller=notify.go:734 component=dispatcher receiver=slack.alert integration=slack[0] msg="Notify success" attempts=1

roidelapluie commented 3 years ago

cc @benridley can you please have a look at this report? Thanks!

benridley commented 3 years ago

Thanks @zakaf for raising and @roidelapluie for bringing this to my attention.

The mute time mechanism currently uses the aggregation group's internal timer to determine the current time and make comparisons with mute times. Turns out this timer fires in local time, not in UTC. I've raised #2648 to ensure comparisons are only made in UTC and added a few test cases to confirm.

zakaf commented 3 years ago

@roidelapluie @benridley Thanks for the bug fix :) I've tried out 0.23.0, and #2648 has fixed the issue.