grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Flaky TestDispatcherGroupLimits/low_limit #449

Open bboreham opened 3 years ago

bboreham commented 3 years ago

Failed: https://github.com/grafana/mimir/runs/4103684576

--- FAIL: TestDispatcherGroupLimits (4.10s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.33s)
        alertmanager_test.go:110: expected <nil>, got 
            metric output does not match expectation; want:

            # HELP alertmanager_dispatcher_aggregation_group_limit_reached_total Number of times when dispatcher failed to create new aggregation group due to limit.
            # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            alertmanager_dispatcher_aggregation_group_limit_reached_total 4

            got:

            # HELP alertmanager_dispatcher_aggregation_group_limit_reached_total Number of times when dispatcher failed to create new aggregation group due to limit.
            # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            alertmanager_dispatcher_aggregation_group_limit_reached_total 5

Passed on re-run: https://github.com/grafana/mimir/runs/4104035197

zenador commented 1 year ago

Just encountered this: https://github.com/grafana/mimir/actions/runs/6184952844/job/16789586635?pr=5925

--- FAIL: TestDispatcherGroupLimits (3.28s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.03s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

It works locally (I didn't modify anything that would affect this test) and passed on a rerun: https://github.com/grafana/mimir/actions/runs/6184952844/job/16790267619?pr=5925

colega commented 1 year ago

Still flaky:

$ go test -run=TestDispatcherGroupLimits -count=1000 ./pkg/alertmanager
--- FAIL: TestDispatcherGroupLimits (3.05s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.03s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

--- FAIL: TestDispatcherGroupLimits (3.01s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.01s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

--- FAIL: TestDispatcherGroupLimits (3.02s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.02s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

--- FAIL: TestDispatcherGroupLimits (3.02s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.01s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

--- FAIL: TestDispatcherGroupLimits (3.03s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.02s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

--- FAIL: TestDispatcherGroupLimits (3.03s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.02s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

--- FAIL: TestDispatcherGroupLimits (3.01s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.01s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

FAIL
FAIL    github.com/grafana/mimir/pkg/alertmanager   28.314s
FAIL

charleskorn commented 1 year ago

Another example:

--- FAIL: TestDispatcherGroupLimits (3.25s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.03s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

zenador commented 1 year ago

https://github.com/grafana/mimir/actions/runs/6735289122/job/18308218253?pr=6544

--- FAIL: TestDispatcherGroupLimits (3.29s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.03s)
        alertmanager_test.go:130: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5

colega commented 3 months ago

Still flaky.

pracucci commented 3 months ago

Another occurrence here:

--- FAIL: TestDispatcherGroupLimits (3.22s)
    --- FAIL: TestDispatcherGroupLimits/low_limit (3.03s)
        alertmanager_test.go:141: expected <nil>, got 

            Diff:
            --- metric output does not match expectation; want
            +++ got:
            @@ -2,3 +2,3 @@
             # TYPE alertmanager_dispatcher_aggregation_group_limit_reached_total counter
            -alertmanager_dispatcher_aggregation_group_limit_reached_total 4
            +alertmanager_dispatcher_aggregation_group_limit_reached_total 5