grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0
3.98k stars 503 forks source link

Flaky test TestRulerEvaluationDelay #4857

Open 56quarters opened 1 year ago

56quarters commented 1 year ago

https://github.com/grafana/mimir/actions/runs/4817437976/jobs/8578292294?pr=4855

 === RUN   TestRulerEvaluationDelay
08:21:12 Starting minio-9000
08:21:13 Ports for container: e2e-mimir-test-minio-9000 Mapping: map[9000:33076]
08:21:13 Started minio-9000
08:21:13 Starting mimir
08:21:14 Ports for container: e2e-mimir-test-mimir Mapping: map[9009:33077]
08:21:14 Started mimir
    ruler_test.go:312: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:312
            Error:          Not equal: 
                            expected: 2
                            actual  : 4
            Test:           TestRulerEvaluationDelay
08:21:27 Killing mimir
08:21:27 Killing minio-9000
08:21:27 Killed minio-9000
08:21:27 Killed mimir
08:21:27 error while removing sharedDir /home/runner/work/mimir/mimir/integration/e2e_integration_test2407693794 err: unlinkat /home/runner/work/mimir/mimir/integration/e2e_integration_test2407693794/rule_tmp/anonymous/ns: permission denied
--- FAIL: TestRulerEvaluationDelay (15.03s)
zenador commented 10 months ago

https://github.com/grafana/mimir/actions/runs/6570005457/job/17846842247?pr=6391

=== RUN   TestRulerEvaluationDelay
05:04:05 Starting minio-9000
05:04:06 Ports for container: e2e-mimir-test-minio-9000 Mapping: map[9000:33051]
05:04:06 Started minio-9000
05:04:06 Starting mimir
05:04:07 Ports for container: e2e-mimir-test-mimir Mapping: map[9009:33052]
05:04:07 Started mimir
    ruler_test.go:312: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:312
            Error:          Not equal: 
                            expected: 2
                            actual  : 4
            Test:           TestRulerEvaluationDelay
05:04:20 Killing mimir
05:04:20 Killing minio-9000
05:04:20 Killed minio-9000
05:04:20 Killed mimir
05:04:20 error while removing sharedDir /home/runner/work/mimir/mimir/integration/e2e_integration_test2060902976 err: unlinkat /home/runner/work/mimir/mimir/integration/e2e_integration_test2060902976/rule_tmp/anonymous/ns: permission denied
--- FAIL: TestRulerEvaluationDelay (15.46s)
charleskorn commented 10 months ago

Another example:

=== RUN   TestRulerEvaluationDelay
09:40:41 Starting minio-9000
09:40:42 Ports for container: e2e-mimir-test-minio-9000 Mapping: map[9000:33050]
09:40:42 Started minio-9000
09:40:42 Starting mimir
09:40:42 Ports for container: e2e-mimir-test-mimir Mapping: map[9009:33051]
09:40:42 Started mimir
    ruler_test.go:312: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:312
            Error:          Not equal: 
                            expected: 2
                            actual  : 4
            Test:           TestRulerEvaluationDelay
09:40:55 Killing mimir
09:40:55 Killing minio-9000
09:40:55 Killed minio-9000
09:40:55 Killed mimir
09:40:55 error while removing sharedDir /home/runner/work/mimir/mimir/integration/e2e_integration_test2458340898 err: unlinkat /home/runner/work/mimir/mimir/integration/e2e_integration_test2458340898/rule_tmp/anonymous/ns: permission denied
--- FAIL: TestRulerEvaluationDelay (14.20s)
zenador commented 10 months ago

Again

    ruler_test.go:312: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:312
            Error:          Not equal: 
                            expected: 2
                            actual  : 4
            Test:           TestRulerEvaluationDelay
dimitarvdimitrov commented 9 months ago

Another instance

=== RUN   TestRulerEvaluationDelay
16:51:01 Starting minio-9000
16:51:01 Ports for container: e2e-mimir-test-minio-9000 Mapping: map[9000:32987]
16:51:01 Started minio-9000
16:51:01 Starting mimir
16:51:02 Ports for container: e2e-mimir-test-mimir Mapping: map[9009:32988]
16:51:02 Started mimir
16:51:03 mimir: ts=2023-12-04T16:51:03.672636524Z caller=loader.go:116 level=warn msg="bucket index not found" user=anonymous
    ruler_test.go:312: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:312
            Error:          Not equal: 
                            expected: 2
                            actual  : 4
            Test:           TestRulerEvaluationDelay
16:51:15 Killing mimir
16:51:15 Killing minio-9000
16:51:15 Killed minio-9000
16:51:15 Killed mimir
16:51:16 error while removing sharedDir /home/runner/work/mimir/mimir/integration/e2e_integration_test3902039097 err: unlinkat /home/runner/work/mimir/mimir/integration/e2e_integration_test3902039097/rule_tmp/anonymous/ns: permission denied
--- FAIL: TestRulerEvaluationDelay (14.89s)
dimitarvdimitrov commented 7 months ago

another instance

=== RUN   TestRulerEvaluationDelay
11:48:01 Starting minio-9000
11:48:01 Ports for container: e2e-mimir-test-minio-9000 Mapping: map[9000:32988]
11:48:01 Started minio-9000
11:48:01 Starting mimir
11:48:02 Ports for container: e2e-mimir-test-mimir Mapping: map[9009:32989]
11:48:02 Started mimir
11:48:03 mimir: ts=2024-01-17T11:48:03.672226245Z caller=loader.go:116 level=warn msg="bucket index not found" user=anonymous
    ruler_test.go:312: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:312
            Error:          Not equal: 
                            expected: 2
                            actual  : 4
            Test:           TestRulerEvaluationDelay
11:48:15 Killing mimir
11:48:15 Killing minio-9000
11:48:15 Killed minio-9000
11:48:15 Killed mimir
11:48:15 error while removing sharedDir /home/runner/work/mimir/mimir/integration/e2e_integration_test2871173288 err: unlinkat /home/runner/work/mimir/mimir/integration/e2e_integration_test2871173288/rule_tmp/anonymous/ns: permission denied
--- FAIL: TestRulerEvaluationDelay (14.51s)
dimitarvdimitrov commented 7 months ago

maybe https://github.com/grafana/mimir/pull/7217 helps with fixing this

dimitarvdimitrov commented 4 months ago

there was another failure. It looks like the first sample is missing. The rest of the samples look ok. I don't understand why that sample wouldn't be returned by the query.

The sample should have timestamp of 1713780848667 (2024-04-22 10:14:08.667) and the query should be requesting all samples after 1713780848.66672 (2024-04-22 10:14:08.66672). A rounding error can't explain why a sample was omitted because prometheus's query API's start parameter is inclusive.

=== RUN   TestRulerEvaluationDelay
10:18:51 Starting minio-9000
10:18:52 Ports for container: e2e-mimir-test-minio-9000 Mapping: map[9000:32990]
10:18:52 Started minio-9000
10:18:52 Starting mimir
10:18:52 Ports for container: e2e-mimir-test-mimir Mapping: map[9009:32991]
10:18:52 Started mimir
10:19:08 mimir: ts=2024-04-22T10:19:08.674126494Z caller=loader.go:120 level=warn msg="bucket index not found" user=anonymous
    ruler_test.go:296: querying from  2024-04-22 10:14:08.666719842 +0000 UTC m=+70.656273830 to 2024-04-22 10:19:08.666719842 +0000 UTC m=+370.656273830
    ruler_test.go:314: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:314
            Error:          Not equal: 
                            expected: 2
                            actual  : 4
            Test:           TestRulerEvaluationDelay
    ruler_test.go:315: expected value 2 actual value 4 actual timestamp 1713780849.667 expected timestamp 2024-04-22 10:14:08.666719842 +0000 UTC m=+70.656273830
    ruler_test.go:314: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:314
            Error:          Not equal: 
                            expected: 4
                            actual  : 8
            Test:           TestRulerEvaluationDelay
    ruler_test.go:315: expected value 4 actual value 8 actual timestamp 1713780851.667 expected timestamp 2024-04-22 10:14:09.666719842 +0000 UTC m=+71.656273830
    ruler_test.go:314: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:314
            Error:          Not equal: 
                            expected: 8
                            actual  : 10
            Test:           TestRulerEvaluationDelay
    ruler_test.go:315: expected value 8 actual value 10 actual timestamp 1713780852.667 expected timestamp 2024-04-22 10:14:11.666719842 +0000 UTC m=+73.656273830
    ruler_test.go:314: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:314
            Error:          Not equal: 
                            expected: 10
                            actual  : 12
            Test:           TestRulerEvaluationDelay
    ruler_test.go:315: expected value 10 actual value 12 actual timestamp 1713780853.667 expected timestamp 2024-04-22 10:14:12.666719842 +0000 UTC m=+74.656273830
    ruler_test.go:314: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:314
            Error:          Not equal: 
                            expected: 12
                            actual  : 14
            Test:           TestRulerEvaluationDelay
    ruler_test.go:315: expected value 12 actual value 14 actual timestamp 1713780854.667 expected timestamp 2024-04-22 10:14:13.666719842 +0000 UTC m=+75.656273830
    ruler_test.go:314: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:314
            Error:          Not equal: 
                            expected: 14
                            actual  : 16
            Test:           TestRulerEvaluationDelay
    ruler_test.go:315: expected value 14 actual value 16 actual timestamp [1713](https://github.com/grafana/mimir/actions/runs/8782252977/job/24096123714?pr=7931#step:15:1714)780855.667 expected timestamp 2024-04-22 10:14:14.666719842 +0000 UTC m=+76.656273830
    ruler_test.go:314: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:314
            Error:          Not equal: 
                            expected: 16
                            actual  : 18
            Test:           TestRulerEvaluationDelay
    ruler_test.go:315: expected value 16 actual value 18 actual timestamp 1713780856.667 expected timestamp 2024-04-22 10:14:15.666719842 +0000 UTC m=+77.656273830
    ruler_test.go:314: 
            Error Trace:    /home/runner/work/mimir/mimir/integration/ruler_test.go:314
            Error:          Not equal: 
                            expected: 18
                            actual  : 20
            Test:           TestRulerEvaluationDelay
    ruler_test.go:315: expected value 18 actual value 20 actual timestamp 1713780857.667 expected timestamp 2024-04-22 10:14:16.666719842 +0000 UTC m=+78.656273830
    ruler_test.go:315: expected value 20 actual value 20 actual timestamp 1713780858.667 expected timestamp 2024-04-22 10:14:17.666719842 +0000 UTC m=+79.656273830
10:19:20 Killing mimir
10:19:20 Killing minio-9000
10:19:20 Killed minio-9000
10:19:20 Killed mimir
dimitarvdimitrov commented 2 months ago

same problem as above

Details

``` === RUN TestRulerEvaluationDelay 14:22:03 Starting minio-9000 14:22:03 Ports for container: e2e-mimir-test-minio-9000 Mapping: map[9000:33003] 14:22:03 Started minio-9000 14:22:04 Starting mimir 14:22:05 Ports for container: e2e-mimir-test-mimir Mapping: map[9009:33004] 14:22:05 Started mimir 14:22:20 mimir: ts=2024-06-24T14:22:20.673332672Z caller=loader.go:120 level=warn msg="bucket index not found" user=anonymous ruler_test.go:296: querying from 2024-06-24 14:17:20.64879261 +0000 UTC m=+146.840826920 to 2024-06-24 14:22:20.64879261 +0000 UTC m=+446.840826920 ruler_test.go:314: Error Trace: /home/runner/work/mimir/mimir/integration/ruler_test.go:314 Error: Not equal: expected: 2 actual : 4 Test: TestRulerEvaluationDelay ruler_test.go:315: expected value 2 actual value 4 actual timestamp 1719238642.649 expected timestamp 2024-06-24 14:17:20.64879261 +0000 UTC m=+146.840826920 ruler_test.go:314: Error Trace: /home/runner/work/mimir/mimir/integration/ruler_test.go:314 Error: Not equal: expected: 4 actual : 8 Test: TestRulerEvaluationDelay ruler_test.go:315: expected value 4 actual value 8 actual timestamp 1719238644.649 expected timestamp 2024-06-24 14:17:21.64879261 +0000 UTC m=+147.840826920 ruler_test.go:314: Error Trace: /home/runner/work/mimir/mimir/integration/ruler_test.go:314 Error: Not equal: expected: 8 actual : 10 Test: TestRulerEvaluationDelay ruler_test.go:315: expected value 8 actual value 10 actual timestamp 1719238645.649 expected timestamp 2024-06-24 14:17:23.64879261 +0000 UTC m=+149.840826920 ruler_test.go:314: Error Trace: /home/runner/work/mimir/mimir/integration/ruler_test.go:314 Error: Not equal: expected: 10 actual : 12 Test: TestRulerEvaluationDelay ruler_test.go:315: expected value 10 actual value 12 actual timestamp [1719](https://github.com/grafana/mimir/actions/runs/9646885310/job/26604585831?pr=8501#step:15:1720)238646.649 expected timestamp 2024-06-24 14:17:24.64879261 +0000 UTC m=+150.840826920 ruler_test.go:314: Error Trace: /home/runner/work/mimir/mimir/integration/ruler_test.go:314 Error: Not equal: expected: 12 actual : 14 Test: TestRulerEvaluationDelay ruler_test.go:315: expected value 12 actual value 14 actual timestamp 1719238647.649 expected timestamp 2024-06-24 14:17:25.64879261 +0000 UTC m=+151.840826920 ruler_test.go:314: Error Trace: /home/runner/work/mimir/mimir/integration/ruler_test.go:314 Error: Not equal: expected: 14 actual : 16 Test: TestRulerEvaluationDelay ruler_test.go:315: expected value 14 actual value 16 actual timestamp 1719238648.649 expected timestamp 2024-06-24 14:17:26.64879261 +0000 UTC m=+152.840826920 ruler_test.go:314: Error Trace: /home/runner/work/mimir/mimir/integration/ruler_test.go:314 Error: Not equal: expected: 16 actual : 18 Test: TestRulerEvaluationDelay ruler_test.go:315: expected value 16 actual value 18 actual timestamp 1719238649.649 expected timestamp 2024-06-24 14:17:27.64879261 +0000 UTC m=+153.840826920 ruler_test.go:314: Error Trace: /home/runner/work/mimir/mimir/integration/ruler_test.go:314 Error: Not equal: expected: 18 actual : 20 Test: TestRulerEvaluationDelay ruler_test.go:315: expected value 18 actual value 20 actual timestamp 1719238650.649 expected timestamp 2024-06-24 14:17:28.64879261 +0000 UTC m=+154.840826920 ruler_test.go:315: expected value 20 actual value 20 actual timestamp 1719238651.649 expected timestamp 2024-06-24 14:17:29.64879261 +0000 UTC m=+155.840826920 14:22:32 Killing mimir 14:22:32 Killing minio-9000 14:22:32 Killed minio-9000 14:22:32 Killed mimir 14:22:32 error while removing sharedDir /home/runner/work/mimir/mimir/integration/e2e_integration_test1320575141 err: unlinkat /home/runner/work/mimir/mimir/integration/e2e_integration_test1320575141/rule_tmp/anonymous/ns: permission denied --- FAIL: TestRulerEvaluationDelay (29.34s) ```