thanos-io / promql-engine

Multi-threaded PromQL engine implementation based on the Volcano paper.
Apache License 2.0
133 stars 52 forks source link

Fuzz: Distributed query test case failure #380

Open yeya24 opened 6 months ago

yeya24 commented 6 months ago

https://github.com/thanos-io/promql-engine/actions/runs/7394293573/job/20115438270

--- FAIL: FuzzDistributedEnginePromQLSmithRangeQuery (0.70s)
    --- FAIL: FuzzDistributedEnginePromQLSmithRangeQuery/seed#0 (0.70s)
        enginefuzz_test.go:443: load 30s
                        http_requests_total{pod="nginx-1", route="/"} 1.00+1.00x4
                        http_requests_total{pod="nginx-2", route="/"}  1+2.00x4
        enginefuzz_test.go:444: ceil(
              -(
                  {__name__="http_requests_total"} @ 77.320 offset 1m10s
                -
                  {__name__="http_requests_total"} offset -2m48s
              )
            )
        enginefuzz_test.go:446: case 24 error mismatch.
            new result: {pod="nginx-1", route="/"} =>
            -6 @[0]
            -6 @[30000]
            -6 @[60000]
            -6 @[90000]
            -6 @[[120](https://github.com/thanos-io/promql-engine/actions/runs/7394293573/job/20115438270#step:4:121)000]
            {pod="nginx-2", route="/"} =>
            -10 @[0]
            -10 @[30000]
            -10 @[60000]
            -10 @[90000]
            -10 @[120000]
            old result: {pod="nginx-1", route="/"} =>
            4 @[0]
            4 @[30000]
            4 @[60000]
            4 @[90000]
            4 @[120000]
            {pod="nginx-2", route="/"} =>
            8 @[0]
            8 @[30000]
            8 @[60000]
            8 @[90000]
            8 @[120000]
        enginefuzz_test.go:451: failed 1 test cases

Not sure if it is really a bug but the mismatched results seems concerning

MichaHoffmann commented 6 months ago

Distributed Engine should be fuzzed without offset or @ i think

yeya24 commented 6 months ago

@MichaHoffmann Is @ and offset not supported for distributed query engine?

If that's the case then thanos engine should return error and fallback to the old engine then

MichaHoffmann commented 6 months ago

They are but iirc they are truncated ( here https://github.com/thanos-io/promql-engine/blob/662ae7d6e27dac1c18ab4a7deb19add136f51367/engine/distributed.go#L73 )

yeya24 commented 6 months ago

It looks like this shouldn't affect @? As @ is at an absolute timestamp. At least in this example, Idk if it is the cause of this failed test case.

MichaHoffmann commented 6 months ago

The engines are created with "zone" external labels but the actual series in them do not have the external label, i think that might be the issue.