jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

[Bug]: Flaky test TestPasswordFromFile #4809

Closed albertteoh closed 11 months ago

albertteoh commented 11 months ago

Duplicate of #4743

What happened?

A CI run from https://github.com/jaegertracing/jaeger/pull/4802 had a failing unit test job with error:

            Error Trace:    /home/runner/work/jaeger/jaeger/plugin/storage/es/factory_test.go:338
                                        /home/runner/work/jaeger/jaeger/plugin/storage/es/factory_test.go:268
            Error:          Not equal: 
                            expected: ":second password"
                            actual  : ":first password"

Steps to reproduce

Hard to reproduce, even with:

$ go test github.com/jaegertracing/jaeger/plugin/storage/es -run TestPasswordFromFile -count 1000

Expected behavior

Test should pass.

Relevant log output

The most relevant log lines from that test failure are probably these two, which appear consecutively in the output:

    factory_test.go:287: request to fake ES server: &{HEAD / HTTP/1.1 1 1 map[Authorization:[Basic OnNlY29uZCBwYXNzd29yZA==] User-Agent:[Go-http-client/1.1]] {} <nil> 0 [] false 127.0.0.1:35573 map[] map[] <nil> map[] 127.0.0.1:39250 / <nil> <nil> <nil> 0xc00010fb80}
    factory_test.go:287: request to fake ES server: &{POST /_bulk HTTP/1.1 1 1 map[Accept:[application/json] Accept-Encoding:[gzip] Authorization:[Basic OmZpcnN0IHBhc3N3b3Jk] Content-Length:[289] Content-Type:[application/x-ndjson] User-Agent:[elastic/6.2.37 (linux-amd64)]] 0xc0002e22c0 <nil> 289 [] false 127.0.0.1:35573 map[] map[] <nil> map[] 127.0.0.1:39236 /_bulk <nil> <nil> <nil> 0xc0000d0280}

Screenshot

No response

Additional context

From the two log lines, we see two different passwords provided in the Authorization header:

$ echo "OmZpcnN0IHBhc3N3b3Jk" | base64 --decode
:first password
$ echo "OnNlY29uZCBwYXNzd29yZA==" | base64 --decode
:second password
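For anyone who prefers to check this in Go rather than the shell, here is a minimal sketch that decodes the two header values copied from the log lines above. The helper name `decodeBasicAuth` is hypothetical and is not part of the Jaeger code base.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeBasicAuth is a hypothetical helper: it splits a "Basic <base64>"
// Authorization header value into its user and password parts.
func decodeBasicAuth(header string) (string, string, error) {
	const prefix = "Basic "
	if !strings.HasPrefix(header, prefix) {
		return "", "", fmt.Errorf("not a Basic auth header: %q", header)
	}
	raw, err := base64.StdEncoding.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return "", "", err
	}
	// Credentials are encoded as "user:password"; the user may be empty.
	i := strings.IndexByte(string(raw), ':')
	if i < 0 {
		return "", "", fmt.Errorf("missing ':' separator in %q", raw)
	}
	return string(raw[:i]), string(raw[i+1:]), nil
}

func main() {
	// Header values copied from the two log lines in this issue.
	headers := []string{
		"Basic OnNlY29uZCBwYXNzd29yZA==",
		"Basic OmZpcnN0IHBhc3N3b3Jk",
	}
	for _, h := range headers {
		user, password, err := decodeBasicAuth(h)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("user=%q password=%q\n", user, password)
	}
}
```

Both values decode to an empty user name, which is why the decoded strings start with a colon.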

The log output suggests that the old client wasn't properly closed and kept making requests alongside the new client, which creates a race on the last write to authReceived. In the failing run, the old client appears to have made the last write.

Jaeger backend version

No response

SDK

No response

Pipeline

No response

Storage backend

No response

Operating system

No response

Deployment model

No response

Deployment configs

No response

yurishkuro commented 11 months ago

I thought we already had an open ticket for this test.

albertteoh commented 11 months ago

I thought we already had an open ticket for this test.

I could only find this closed one: https://github.com/jaegertracing/jaeger/issues/4743.

albertteoh commented 11 months ago

Documenting another flaky test failure data point in the main branch: https://github.com/jaegertracing/jaeger/actions/runs/6435789834/job/17477774812#step:7:4569

yurishkuro commented 11 months ago

I don't think that one should've been closed; I'll reopen it. Let's keep the discussion on the original ticket, as it already has more context.

Closing this as a duplicate of #4743.