open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Flaky test: `TestServiceTelemetryReusable` #6449

Open · codeboten opened this issue 2 years ago

codeboten commented 2 years ago

See the failure: https://github.com/open-telemetry/opentelemetry-collector/actions/runs/3362867290/jobs/5575186303#step:6:4285

 2022-10-31T16:16:23.033Z   info    service/service.go:106  Everything is ready. Begin running and processing data.
    service_test.go:179: 
            Error Trace:    /home/runner/work/opentelemetry-collector/opentelemetry-collector/service/service_test.go:179
            Error:          Received unexpected error:
                            Get "http://localhost:8888/metrics": dial tcp [::1]:8888: connect: connection refused
            Test:           TestServiceTelemetryReusable
--- FAIL: TestServiceTelemetryReusable (0.01s)

sakshi1215 commented 2 years ago

@codeboten, should we add a retry mechanism when connecting? I think the server might not have been ready yet, which could have caused this. Let me know what you think.

codeboten commented 2 years ago

@sakshi1215, it's possible a retry would solve this, but it would be good to confirm that's the root cause first. Another possible cause is that the port is being reused by multiple tests.

paivagustavo commented 2 years ago

I just hit this in https://github.com/open-telemetry/opentelemetry-collector/pull/6423 as well. I could reproduce it by running this test repeatedly with -test.count=100. It does seem that the server was not ready yet (it is started in a goroutine, so that is quite possible). We could use assert.Eventually here and check whether the failure still reproduces.

I also agree that we should avoid a static port here and get an available one instead.