open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Flaky test: TestTracingGoldenData/otlp-opencensus port already in use #27295

Open songy23 opened 12 months ago

songy23 commented 12 months ago

Component(s)

testbed

What happened?

See https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6381486879/job/17318149920?pr=27291:

panic: cannot start pipelines: listen tcp 127.0.0.1:44455: bind: address already in use

goroutine 229 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*inProcessCollector).Start.func1()
    /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/testbed/in_process_collector.go:89 +0x99
created by github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed.(*inProcessCollector).Start
    /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/testbed/in_process_collector.go:85 +0x59e
exit status 2
FAIL    github.com/open-telemetry/opentelemetry-collector-contrib/testbed/correctnesstests/traces   3.165s
cat: results/TESTRESULTS.md: No such file or directory
make: Leaving directory '/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed'
make: *** [Makefile:37: run-correctness-traces-tests] Error 1
Error: Process completed with exit code 2.

Collector version

mainline

Environment information

No response

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "127.0.0.1:44455"
exporters:
  opencensus:
    endpoint: "127.0.0.1:44455"
    tls:
      insecure: true
processors:
  batch:
    send_batch_size: 1024

extensions:

service:
  extensions:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [opencensus]

Log output

No response

Additional context

No response

crobert-1 commented 11 months ago

From what I can tell, this is a fairly simple bug. The config uses port 44455 twice (once for the otlp receiver and once for the opencensus exporter), which results in the error shown. This is a test issue.

The test requests two available local addresses, one for the receiver and one for the sender (exporter), and then uses them in its final running configuration. Because the two addresses are requested independently, the same address is sometimes returned twice. There's actually a comment in the code calling this out as a possibility. The ports aren't bound until the entire configuration is put together and the test bed runner is started, which is why the same port can be returned twice.
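To make the mechanism concrete, here is a minimal sketch of how an "available address" helper commonly works. It is an assumption about the general pattern, not a copy of the testbed code, and `freeLocalAddr` is a hypothetical name. The helper binds to port 0, records the address the OS picked, and closes the listener, so nothing reserves the port between two calls.

```go
package main

import (
	"fmt"
	"net"
)

// freeLocalAddr is a hypothetical stand-in for an "available address" helper.
// It binds to port 0 so the OS picks a free port, records the address, and
// then closes the listener. After Close the port is free again, so nothing
// stops a later call from being handed the same port.
func freeLocalAddr() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	addr := ln.Addr().String()
	if err := ln.Close(); err != nil {
		return "", err
	}
	return addr, nil
}

func main() {
	receiverAddr, err := freeLocalAddr()
	if err != nil {
		panic(err)
	}
	senderAddr, err := freeLocalAddr()
	if err != nil {
		panic(err)
	}
	// The two calls are independent, so a collision is possible, if uncommon.
	fmt.Println(receiverAddr, senderAddr, "collision:", receiverAddr == senderAddr)
}
```

A collision will rarely show up in a single run, which fits the flaky, hard-to-reproduce nature of the failure.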

I think the simplest option is to check that the data receiver's address differs from the sender's when they're generated. The sender has a public GetEndpoint() method whose result could be parsed to get the port, and the receiver has a Port property that could be compared against it. If the ports are the same, we could simply loop, re-creating the receiver or sender until they no longer match.
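A rough sketch of that retry loop, under stated assumptions: `portOf` and `pickDistinct` are hypothetical helpers, and the `next` callback stands in for whatever re-creates the sender and reports its endpoint. The real testbed types are not used here.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// portOf extracts the numeric port from a "host:port" endpoint string.
func portOf(endpoint string) (int, error) {
	_, p, err := net.SplitHostPort(endpoint)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(p)
}

// pickDistinct calls next (a hypothetical endpoint generator) until it yields
// an endpoint whose port differs from takenPort, giving up after maxTries.
func pickDistinct(takenPort int, next func() string, maxTries int) (string, error) {
	for i := 0; i < maxTries; i++ {
		candidate := next()
		port, err := portOf(candidate)
		if err != nil {
			return "", err
		}
		if port != takenPort {
			return candidate, nil
		}
	}
	return "", errors.New("no distinct port found within retry budget")
}

func main() {
	// Simulated generator that returns the colliding port first, as in the failing run.
	answers := []string{"127.0.0.1:44455", "127.0.0.1:44455", "127.0.0.1:44456"}
	i := 0
	next := func() string {
		a := answers[i%len(answers)]
		i++
		return a
	}

	endpoint, err := pickDistinct(44455, next, 10)
	fmt.Println(endpoint, err) // prints "127.0.0.1:44456 <nil>"
}
```

Bounding the loop with a retry budget keeps a persistent collision from hanging the test; it fails with an error instead.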

There are some alternatives that could work as well. One would be to somehow mark the port as used before it's actually bound. Another would be to plumb the first returned port down the call stack, so that the GetAvailableLocalAddress method doesn't return it again. Yet another would be to make GetAvailableLocalAddress take an extra argument such as addressCount, letting the caller specify how many available addresses they need. The method could then check internally that it's not returning duplicates, and return an array of addresses.
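For the addressCount alternative, a minimal sketch might look like the following. `availableLocalAddresses` is a hypothetical function, not the existing helper's signature. It keeps every listener open until all addresses are chosen, so the OS cannot hand out the same port twice within one call.

```go
package main

import (
	"fmt"
	"net"
)

// availableLocalAddresses is a hypothetical variant that returns count
// distinct local addresses. Each listener stays open until the function
// returns, so the ports within one call cannot overlap.
func availableLocalAddresses(count int) ([]string, error) {
	listeners := make([]net.Listener, 0, count)
	// Release the ports only after all of them have been picked.
	defer func() {
		for _, ln := range listeners {
			_ = ln.Close()
		}
	}()

	addrs := make([]string, 0, count)
	for i := 0; i < count; i++ {
		ln, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, ln)
		addrs = append(addrs, ln.Addr().String())
	}
	return addrs, nil
}

func main() {
	addrs, err := availableLocalAddresses(2)
	if err != nil {
		panic(err)
	}
	fmt.Println("receiver:", addrs[0], "sender:", addrs[1])
}
```

This only guarantees that the returned addresses differ from each other. Another process could still grab one of the ports before the test binds it.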

All of the alternatives involve a lot of extra work and impact for what is simply a test issue. That's why I think my main suggestion makes the most sense, even though it's not the most "thorough" solution.

github-actions[bot] commented 9 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

songy23 commented 9 months ago

Still seeing this happen in CI: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/7386901474/job/20094527338?pr=30241#step:7:114

dmitryax commented 6 months ago

One more: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8430916144/job/23087554817

This seems like an issue with the correctness tests framework, not with a particular test.

crobert-1 commented 5 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8558802306/job/23454150989?pr=32173

(Some panics are caused by the port being in use, others by a timeout. Not sure whether it's the same issue.)

crobert-1 commented 5 months ago

+1 frequency: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8559764777/job/23457347097?pr=32176

crobert-1 commented 5 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8560074298/job/23458261206?pr=32178

crobert-1 commented 5 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8562139058/job/23464810216?pr=32179

crobert-1 commented 5 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8637289278/job/23679168894?pr=32281

crobert-1 commented 5 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8694703827/job/23844225040?pr=32176

crobert-1 commented 5 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8728552275/job/23948609409?pr=32496

crobert-1 commented 5 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/8760102474/job/24044512587?pr=32571

crobert-1 commented 4 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9083924631/job/24964214890?pr=33052

crobert-1 commented 4 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9162866444/job/25190798529?pr=32833

crobert-1 commented 3 months ago

+1 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9560618901/job/26353153698?pr=33615