Open songy23 opened 12 months ago
Looks like a pretty simple bug from what I can tell. The config is trying to use the same 44455 port twice, resulting in the error shown. This is a test issue.
The test is attempting to get available local addresses, one for the receiver and one for the sender (exporter), and then uses the results in its final running configuration. However, because it requests the two addresses independently, it sometimes gets the same address back twice. There's actually a comment in the code calling this out as a possibility as well. The ports aren't actually in use until the entire configuration is put together and the test bed runner is started, which is why the same port can be returned twice.
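To illustrate why this can happen, here is a minimal sketch. It assumes the helper finds a free port by binding to port 0 and closing the listener right away; `pickFreeAddress` is a hypothetical stand-in, not the actual testbed code.

```go
package main

import (
	"fmt"
	"net"
)

// pickFreeAddress is a hypothetical stand-in for a helper such as
// GetAvailableLocalAddress. It asks the OS for a free port by binding to
// port 0, then releases the port and returns the address it was given.
func pickFreeAddress() (string, error) {
	ln, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		return "", err
	}
	addr := ln.Addr().String()
	// Closing the listener frees the port again, so nothing reserves it
	// for the caller between now and the moment it is actually used.
	if err := ln.Close(); err != nil {
		return "", err
	}
	return addr, nil
}

func main() {
	first, err := pickFreeAddress()
	if err != nil {
		panic(err)
	}
	second, err := pickFreeAddress()
	if err != nil {
		panic(err)
	}
	// The two addresses usually differ, but the OS is free to hand out
	// the same port again because the first one was already released.
	fmt.Println(first, second, "same:", first == second)
}
```

Whether a collision shows up on a given run depends on how the OS picks ephemeral ports, which would explain why the failure is intermittent.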
I think the simplest option is to check that the data receiver's address differs from the sender's when they're generated. The sender has a public method called GetEndpoint() whose result could be parsed to get the port, and the receiver has a Port property that could be compared against it. If they're the same, we could simply loop, re-creating the receiver or sender until the ports no longer match.
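A rough sketch of that retry loop, not tied to the real testbed types. It assumes the sender's endpoint can be rendered as a "host:port" string; `portFromEndpoint`, `pickDistinctPort`, and the `maxAttempts` bound are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// portFromEndpoint extracts the numeric port from a "host:port" string.
func portFromEndpoint(endpoint string) (int, error) {
	_, portStr, err := net.SplitHostPort(endpoint)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(portStr)
}

// pickDistinctPort calls pick until it yields a port different from taken.
// It gives up after maxAttempts so a broken picker cannot loop forever.
func pickDistinctPort(taken int, pick func() (int, error), maxAttempts int) (int, error) {
	for i := 0; i < maxAttempts; i++ {
		port, err := pick()
		if err != nil {
			return 0, err
		}
		if port != taken {
			return port, nil
		}
	}
	return 0, errors.New("no distinct port found within the attempt limit")
}

func main() {
	senderPort, err := portFromEndpoint("localhost:44455")
	if err != nil {
		panic(err)
	}

	// Fake picker that collides with the sender once, then succeeds.
	candidates := []int{44455, 44456}
	next := 0
	pick := func() (int, error) {
		p := candidates[next]
		next++
		return p, nil
	}

	receiverPort, err := pickDistinctPort(senderPort, pick, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(senderPort, receiverPort) // prints "44455 44456"
}
```

In the test itself, `pick` would re-create the receiver (or sender) and return its port. The attempt limit is an addition on top of the suggestion above, so that a persistent collision fails the test instead of hanging it.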
There are some alternatives that could work as well. One would be somehow marking the port as used before it's actually used. Another would be to plumb the first received port down the call stack, so it's not returned again by the GetAvailableLocalAddress method. Yet another would be to make the GetAvailableLocalAddress method take another argument, like addressCount, where the caller can specify how many available addresses they need. The method would then be able to check internally that it's not returning duplicates, and return an array of addresses.
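For the last alternative, a sketch of what a multi-address helper might look like. `availableLocalAddresses` is a hypothetical name, and rather than comparing results for duplicates it uses a different technique: it keeps every listener open until all addresses have been chosen, so the OS cannot hand out the same port twice within one call.

```go
package main

import (
	"fmt"
	"net"
)

// availableLocalAddresses returns count distinct local TCP addresses.
// Hypothetical helper: all listeners stay open until every address has
// been picked, which prevents duplicates within a single call.
func availableLocalAddresses(count int) ([]string, error) {
	if count <= 0 {
		return nil, nil
	}
	listeners := make([]net.Listener, 0, count)
	// Release the ports only after all of them have been selected.
	defer func() {
		for _, ln := range listeners {
			ln.Close()
		}
	}()

	addrs := make([]string, 0, count)
	for i := 0; i < count; i++ {
		ln, err := net.Listen("tcp", "localhost:0")
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, ln)
		addrs = append(addrs, ln.Addr().String())
	}
	return addrs, nil
}

func main() {
	// One address for the receiver, one for the sender.
	addrs, err := availableLocalAddresses(2)
	if err != nil {
		panic(err)
	}
	fmt.Println("receiver:", addrs[0], "sender:", addrs[1])
}
```

This only rules out duplicates between the addresses returned by one call. The ports are still released before the test uses them, so another process could in principle grab one in the meantime.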
All of the alternatives end up being a lot of extra work and impact for what is simply a test issue. That's why I think my main suggestion makes the most sense, even though it's not the most "thorough" solution.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Still seeing this happen in CI: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/7386901474/job/20094527338?pr=30241#step:7:114
Seems like an issue with the correctness test framework, not a particular test.
(Some panics are caused by a port already in use, some by a timeout. Not sure if it's the same issue or not.)
Component(s)
testbed
What happened?
See https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6381486879/job/17318149920?pr=27291:
Collector version
mainline
Environment information
No response
OpenTelemetry Collector configuration
Log output
No response
Additional context
No response