Open sh0rez opened 2 months ago
Keeping a list of multiple endpoints would break the specification requirements for OTLP exporters: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md#configuration-options
Also, if we start doing that, we're introducing a new feature to a stable component. We won't be able to remove it when/if Go fixes this and it is no longer necessary.
Using a custom round tripper/transport is also not going to be possible for now. See https://github.com/open-telemetry/opentelemetry-go/issues/2632
Disabling keep-alives could be a valid option to add to the HTTP exporters' clients.
@sh0rez can this be closed?
Problem Statement
I want to horizontally scale the OTel collector and have the SDK (somewhat evenly) distribute requests to collector instances.
I have a Headless Service for my collector that returns all instances when querying via DNS:
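An illustrative sketch (the Service name below is a hypothetical placeholder): resolving a headless Service from Go returns one address per ready collector pod rather than a single virtual IP.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// A headless Service (clusterIP: None) resolves to the pod IP of every
	// ready collector instance instead of a single cluster IP.
	addrs, err := net.DefaultResolver.LookupHost(ctx, "otel-collector.observability.svc.cluster.local")
	if err != nil {
		panic(err)
	}
	for _, addr := range addrs {
		fmt.Println(addr) // one line per collector pod
	}
}
```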
However, because the Go HTTP client that this package uses keeps the TCP connection alive, the SDK sticks to the first returned address until it becomes unreachable.
This also applies to regular k8s Services: once the TCP connection is opened, no further load balancing happens on the k8s side.
There is https://github.com/golang/go/issues/34511 requesting this for the standard library, but no real progress has been made since 2019.
Proposed Solution
Instead of relying on the HTTP client to determine the endpoint from the DNS results, do the following:
If deemed acceptable, I am happy to contribute this functionality.
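As a rough, non-authoritative sketch of one shape this could take (the helper below is hypothetical and not an SDK API), assuming the idea is to resolve the configured host inside the exporter and rotate across the returned addresses per export:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sync/atomic"
)

// endpointPicker is a hypothetical helper: it resolves the collector hostname
// itself and hands out the returned addresses in round-robin order, instead
// of letting the HTTP client stick to the first address of a kept-alive
// connection.
type endpointPicker struct {
	port    string
	addrs   []string
	counter atomic.Uint64
}

func newEndpointPicker(ctx context.Context, host, port string) (*endpointPicker, error) {
	// One DNS query up front (periodic re-resolution is omitted here).
	addrs, err := net.DefaultResolver.LookupHost(ctx, host)
	if err != nil {
		return nil, err
	}
	return &endpointPicker{port: port, addrs: addrs}, nil
}

// next returns the "ip:port" endpoint to use for the next export, cycling
// through all addresses the DNS query returned.
func (p *endpointPicker) next() string {
	i := p.counter.Add(1)
	return net.JoinHostPort(p.addrs[int(i%uint64(len(p.addrs)))], p.port)
}

func main() {
	p, err := newEndpointPicker(context.Background(),
		"otel-collector.observability.svc.cluster.local", "4318")
	if err != nil {
		panic(err)
	}
	// Each export would be sent to a different collector instance.
	for i := 0; i < 4; i++ {
		fmt.Println("export to", p.next())
	}
}
```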
Alternatives
Disable Keepalive
By disabling HTTP keep-alive, a new connection is made for every request, which includes a fresh DNS lookup. I confirmed this works by fiddling with SDK internals, but it is inefficient.
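For reference, the underlying net/http mechanism looks roughly like this; note that, per https://github.com/open-telemetry/opentelemetry-go/issues/2632, the OTLP HTTP exporter does not expose a way to inject such a transport, so this only illustrates the mechanism rather than an available exporter option.

```go
package main

import (
	"net/http"
	"time"
)

func main() {
	// An http.Client that never reuses connections: every request opens a
	// new TCP connection, which also triggers a fresh DNS lookup, so each
	// export can land on a different collector pod.
	client := &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			DisableKeepAlives: true, // force a new connection per request
		},
	}
	_ = client // the exporter would have to be constructed with this client
}
```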
Use custom RoundTripper
In the Go issue, using https://github.com/CAFxX/balancer is suggested.
This, however, leads to a DNS lookup on every request, which is undesirable.
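To make the trade-off concrete, here is a generic sketch of such a RoundTripper (a stand-in for illustration, not the CAFxX/balancer implementation): it has to resolve the hostname before every single request and then dial one of the returned addresses.

```go
package main

import (
	"math/rand"
	"net"
	"net/http"
)

// dnsBalancingTransport is a stand-in for the kind of RoundTripper suggested
// in the Go issue: it resolves the target hostname and picks one of the
// returned addresses for every request, which spreads the load but costs a
// DNS lookup per export.
type dnsBalancingTransport struct {
	base http.RoundTripper // e.g. http.DefaultTransport
}

func (t *dnsBalancingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	host := req.URL.Hostname()
	port := req.URL.Port()
	if port == "" {
		port = "4318" // assumed default OTLP/HTTP port
	}

	// DNS lookup on every request: this is the inefficiency described above.
	addrs, err := net.DefaultResolver.LookupHost(req.Context(), host)
	if err != nil {
		return nil, err
	}

	// Point the request at one concrete pod IP while keeping the original
	// hostname in the Host header. (With TLS this would additionally need
	// the ServerName set, which is omitted here.)
	clone := req.Clone(req.Context())
	clone.Host = host
	clone.URL.Host = net.JoinHostPort(addrs[rand.Intn(len(addrs))], port)

	return t.base.RoundTrip(clone)
}

func main() {
	client := &http.Client{Transport: &dnsBalancingTransport{base: http.DefaultTransport}}
	_ = client // every request made with this client re-resolves DNS
}
```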
Have users deploy server-side load balancers
Of course, this can be fixed server-side by deploying another layer of load-balancing proxies (nginx, etc.) in front of the OTel Collector. This greatly complicates the pipeline setup, though, as one might end up with three layers (HTTP load balancing, a stateless collector tier for sticky OTLP load balancing, and a stateful collector tier for processing).