open-telemetry / opentelemetry-java

OpenTelemetry Java SDK
https://opentelemetry.io
Apache License 2.0

Allow xds to be a valid endpoint for grpc #6724

Open anuragagarwal561994 opened 1 week ago

anuragagarwal561994 commented 1 week ago

Is your feature request related to a problem? Please describe. I am trying to load-balance traces from my pods across otel-collectors. To do this I want to use a proxyless gRPC setup with grpc-xds, but I get an error because the xds scheme, which the xds name resolver handles, is not considered a valid endpoint protocol.

Describe the solution you'd like: xds should be accepted as a valid endpoint protocol so that we can use proxyless gRPC to load-balance traffic to the otel-collector.

Describe alternatives you've considered: I have tried switching to http, but the proxyless gRPC setup is more resilient because it provides features such as circuit breakers and dynamic endpoint updates.

Additional context: Because we currently use the grpc protocol, each client connects to only one of the otel-collector pods. We want to make our infra more resilient and load-balanceable, which is why we need this setup. I believe not much should be required from the otel side beyond supporting the protocol; as long as the grpc-xds library is present on the classpath, we should be good to go.
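
To illustrate the request, here is a minimal sketch of endpoint validation that accepts `xds` alongside `http` and `https`. The class name, the allowed-scheme set, the error messages, and the example target are all hypothetical; this is not the SDK's actual validation code, and accepting the scheme on its own would not make an exporter speak xDS.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of opentelemetry-java. */
final class EndpointSchemeCheck {
  // Assumption: the schemes an exporter might accept if xds were allowed.
  private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https", "xds");

  /** Parses the endpoint and rejects it unless its scheme is in the allowed set. */
  static URI validateEndpoint(String endpoint) {
    URI uri;
    try {
      uri = new URI(endpoint);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Invalid endpoint, must be a URL: " + endpoint, e);
    }
    String scheme = uri.getScheme();
    if (scheme == null || !ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
      throw new IllegalArgumentException("Unsupported endpoint scheme: " + endpoint);
    }
    return uri;
  }

  public static void main(String[] args) {
    // Hypothetical xds target; the service name is made up for this example.
    System.out.println(validateEndpoint("xds:///otel-collector.observability:4317"));
  }
}
```

With a check like this, a target such as `xds:///otel-collector.observability:4317` would pass validation, while something like `ftp://localhost:4317` would still be rejected.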

jack-berg commented 3 days ago

I'm not familiar with grpc-xds. Maybe you could sketch out what you have in mind in a draft PR? I should note that our OTLP gRPC-based exporters don't use the grpc-java library. We recreate the protocol ourselves to reduce dependencies and increase serialization efficiency. If grpc-xds is a plugin for grpc-java, it likely won't be a simple integration.

anuragagarwal561994 commented 2 days ago

Oh, understood. Then I think we might have to go the custom exporter route if necessary.

grpc-java can do client-side load balancing because it has service discovery capabilities through the xDS API.

xDS is Envoy's API for dynamically discovering resources in a cluster (https://medium.com/@rajithacharith/introduction-to-envoys-dynamic-resource-discovery-xds-protocol-d340032a63b4), and gRPC supports it as well.

So there will be a control plane that receives events from Kubernetes and translates them into dynamic discovery updates, for example when a server pod is added or deleted. Since gRPC has this information, it can do client-side load balancing on its end. As of now, since gRPC uses a long-lived HTTP/2 connection, a lot of application pods can end up connected to a single OpenTelemetry Collector server.
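
As a rough illustration of the idea, and not of how grpc-java or grpc-xds implement it, the sketch below keeps a list of collector endpoints that a control plane can replace at any time and rotates through them per call. The class and method names are hypothetical, and a real xDS client does far more (health, weights, circuit breaking).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical round-robin picker over a dynamically updated endpoint list. */
final class RoundRobinPicker {
  private final AtomicReference<List<String>> endpoints = new AtomicReference<>(List.of());
  private final AtomicInteger next = new AtomicInteger();

  /** Called when the (assumed) control plane reports the current set of collector pods. */
  void update(List<String> discovered) {
    endpoints.set(List.copyOf(discovered));
  }

  /** Picks the endpoint for the next call, rotating through the current list. */
  String pick() {
    List<String> current = endpoints.get();
    if (current.isEmpty()) {
      throw new IllegalStateException("no endpoints discovered yet");
    }
    // floorMod keeps the index valid even after the counter overflows.
    int index = Math.floorMod(next.getAndIncrement(), current.size());
    return current.get(index);
  }

  public static void main(String[] args) {
    RoundRobinPicker picker = new RoundRobinPicker();
    picker.update(List.of("collector-0:4317", "collector-1:4317")); // made-up pod addresses
    System.out.println(picker.pick());
    System.out.println(picker.pick());
  }
}
```

Compared with a single long-lived connection, each call here can land on a different collector pod, and a pod added through `update` starts receiving traffic without the client reconnecting.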