keeganwitt opened this issue 1 month ago
Actually, in the case of Kubernetes, even for agents in a DaemonSet communicating with the downstream server, A/AAAA records will be typical rather than SRV records (see here).
I'm thinking the fix would be to add an option to the agent config to turn gRPC load balancing on or off.
@keeganwitt #4990 tracks making that setting configurable. We've agreed to allow options beyond the default; another option could be no load-balancing configuration at all.
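To make the on/off proposal concrete, here is a minimal Go sketch of how an agent could map a config value to the service config it passes to `grpc.WithDefaultServiceConfig`. The `LBMode` type, its values, and `defaultServiceConfig` are hypothetical names, not existing SPIRE options. The round-robin JSON is a standard gRPC service config and is assumed, not confirmed, to match what the agent uses today.

```go
package main

import (
	"fmt"
	"strings"
)

// LBMode is a hypothetical agent config value; it is NOT an existing SPIRE option.
type LBMode string

const (
	LBRoundRobin LBMode = "round_robin"
	LBPickFirst  LBMode = "pick_first"
	LBNone       LBMode = "none" // do not set a default service config at all
)

// defaultServiceConfig returns the JSON that would be handed to
// grpc.WithDefaultServiceConfig, or "" when no such dial option should be added.
func defaultServiceConfig(mode LBMode) (string, error) {
	switch LBMode(strings.ToLower(string(mode))) {
	case "", LBRoundRobin:
		// An empty value keeps the current round-robin behavior as the default.
		return `{"loadBalancingConfig":[{"round_robin":{}}]}`, nil
	case LBPickFirst:
		return `{"loadBalancingConfig":[{"pick_first":{}}]}`, nil
	case LBNone:
		return "", nil
	default:
		return "", fmt.Errorf("unknown load balancing mode %q", mode)
	}
}

func main() {
	for _, m := range []LBMode{"", LBPickFirst, LBNone, "bogus"} {
		cfg, err := defaultServiceConfig(m)
		fmt.Printf("%q -> %q (err: %v)\n", m, cfg, err)
	}
}
```

The dialer would then append `grpc.WithDefaultServiceConfig(cfg)` only when `cfg` is non-empty. Whether a `none` mode alone stops the SRV queries is unverified, since the issue attributes them to `EnableSRVLookups` rather than to the service config itself.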
Thank you @keeganwitt for opening this, and thank you @sorindumitru for pointing to the issue it depends on. Fixing this requires #4990 first.
Using `grpc.WithDefaultServiceConfig(roundRobinServiceConfig)` results in SRV DNS lookups. I believe this is because gRPC's code here will attempt to populate the addresses to load balance between from both the SRV and A records if `EnableSRVLookups` is `true`, which it will be because the `grpclb` package is initialized.

However, when using an external load balancer (such as aws-load-balancer) and external DNS so that the agents co-located in the same pod as your downstream server can reach the upstream server, the agent should not be using SRV records but A records instead, since those are the record type AWS will create. The SRV lookups therefore fail, producing NXDOMAIN responses and excessive load on your DNS system. If you generate enough of these NXDOMAIN queries, the cost in Route 53 can become significant.
Round-robin load balancing was introduced in #1061.