grpc / grpc-go

The Go language implementation of gRPC. HTTP/2 based RPC
https://grpc.io
Apache License 2.0
20.89k stars 4.34k forks source link

xdsclient: reduce the number of ADS requests #7473

Open easwars opened 1 month ago

easwars commented 1 month ago

The xDS client watch API looks like this: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/client.go#L47

This means that if a user of the API wants to register watches on N resources, it has to invoke the API N times. This is a common scenario in many use-cases (examples include xDS resolver, CDS LB policy, xDS enabled gRPC server etc).

The watch API implementation inside the xDS client delegates the watch to the appropriate authority here: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/authority.go#L458

This results in the authority asking the underlying transport to send the request out: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/authority.go#L593

The implement in the Transport queues this request here: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/transport/transport.go#L258

This is eventually processed by the send goroutine and the request is sent out on the wire: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/transport/transport.go#L362

So, if we have a case where an entity watching an LDS resource receives a response that contains N RDS resources, it will end up making N watch API calls, one for each resource, and this will eventually result in N requests on the ADS stream. Each of these requests will contain one more resource compared to the previous one. The situation can be worsened in the case where the LDS resource pointed to K RDS resources and was changed to point to N new RDS resources. So, this would end up in K+N requests on the ADS stream.

While this behavior is suboptimal, it needs to be noted that:

There are a few things we can try to make this better though:

dfawley commented 1 week ago

This should get fixed by @easwars's upcoming refactoring.