The xDS client watch API registers a watch on a single resource, so a user of the API who wants to watch N resources has to invoke the API N times. This is a common scenario in many use cases (examples include the xDS resolver, the CDS LB policy, and the xDS-enabled gRPC server).
So, if an entity watching an LDS resource receives a response that contains N RDS resources, it will end up making N watch API calls, one for each resource, and this will eventually result in N requests on the ADS stream. Each of these requests will contain one more resource than the previous one. The situation is worse when the LDS resource pointed to K RDS resources and is changed to point to N new RDS resources: unsubscribing from the K old resources and subscribing to the N new ones results in K+N requests on the ADS stream.
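The request growth described above can be illustrated with a small simulation. This is only a sketch, not the grpc-go implementation: `naiveClient`, `simulateUpdate`, and the resource names are hypothetical, and it assumes that the new resources are watched before the old watches are cancelled.

```go
package main

import "fmt"

// naiveClient is a hypothetical stand-in for the xDS client. Every watch or
// cancel immediately produces one request carrying the full resource list.
type naiveClient struct {
	subscribed []string   // resources currently watched, in subscription order
	requests   [][]string // resource list carried by each request sent so far
}

// send records a copy of the current subscription list as one ADS request.
func (c *naiveClient) send() {
	c.requests = append(c.requests, append([]string(nil), c.subscribed...))
}

func (c *naiveClient) watch(name string) {
	c.subscribed = append(c.subscribed, name)
	c.send()
}

func (c *naiveClient) cancel(name string) {
	for i, r := range c.subscribed {
		if r == name {
			c.subscribed = append(c.subscribed[:i], c.subscribed[i+1:]...)
			break
		}
	}
	c.send()
}

// simulateUpdate watches k old RDS resources, then replays an LDS update that
// swaps them for n new ones. It returns only the requests caused by the update.
func simulateUpdate(k, n int) [][]string {
	c := &naiveClient{}
	for i := 0; i < k; i++ {
		c.watch(fmt.Sprintf("old-route-%d", i))
	}
	c.requests = nil // ignore the initial subscriptions
	for i := 0; i < n; i++ {
		c.watch(fmt.Sprintf("new-route-%d", i))
	}
	for i := 0; i < k; i++ {
		c.cancel(fmt.Sprintf("old-route-%d", i))
	}
	return c.requests
}

func main() {
	for i, req := range simulateUpdate(3, 2) {
		fmt.Printf("request %d: %d resources\n", i+1, len(req))
	}
}
```

With K=3 and N=2 the simulated update produces K+N=5 requests, even though only the final resource list matters to the management server.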
While this behavior is suboptimal, it needs to be noted that:
- There is no correctness issue here.
- This still does not violate the xDS transport protocol.
There are a few things we can try to make this better though:
Support transaction-like semantics in the xDS client API. With this approach, a user would do the following:
- Start a transaction with a new API call.
- Register a bunch of watches, and/or unregister a bunch of watches (with the existing APIs).
- Stop the transaction with a new API call.
The xDS client will wait for the transaction to complete before sending the resource requests on the wire.
Cons of this approach:
- This puts the onus on the caller to not perform any expensive operations, including I/O, while the transaction is open. It also puts the onus on the caller to close the transaction in time. Optionally, the xDS client could treat the transaction as advisory and, if it is not closed within a pre-defined amount of time, decide to send the outstanding resource requests on the wire.
- It increases the API surface of the xDS client.
- The xDS client is shared between multiple entities within a single gRPC channel. It is also shared across multiple gRPC servers. One misbehaving entity can affect others in this approach.
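A minimal sketch of what the transaction-like semantics could look like is shown below. All names here (`txClient`, `Begin`, `Commit`, `Watch`, `Cancel`) are hypothetical and are not part of the existing xDS client API. The sketch is single-threaded and leaves out the advisory timeout mentioned above.

```go
package main

import (
	"fmt"
	"sort"
)

// txClient is a hypothetical client that defers sending while a transaction
// is open. It is not safe for concurrent use.
type txClient struct {
	subscribed map[string]bool
	inTx       bool       // true between Begin and Commit
	dirty      bool       // true if the subscription set changed during the transaction
	requests   [][]string // resource list carried by each request sent so far
}

func newTxClient() *txClient {
	return &txClient{subscribed: map[string]bool{}}
}

// flush records the current subscription set as a single request.
func (c *txClient) flush() {
	names := make([]string, 0, len(c.subscribed))
	for n := range c.subscribed {
		names = append(names, n)
	}
	sort.Strings(names)
	c.requests = append(c.requests, names)
	c.dirty = false
}

// changed sends immediately outside a transaction and defers inside one.
func (c *txClient) changed() {
	if c.inTx {
		c.dirty = true
		return
	}
	c.flush()
}

func (c *txClient) Begin() { c.inTx = true }

// Commit closes the transaction and sends one request if anything changed.
func (c *txClient) Commit() {
	c.inTx = false
	if c.dirty {
		c.flush()
	}
}

func (c *txClient) Watch(name string)  { c.subscribed[name] = true; c.changed() }
func (c *txClient) Cancel(name string) { delete(c.subscribed, name); c.changed() }

func main() {
	c := newTxClient()
	c.Begin()
	c.Watch("route-a")
	c.Watch("route-b")
	c.Watch("route-c")
	c.Cancel("route-a")
	c.Commit()
	fmt.Println(len(c.requests), c.requests)
}
```

The three watches and one cancel inside the transaction collapse into a single request. The sketch also makes the first con visible: nothing is sent until the caller invokes `Commit`.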
Avoid any API changes, and limit the changes to the authority and transport packages. With this approach:
- Instead of the authority sending the list of resources to the transport to send out, it will simply send a message to the transport saying that something needs to be sent out.
- The authority already maintains a list of resources requested by watchers.
- The transport already maintains a list of resources sent out on the ADS stream, i.e., requested from the server.
- So, when the transport sees the message from the authority, it will query the authority to get the most recent list of resources to send out. If the most recent list matches what the transport has already sent out, the operation will be a no-op.
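The second approach can be sketched as follows. This is an illustration under stated assumptions rather than the actual authority and transport code: the type and method names are hypothetical, the "something needs to be sent" message is modeled as a channel of capacity one so that repeated notifications coalesce, and `processSignal` stands in for one iteration of the send goroutine so that the example stays deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// transport is a simplified, hypothetical stand-in for the real transport.
type transport struct {
	signal   chan struct{}   // capacity 1: "something needs to be sent out"
	source   func() []string // queries the authority for the latest list
	lastSent []string        // resources most recently sent on the stream
	requests [][]string      // every request sent so far
}

// notify records that a send is needed; extra notifications are coalesced.
func (t *transport) notify() {
	select {
	case t.signal <- struct{}{}:
	default: // a signal is already pending
	}
}

// processSignal handles at most one pending signal and reports whether a
// request was actually sent.
func (t *transport) processSignal() bool {
	select {
	case <-t.signal:
	default:
		return false // nothing pending
	}
	latest := t.source()
	if equal(latest, t.lastSent) {
		return false // the list matches what was already sent: no-op
	}
	t.lastSent = latest
	t.requests = append(t.requests, latest)
	return true
}

func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// authority tracks the resources requested by watchers.
type authority struct {
	watched map[string]bool
	tr      *transport
}

func (a *authority) watch(name string)  { a.watched[name] = true; a.tr.notify() }
func (a *authority) cancel(name string) { delete(a.watched, name); a.tr.notify() }

// resources returns the sorted list of currently watched resources.
func (a *authority) resources() []string {
	names := make([]string, 0, len(a.watched))
	for n := range a.watched {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func newAuthorityAndTransport() (*authority, *transport) {
	a := &authority{watched: map[string]bool{}}
	t := &transport{signal: make(chan struct{}, 1), source: a.resources}
	a.tr = t
	return a, t
}

func main() {
	a, t := newAuthorityAndTransport()
	a.watch("route-a")
	a.watch("route-b")
	a.watch("route-c")
	fmt.Println(t.processSignal(), t.requests)
}
```

In this sketch, how many watch calls collapse into one request depends on how many of them arrive before the send loop next runs, so the batching is opportunistic rather than guaranteed.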
The code path behind this behavior:
The xDS client watch API looks like this: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/client.go#L47
The watch API implementation inside the xDS client delegates the watch to the appropriate authority here: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/authority.go#L458
This results in the authority asking the underlying transport to send the request out: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/authority.go#L593
The implementation in the transport queues this request here: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/transport/transport.go#L258
This is eventually processed by the send goroutine and the request is sent out on the wire: https://github.com/grpc/grpc-go/blob/6fa393c579a99e3ecfd52607d5cf9cc55b7d80bb/xds/internal/xdsclient/transport/transport.go#L362