Also, in case the failout feature needs to become (D2) service-level in the future, we may want to consider leaving the implementation extensible and avoiding "cluster-specific" naming/assumptions? (For example, the keys in this map may include service names in the future.)
Based on our current infra, we are unlikely to need D2 service-level failout. Products are deployed at the cluster level, and it's unlikely that we would fail out a single service on its own today. If we do need to support that in the future, I believe we will need additional changes anyway, as the config properties/parsing will be very different. Not sure what our timeline is for next-gen D2; we may not even get to service-level failout before next-gen D2. I feel it's better to keep it simple for the moment.
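For illustration only, the cluster-keyed shape under discussion might look roughly like the sketch below. The names (`FailoutConfigSketch`, `peerClusters`, etc.) are assumptions for this sketch, not the actual `FailoutConfig` schema.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; the real FailoutConfig schema may differ.
public final class FailoutConfigShapeSketch {

  /** Stand-in for a single cluster's failout settings. */
  record FailoutConfigSketch(boolean failedOut, List<String> peerClusters) {}

  public static void main(String[] args) {
    // Today the map is keyed by cluster name (products are deployed per cluster);
    // a future service-level failout could instead key by service name.
    Map<String, FailoutConfigSketch> failoutByCluster = Map.of(
        "MyCluster", new FailoutConfigSketch(true, List.of("MyCluster-peer")));

    failoutByCluster.forEach((cluster, cfg) ->
        System.out.println(cluster + " failedOut=" + cfg.failedOut()
            + " peers=" + cfg.peerClusters()));
  }
}
```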
Sure. Just wanted to mention it to give it a thought. Thanks.
In order to make our infrastructure more resilient, we'd like to have the ability to fail out clusters individually so that we can mitigate more concurrent cluster-level issues.
- A `FailoutConfig` entry will be added to `ClusterStoreProperties` so that we can leverage existing ZooKeeper watches to propagate failout signals to all clients watching the cluster.
- A `LoadBalancerClusterListener` will be registered with `SimpleLoadBalancerState` by `FailoutConfigProvider` to watch for failout property changes.
- `FailedoutClusterManager` handles registering watches on the peer clusters and warming up connections to them (a rough sketch of this flow follows below).
- The failout configs are exposed through `ClusterInfoProvider` so that `D2Client`s can read them and perform reroutes.
- A `D2ClientDelegator`, `FailoutClient`, is created to handle re-routing requests for failed-out clusters (also sketched below).
- The redirection is handled by a `FailoutRedirectStrategy`, which needs to be provided via `D2ClientConfig`.
- The change has been tested end to end against a sandbox service.
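To make the listener-to-manager flow concrete, here is a minimal sketch. The types below (`ClusterListener`, `ClusterProperties`, `FailedoutClusterManagerSketch`) are stand-ins invented for the sketch; the actual `LoadBalancerClusterListener` and `FailedoutClusterManager` APIs in the change may look quite different.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch (not the rest.li API): a cluster listener feeds failout
// updates to a per-cluster manager that prepares the peer clusters.
public final class FailoutListenerSketch {

  /** Stand-in for the failout portion of a cluster's stored properties. */
  record ClusterProperties(String clusterName, Set<String> peerClusters, boolean failedOut) {}

  /** Stand-in for a LoadBalancerClusterListener-style callback. */
  interface ClusterListener {
    void onClusterChanged(ClusterProperties properties);
  }

  /** Per-cluster manager: reacts to failout flips by preparing the peer clusters. */
  static final class FailedoutClusterManagerSketch implements ClusterListener {
    private final Map<String, Boolean> warmedUpPeers = new ConcurrentHashMap<>();
    private final Consumer<String> warmUp; // e.g. open connections / register a watch

    FailedoutClusterManagerSketch(Consumer<String> warmUp) {
      this.warmUp = warmUp;
    }

    @Override
    public void onClusterChanged(ClusterProperties properties) {
      if (!properties.failedOut()) {
        return; // nothing to do until the cluster is failed out
      }
      for (String peer : properties.peerClusters()) {
        // Warm up each peer once so redirected traffic does not pay a cold-start cost.
        warmedUpPeers.computeIfAbsent(peer, p -> { warmUp.accept(p); return true; });
      }
    }
  }

  public static void main(String[] args) {
    ClusterListener manager =
        new FailedoutClusterManagerSketch(peer -> System.out.println("warming up " + peer));

    // Simulate the ZooKeeper-driven property update that flips the failout flag.
    manager.onClusterChanged(
        new ClusterProperties("MyCluster", Set.of("MyCluster-peer"), true));
  }
}
```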
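Similarly, a minimal sketch of the delegating-client flow: a wrapper checks the failout state and rewrites the request URI to a peer cluster before delegating. `SimpleClient`, `FailoutState`, and `redirect` are illustrative stand-ins, not the actual `FailoutClient` / `FailoutRedirectStrategy` API; in particular, real D2 resolves the cluster from the service in the URI rather than treating the authority as a cluster name.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not the rest.li API): illustrates the delegation +
// redirect flow described in the list above.
public final class FailoutClientSketch {

  /** Minimal stand-in for a D2-style client that resolves d2:// URIs. */
  interface SimpleClient {
    String get(URI uri);
  }

  /** Stand-in for the per-cluster failout state kept by the config provider. */
  static final class FailoutState {
    private final Map<String, String> failedOutClusterToPeer = new ConcurrentHashMap<>();

    void failOut(String cluster, String peerCluster) {
      failedOutClusterToPeer.put(cluster, peerCluster);
    }

    Optional<String> peerFor(String cluster) {
      return Optional.ofNullable(failedOutClusterToPeer.get(cluster));
    }
  }

  /** Stand-in redirect strategy: rewrites the URI authority to the peer cluster. */
  static URI redirect(URI original, String peerCluster) {
    return URI.create(original.getScheme() + "://" + peerCluster + original.getRawPath());
  }

  /** Delegator that reroutes requests for failed-out clusters before delegating. */
  static SimpleClient failoutAware(SimpleClient delegate, FailoutState state) {
    return uri -> {
      // Simplification: real D2 resolves the cluster from the service in the URI.
      String cluster = uri.getAuthority();
      URI target = state.peerFor(cluster).map(peer -> redirect(uri, peer)).orElse(uri);
      return delegate.get(target);
    };
  }

  public static void main(String[] args) {
    FailoutState state = new FailoutState();
    state.failOut("MyCluster", "MyCluster-peer");

    SimpleClient underlying = uri -> "served " + uri; // pretend backend
    SimpleClient client = failoutAware(underlying, state);

    // Requests to the failed-out cluster are transparently rerouted to the peer.
    System.out.println(client.get(URI.create("d2://MyCluster/greetings/1")));
    // -> served d2://MyCluster-peer/greetings/1
  }
}
```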