linkedin / rest.li

Rest.li is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs.

Support failing out requests to clusters #777

Closed shangchengying closed 2 years ago

shangchengying commented 2 years ago

To make our infrastructure more resilient, we'd like the ability to fail out clusters individually, so that we can mitigate multiple concurrent cluster-level issues.

The change has been tested end to end against a sandbox service.

shangchengying commented 2 years ago

Also, in case the failout feature needs to become (D2) service-level in the future, we may want to consider leaving the implementation extensible and avoiding "cluster-specific" naming/assumptions? (For example, the keys in this map may include service names in the future?)

Based on our current infra, we are unlikely to need D2 service-level failout. Products are deployed at the cluster level, and it's unlikely that we would fail out a single service on its own today. If we do need to support that in the future, I believe we will need additional changes anyway, as the config properties/parsing will be very different. I'm not sure what our timeline is for next-gen D2; we may not even get to service-level failout before then. I feel it's better to keep it simple for the moment.
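
To make the "keys in this map" point concrete, here is a minimal sketch of a cluster-keyed failout lookup. It is not the code in this PR: the class name `ClusterFailoutSketch`, the method `redirectFor`, the cluster names, and the idea that each entry lists peer clusters to redirect to are all assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names come from the Rest.li/D2 codebase.
public class ClusterFailoutSketch {
    // Keys are D2 cluster names (cluster-level scope, as discussed above).
    // Values are assumed to be peer clusters that can take the traffic.
    private final Map<String, List<String>> failedOutClusters;

    public ClusterFailoutSketch(Map<String, List<String>> failedOutClusters) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.failedOutClusters =
            Collections.unmodifiableMap(new HashMap<>(failedOutClusters));
    }

    // Returns the first peer cluster if the given cluster is failed out,
    // or an empty Optional if the cluster should be served normally.
    public Optional<String> redirectFor(String clusterName) {
        List<String> peers = failedOutClusters.get(clusterName);
        if (peers == null || peers.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(peers.get(0));
    }

    public static void main(String[] args) {
        Map<String, List<String>> config = new HashMap<>();
        config.put("ClusterA", Arrays.asList("ClusterB"));
        ClusterFailoutSketch sketch = new ClusterFailoutSketch(config);

        System.out.println(sketch.redirectFor("ClusterA")); // failed out
        System.out.println(sketch.redirectFor("ClusterC")); // not failed out
    }
}
```

In a shape like this, the keys are only ever interpreted as cluster names, which is the simplification argued for above. Supporting service-level failout would mean changing how keys are parsed and resolved, not just adding entries.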

bohhyang commented 2 years ago

Sure. Just wanted to mention it to give it a thought. Thanks.