square / okhttp

Square’s meticulous HTTP client for the JVM, Android, and GraalVM.
https://square.github.io/okhttp/
Apache License 2.0

Add a means to reset Postponed Routes #8315

Open hainet50b opened 5 months ago

hainet50b commented 5 months ago

We encountered an issue where Postponed Routes are not reset until a communication failure has occurred for all Routes. Ref: https://github.com/square/okhttp/blob/master/okhttp/src/main/kotlin/okhttp3/internal/connection/RouteSelector.kt#L86

For example, a real scenario we encountered: consider an API Gateway that uses OkHttp and connects to an external service that runs two instances.

At some point, one of the instances temporarily stops responding, causing the corresponding Route to move to Postponed Routes. At that time this is not a problem, because all communication is handled by the other instance. But if that other instance also stops responding, requests keep going to it until they fail, even if the first instance has already recovered. Unfortunately, due to business requirements, the Read Timeout for the API Gateway was set to 600 seconds, so the service was down for 600 seconds despite the fact that the first instance had recovered.

I think the load balancing mechanism of OkHttp is great. However, it can be problematic when it degrades to a single remaining Route. Could you consider resetting Postponed Routes at regular intervals, or adding an API to reset them?

yschimke commented 5 months ago

@swankjesse any thoughts? Seems like a valid improvement to address talk issues?

swankjesse commented 5 months ago

Seems like a good idea. Maybe we should limit how much time a broken route spends in the penalty box? 5 minutes?

hainet50b commented 5 months ago

Thank you for your consideration. I also think that around 5 minutes would be good.

Furthermore, it would be great if it could be configurable. In this case, setting the default value to "0" (which means infinity) or "false" might minimize confusion among existing users.
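
To make the proposal concrete, here is a minimal sketch in Java of a time-limited penalty box with a configurable duration. It is not OkHttp's actual code: the class name `ExpiringPenaltyBox`, its constructor parameters, and the `reset()` method are all hypothetical and chosen only for illustration. It assumes a penalty of zero means "never expire", which would preserve the current behavior by default.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical time-limited penalty box for failed routes; not part of OkHttp. */
final class ExpiringPenaltyBox<R> {
  /** How long a failed route stays postponed. 0 means it never expires on its own. */
  private final long penaltyNanos;
  /** Injected monotonic clock (for example System::nanoTime) so tests can control time. */
  private final LongSupplier nanoClock;
  /** Failed routes mapped to the clock reading at their most recent failure. */
  private final Map<R, Long> failedAtNanos = new ConcurrentHashMap<>();

  ExpiringPenaltyBox(Duration penalty, LongSupplier nanoClock) {
    if (penalty.isNegative()) throw new IllegalArgumentException("penalty < 0");
    this.penaltyNanos = penalty.toNanos();
    this.nanoClock = nanoClock;
  }

  /** Records a failed connection attempt; the route enters (or re-enters) the box. */
  void failed(R route) {
    failedAtNanos.put(route, nanoClock.getAsLong());
  }

  /** Records a successful connection; the route leaves the box immediately. */
  void connected(R route) {
    failedAtNanos.remove(route);
  }

  /** Explicit reset, covering the "API to reset them" variant of the request. */
  void reset() {
    failedAtNanos.clear();
  }

  /** Returns true if the route should be tried only after all other routes. */
  boolean shouldPostpone(R route) {
    Long failedAt = failedAtNanos.get(route);
    if (failedAt == null) return false;      // never failed, or already released
    if (penaltyNanos == 0L) return true;     // 0 = stay postponed indefinitely
    if (nanoClock.getAsLong() - failedAt < penaltyNanos) return true;
    failedAtNanos.remove(route, failedAt);   // penalty elapsed: release the route
    return false;
  }
}
```

With `new ExpiringPenaltyBox<>(Duration.ofMinutes(5), System::nanoTime)`, a route that failed would be postponed for five minutes and then treated like any other route again. With `Duration.ZERO`, a failed route stays postponed until it connects successfully or `reset()` is called.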