spring-cloud / spring-cloud-netflix

Integration with Netflix OSS components
http://cloud.spring.io/spring-cloud-netflix/
Apache License 2.0
4.87k stars 2.44k forks source link

Ribbon with Spring-retry make an error renewal for ServerStats #3131

Open ywind opened 6 years ago

ywind commented 6 years ago

hi,

I'm using Feign,Ribbon, and when i import ZipKin ,the Spring-Retry would be impoted because of dependency. I think i've found a bug when Ribbon Feign work with Spring retry that the LoadBalancerCommand.java will make an error update on a Fail Server. The following is the recurrence .

POM :

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.5.13.RELEASE</version>
    </parent>        
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-dependencies</artifactId>
        <version>Edgware.SR4</version>
        <type>pom</type>
        <scope>import</scope>
    </dependency>

i have two applications( InterfaceApplication TopicApplication ) ,and TopicApplication is two instances work on port 8002 and 8003 .InterfaceApplication will call TopicApplication.

InterfaceApplication yml:

ribbon:
  eureka:
    enabled: true 
  ReadTimeout: 3000
  ConnectTimeout: 5000
  MaxAutoRetries: 0
  MaxAutoRetriesNextServer: 3
  OkToRetryOnAllOperations: false
feign:
  httpclient:
    enabled: true

1、start applications

image

2、make interfaceApplication call TopicApplication,first server choose is 10.2.58.222:8003

image

3、kill 10.2.58.222:8003 , interfaceApplication will retry other server 10.2.58.222:8002, request will be success

LoadBalancerCommand.java#L307

image

4、but uri will be origin server 10.2.58.222:8003 , and serverstat too. the reason i think is server and serverstat create at LoadBanceCommend.java ,but retry execute at RetryTemplate.

image

5、Finally 10.2.58.222:8003 serverstat will be update,not 10.2.58.222:8002

image

and after an error update ,the fail server successiveConnectionFailureCount will be clear ,and change CircuitBreaker status cause subsequent requests lead to a fail server.

main code link : 1、LoadBalancerCommand.java 2、AbstractLoadBalancerAwareClient 3、RetryableFeignLoadBalancer 4、RetryTemplate

ryanjbaxter commented 6 years ago

Maybe you would be interested in submitting a PR to fix the problem?

ywind commented 6 years ago

Maybe you would be interested in submitting a PR to fix the problem?

Sorry, i have been busy for a while . i think this is a more challenging bug to me, and i will think if i have a good way to fix it next week.

ywind commented 6 years ago

@ryanjbaxter i'm so sorry.It is more complex for me to fix with several jars and RxJava(im not familiar with reactive programming).

What i think is openning the ExecutionInfoContext at LoadBalancerCommand to RibbonLoadBalancedRetryPolicy by ThreadLocal etc.When retry occurred, update ExecutionInfoContext new Server, and then when request Complete or Error ,get the latest Server from ExecutionInfoContext. The main codes that need to be modify are com.netflix.loadbalancer.reactive.LoadBalancerCommand and org.springframework.cloud.netflix.ribbon.RibbonLoadBalancedRetryPolicy.

But i thik its not a good idea..because it need modify two project and maybe effect Ribbon reactive programming thread. So this issue maybe need another gay to fix.So sorry...