spring-cloud / spring-cloud-netflix

Integration with Netflix OSS components
http://cloud.spring.io/spring-cloud-netflix/
Apache License 2.0

Problems with Ribbon/Feign/Zuul retry #1577

Closed eacdy closed 7 years ago

eacdy commented 7 years ago

I'm using Spring Cloud Camden SR3, and I would like to check whether my understanding of retry is correct.

Ribbon Retry

The configuration below enables Ribbon retry.

spring:
  cloud: 
    loadbalancer:
      retry: 
        enabled: true
client:
  ribbon:
    MaxAutoRetries: 3
    MaxAutoRetriesNextServer: 3
    OkToRetryOnAllOperations: true

Feign Retry

In Spring Cloud Camden SR3, Feign has its own retry logic. If I want to disable Feign's retry, I can use:

@Bean
public Retryer retryer() {
    return Retryer.NEVER_RETRY;
}

Zuul Retry

zuul:
  retryable: true
ribbon:
  MaxAutoRetries: 3
  MaxAutoRetriesNextServer: 3
  OkToRetryOnAllOperations: true 

My Questions

B.T.W

I've read the post https://github.com/spring-cloud/spring-cloud-netflix/issues/1290

emas80 commented 7 years ago

Hi, I am trying to understand how the retry works with Zuul, Ribbon, Feign, on Brixton and on Camden (SR3). I would like to share what I discovered so far with Ribbon and the retry.

First of all, I would like to point out that the documentation on Spring Cloud appears to be wrong:

spring.cloud.loadbalancer.retry=true

this does not work for me, while

spring.cloud.loadbalancer.retry.enabled=true

works.

Then, I am able to specify generic ribbon properties - without specifying the client, like

ribbon.MaxAutoRetries=0
ribbon.MaxAutoRetriesNextServer=3
ribbon.OkToRetryOnAllOperations=true

I was able to simulate a connection timeout. Without retry.enabled I saw the error in the logs after about 5 seconds, while after enabling it I saw the error after about 5x{number_of_retry} seconds. From that I assume the call was being retried.
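As a rough sanity check of that timing, here is a minimal sketch in plain Java. It assumes the commonly cited Ribbon attempt budget of (MaxAutoRetries + 1) x (MaxAutoRetriesNextServer + 1), and that every attempt fails by hitting the full connect timeout. The class and method names are hypothetical and are not part of Ribbon or Spring Cloud.

```java
// Hypothetical helper, not a Ribbon API: estimates the upper bound on attempts
// and the worst-case wait when every attempt runs into the connect timeout.
class RibbonRetryBudget {

    // Assumption: each server is tried once plus MaxAutoRetries times, and
    // the first server plus MaxAutoRetriesNextServer further servers are tried.
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    // Worst case: every attempt waits for the full connect timeout before failing.
    static long worstCaseMillis(int maxAutoRetries, int maxAutoRetriesNextServer,
                                long connectTimeoutMillis) {
        return (long) maxAttempts(maxAutoRetries, maxAutoRetriesNextServer)
                * connectTimeoutMillis;
    }

    public static void main(String[] args) {
        // MaxAutoRetries=0, MaxAutoRetriesNextServer=3, 5 s connect timeout
        System.out.println(maxAttempts(0, 3));            // 4
        System.out.println(worstCaseMillis(0, 3, 5000L)); // 20000
    }
}
```

Under these assumptions, the properties above would give up to 4 attempts and roughly 20 seconds before the error shows up in the logs, which is in line with the "5 x number of attempts" observation.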

The number 5 is the connect timeout, in seconds, that I set as the default when creating the RestTemplate bean in my Spring configuration. I was not able to override that value using something like

ribbon.ConnectTimeout=1000

Does anyone know why?

The logs I see are misleading, as only one IP address is shown (and only once) instead of the IP addresses of the different servers. I guess only the first IP address is printed. It took me a while to realize that the retry was actually working. I still have to verify that the retry is done against a different server. I can see from the logs that the client knows there is more than one server.

Does anyone know what the default logic is for considering a call as failed? I see that a connect timeout triggers a retry. Is it any status code other than 20x? Only the 50x ones?

We don't use Feign, but we do use Zuul (on one of the first versions of Brixton; we plan to upgrade it to Camden). It seems that enabling retry on Zuul is not that difficult.

eacdy commented 7 years ago

@emas80 Yes, I've also found that the Spring Cloud doc may be wrong. spring.cloud.loadbalancer.retry.enabled = true works, but the doc says spring.cloud.loadbalancer.retry = true. But I still wonder whether my understanding of Ribbon/Feign/Zuul retry is correct.

dyc87112 commented 7 years ago

The document is wrong.

@ConfigurationProperties("spring.cloud.loadbalancer.retry")
public class LoadBalancerRetryProperties {
    private boolean enabled = false;
    ... 
}

https://github.com/spring-cloud/spring-cloud-commons/pull/155

ryanjbaxter commented 7 years ago

@eacdy I have fixed the typo in the documentation, thanks! https://github.com/spring-cloud/spring-cloud-commons/blob/master/docs/src/main/asciidoc/spring-cloud-commons.adoc#retrying-failed-requests

I believe your analysis is right. If you disable Feign's retry you can still use Ribbon's configuration properties to configure retry logic when Feign is using Ribbon.

ryanjbaxter commented 7 years ago

Also, there is a PR to use Spring Retry when using Feign (and remove the two layers of retry logic present today when using Feign).
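To illustrate why two stacked retry layers matter, here is a minimal model in plain Java. It is a sketch only: it does not use the real Feign or Ribbon classes, the helper names are hypothetical, and the attempt counts (5 outer, 4 inner) are arbitrary illustration values. It shows that when an outer retry loop wraps an inner one and the call keeps failing, the total number of calls is the product of the two limits.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of two nested retry layers (for example, a client-level
// retry wrapping a load-balancer-level retry). Not actual Feign/Ribbon code.
class NestedRetryDemo {

    // Runs the call up to maxAttempts times and rethrows the last failure.
    static <T> T retry(int maxAttempts, Supplier<T> call) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    // Counts how often a permanently failing backend is hit by both layers.
    static int countCalls(int outerAttempts, int innerAttempts) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> alwaysFails = () -> {
            calls.incrementAndGet();
            throw new IllegalStateException("simulated connect timeout");
        };
        try {
            retry(outerAttempts, () -> retry(innerAttempts, alwaysFails));
        } catch (IllegalStateException expected) {
            // every attempt failed, which is the case being modelled
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countCalls(5, 4)); // 20
        System.out.println(countCalls(1, 4)); // 4 (outer layer effectively disabled)
    }
}
```

In this model, limiting the outer layer to a single attempt (comparable in spirit to returning Retryer.NEVER_RETRY) leaves only the inner layer's attempts.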