huaweicloud / spring-cloud-huawei

Spring Cloud Huawei is a framework that makes it easier and productive to develop microservices with Spring Cloud.
https://github.com/huaweicloud/spring-cloud-huawei/wiki
Apache License 2.0

Gateway/consumer keeps calling instances that are already offline, causing errors #690

Closed liubao68 closed 2 years ago

liubao68 commented 2 years ago

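Scenario in which the problem occurs:

  1. Start the gateway/consumer and the provider; calls succeed.
  2. Stop the provider and delete the provider's service information from the service center. Then restart the provider.

In this scenario, the gateway/consumer keeps calling the instance that has gone offline, which results in errors.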

Impact: This problem affects scenarios that use Ribbon (Hoxton, Greenwich, Finchley, etc.); the 2020.0.x versions are not affected.

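Cause: In the Ribbon scenario, Ribbon caches instance information. When a service is deleted, the registration/discovery client stops periodically refreshing the instance information of the deleted service unless something triggers access to that service. Because Ribbon has already cached the instances, it does not call the DiscoveryClient interface to discover instances, so nothing triggers a fetch of this service's instance information, and the old instance information stays in use.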
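
The interaction described above can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions: `Registry`, `PollingDiscovery`, and `CachingBalancer` are hypothetical stand-ins invented for illustration, not the real service center, spring-cloud-huawei, or Ribbon classes, and the "drop the watch when the service no longer exists" behaviour is an assumption based on the description in this issue. The endpoints `old:8095` and `new:8095` are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical model of the failure mode; none of these are real library classes.
class StaleCacheSketch {

    /** Stand-in for the service center: service name -> instance endpoints. */
    static class Registry {
        final Map<String, List<String>> services = new ConcurrentHashMap<>();
    }

    /** Stand-in for a discovery client with a periodic pull task. */
    static class PollingDiscovery {
        private final Registry registry;
        private final Set<String> watched = ConcurrentHashMap.newKeySet();
        private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
        private BiConsumer<String, List<String>> listener = (service, instances) -> { };

        PollingDiscovery(Registry registry) { this.registry = registry; }

        void onChange(BiConsumer<String, List<String>> listener) { this.listener = listener; }

        /** Explicit lookup: (re)starts watching the service and pulls it once. */
        List<String> getInstances(String service) {
            watched.add(service);
            pull(service);
            return cache.getOrDefault(service, Collections.emptyList());
        }

        /** One tick of the periodic task: refresh only the watched services. */
        void pullAll() {
            for (String service : new ArrayList<>(watched)) {
                pull(service);
            }
        }

        private void pull(String service) {
            List<String> found = registry.services.get(service);
            if (found == null) {
                // Assumed behaviour: the service does not exist, so stop refreshing it.
                watched.remove(service);
                return;
            }
            List<String> copy = new ArrayList<>(found);
            if (!copy.equals(cache.put(service, copy))) {
                listener.accept(service, copy); // "instance changed" event
            }
        }
    }

    /** Stand-in for a load balancer that caches instances and relies on change events. */
    static class CachingBalancer {
        private final PollingDiscovery discovery;
        private final Map<String, List<String>> servers = new ConcurrentHashMap<>();

        CachingBalancer(PollingDiscovery discovery) {
            this.discovery = discovery;
            discovery.onChange(servers::put);
        }

        List<String> servers(String service) {
            List<String> cached = servers.get(service);
            // Discovery is only consulted when nothing is cached yet.
            return cached != null ? cached : discovery.getInstances(service);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        PollingDiscovery discovery = new PollingDiscovery(registry);
        CachingBalancer balancer = new CachingBalancer(discovery);

        registry.services.put("logistics", Collections.singletonList("old:8095"));
        System.out.println(balancer.servers("logistics")); // first call primes the cache

        registry.services.remove("logistics"); // provider deleted from the registry
        discovery.pullAll();                   // pull fails, the watch is dropped
        registry.services.put("logistics", Collections.singletonList("new:8095"));
        discovery.pullAll();                   // nothing is watched any more
        System.out.println(balancer.servers("logistics")); // still the old instance

        discovery.getInstances("logistics");   // explicit lookup re-arms the watch
        System.out.println(balancer.servers("logistics")); // now the new instance
    }
}
```

In this model the balancer only recovers once something calls `getInstances` again, which mirrors the "unless something triggers access to that service" condition in the explanation above.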

Error log:

2022-05-19 07:45:15.094 [service-center-discovery-task] ERROR o.a.s.s.center.client.ServiceCenterRegistration - find service xxx#xxxx instance failed.
org.apache.servicecomb.service.center.client.exception.OperationException: get service instances list fails, statusCode = 400; message = Bad Request; content = {"errorCode":"400012","errorMessage":"Micro-service does not exist","detail":"Consumer[00d63f45268a863816ac37d5adce4b99dab76a32][development/api-zuul-dev/api-zuul-dev/0.0.1] find provider[development/api-zuul-dev/logistics] failed, provider does not exist"}
    at org.apache.servicecomb.service.center.client.ServiceCenterClient.findMicroserviceInstance(ServiceCenterClient.java:250)
    at org.apache.servicecomb.service.center.client.ServiceCenterDiscovery.pullInstance(ServiceCenterDiscovery.java:157)
    at org.apache.servicecomb.service.center.client.ServiceCenterDiscovery.lambda$pullAllInstance$2(ServiceCenterDiscovery.java:210)
    at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
    at org.apache.servicecomb.service.center.client.ServiceCenterDiscovery.pullAllInstance(ServiceCenterDiscovery.java:209)
    at org.apache.servicecomb.service.center.client.ServiceCenterDiscovery.access$000(ServiceCenterDiscovery.java:40)
    at org.apache.servicecomb.service.center.client.ServiceCenterDiscovery$PullInstanceTask.execute(ServiceCenterDiscovery.java:202)
    at org.apache.servicecomb.http.client.task.AbstractTask.lambda$startTask$1(AbstractTask.java:89)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2022-05-19 07:55:19.826 [reactor-http-epoll-1] ERROR o.s.b.a.w.r.error.AbstractErrorWebExceptionHandler - [bd4961b8-15590]  500 Server Error for HTTP GET "/logistics/doc.html"
io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: No route to host: /172.17.1.180:8095
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
    |_ checkpoint ⇢ org.springframework.web.cors.reactive.CorsWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ HTTP GET "/logistics/doc.html" [ExceptionHandlingWebHandler]
Stack trace:
Caused by: java.net.ConnectException: finishConnect(..) failed: No route to host
    at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
    at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:672)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:649)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:529)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

It can also be observed that, after the exception occurs, instance change logs are no longer printed:

2022-05-19 07:55:03.275 [boundedElastic-1356] INFO  o.a.s.s.center.client.ServiceCenterRegistration - Instance changed event, current: revision=bb8257ab87f2e2cb1009cd31af4a5392f6653c60, instances=rest://172.17.2.172:8095|logistics-test|#; origin: revision=null, instances=; appId=api-zuul-dev, serviceName=logistics-test

deaml commented 1 year ago

@liubao68 May I ask whether this issue can be fixed in the Greenwich version?

liubao68 commented 1 year ago

@deaml This issue is fixed in https://github.com/huaweicloud/spring-cloud-huawei/releases/tag/1.6.3-Greenwich

deaml commented 1 year ago

Thank you.