spring-cloud / spring-cloud-gateway

An API Gateway built on Spring Framework and Spring Boot providing routing and more.
http://cloud.spring.io
Apache License 2.0
4.49k stars 3.3k forks source link

SCG: failing to get routing SERVICE UNAVAILABLE 503 initially after first time deployment/restart of SCG #2608

Open karnamgit opened 2 years ago

karnamgit commented 2 years ago

Environemnt Spring boot: 2.6.4 spring cloud : 2021.0.1 discovery : consul

running in reactive mode.

Load test We deployed upgraded SCG and trying to perform the Load test with 100 users , and initial time after first time deployment/restart of SCG we see below exceptions and users dropping . In dyna we are capturing these . And these are coming after 30s timeout ... seems like spring cloud gateway is not able to get route within 30s of response-time so it is timing out and seeing 5XX service unavailable.

100 % likely failed due to an org.springframework.cloud.gateway.support.NotFoundException 100 % likely failed due to a reactor.core.publisher.FluxOnAssembly$OnAssemblyException 33.3 % likely failed due to a com.ecwid.consul.v1.OperationException 33.3 % likely failed due to failed service calls to consul on port 8500 with HTTP 500 18.2 % likely failed due to an org.springframework.core.env.MissingRequiredPropertiesException 6.06 % likely failed due to a reactor.core.Exceptions$StaticThrowable 6.06 % likely failed due to a reactor.core.Exceptions$StaticRejectedExecutionException

further 2nd load test runs after 10 or 15 mins with same 100 users are not showing these errors...

@spencergibb could you pls help here.

thanks Karnam

spencergibb commented 2 years ago

I'd need quite a bit more information than that. You configuration, the full exceptions and how to reproduce to begin with

karnamgit commented 2 years ago

@spencergibb Thanks for the reply .

Here is the bootstrap.yml configuration:

server: 
  port: 8443
  ssl:
    key-store: ${application.ssl.keystore}
    key-store-password: ${application.ssl.keystore.password}
    key-store-type: ${application.ssl.keystore.type}
    client-auth: ${application.ssl.keystore.type.one.way}
    enabled: ${application.ssl.enabled}
    key-store-provider: ${application.ssl.keystore.provider}
    trust-store: ${application.ssl.truststore}
    trust-store-password: ${application.ssl.truststore.password}
    enabled-protocols:  [TLSv1.2]
    protocol: TLS

management:
  trace:
    http:
      include: REQUEST_HEADERS, RESPONSE_HEADERS, PRINCIPAL, REMOTE_ADDRESS, TIME_TAKEN
  health:
    db:
      enabled: false 
  endpoints:
    enabled-by-default: false
    web:
      exposure:
        include: gateway
  endpoint:
    health:
      enabled: true     
    threaddump:
      enabled: true
    metrics:
      enabled: true
    gateway:
      enabled: true

hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: false
logging:
  level:
    reactor:
      netty: TRACE
    org:
      springframework:
        cloud:
          gateway: TRACE
spring:
  cloud:
    gateway:
      httpclient:
        wiretap: true
      httpserver:
        wiretap: true      

---

spring:
  application:
    name: test-internal-edgeproxy
  profiles:
    active: commons
  main: 
    allow-bean-definition-overriding: true
  cloud:
    gateway:
      discovery:
        locator:
          enabled: true
      httpclient:
        connect-timeout: 90000
        response-timeout: 90000
    consul: 
      host: localhost
      port: 8500
      discovery:
        enabled: true
        health-check-interval: 30s
        scheme: https
        register-health-check: true
    config: 
      enabled: true
      format: ${config.server.format}
      failFast: true
      uri: https://testserver:8003

As I mentioned , we have hundreds of microservices behind the edge proxy (SCG) , and all requests goes thru edge proxy . with latest scg upgrade , we have deployed in SIT and we dont see this issue. But in PT load test we see some discovery issues when service requested , with multiple users (100,1000 users). This is occurring mostly after first time deployment and further tests are not generating more 5xx errors. below are exceptions few exceptions .

1. 
 ERROR Error getting instances from Consul.
com.ecwid.consul.v1.OperationException: OperationException(statusCode=500, statusMessage='Internal Server Error', statusContent='rpc error making call: unexpected EOF')
    at com.ecwid.consul.v1.health.HealthConsulClient.getHealthServices(HealthConsulClient.java:128)
    at com.ecwid.consul.v1.ConsulClient.getHealthServices(ConsulClient.java:571)
    at org.springframework.cloud.consul.discovery.reactive.ConsulReactiveDiscoveryClient.getHealthServices(ConsulReactiveDiscoveryClient.java:88)
    at org.springframework.cloud.consul.discovery.reactive.ConsulReactiveDiscoveryClient.lambda$getInstances$0(ConsulReactiveDiscoveryClient.java:68)
    at reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:46)
    at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
    at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194)
    at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
    at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
2.  
    DEBUG [3ca46235-491] Resolved [NotFoundException: 503 SERVICE_UNAVAILABLE "Unable to find instance  
Exception:
org.springframework.cloud.gateway.support.NotFoundException
Message:
503 SERVICE_UNAVAILABLE "Unable to find instance for test-security"
Stacktrace:
org.springframework.cloud.gateway.filter.ReactiveLoadBalancerClientFilter.lambda$
org.springframework.cloud.gateway.filter.ReactiveLoadBalancerClientFilter$$Lambda$.accept
reactor.core.publisher.FluxPeek$PeekSubscriber.onNext(FluxPeek.java:185)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:120)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:98)
reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:44)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.drainAsync(FluxFlattenIterable.java:421)
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.drain(FluxFlattenIterable.java:686)
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.onNext(FluxFlattenIterable.java:250)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:249)

Exception:org.springframework.web.client.HttpServerErrorException$ServiceUnavailableMessage:503 Service Unavailable: [{"timestamp":"*****T*****+00:00","path":"/test1/v1/checkProcess","status":503,"error":"Service Unavailable","requestId":"*****"}]Show affected entry in 'Code level' tabStacktrace:

org.springframework.web.client.HttpServerErrorException.create(HttpServerErrorException.java:112)

org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:172)

org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:112)
3 .Exception:
reactor.core.publisher.FluxOnAssembly$OnAssemblyException
Stacktrace:
reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.fail(FluxOnAssembly.java:578)
reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.onError(FluxOnAssembly.java:544)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:172)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onError(FluxHide.java:142)
reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:192)
reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:259)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onError(MonoIgnoreThen.java:270)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onError(FluxDoFinally.java:136)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run

trying to replicate the same scenarios in local but not able to .

My suspect is that , gateway is waiting up to 30s from consul agent response and timed out , and leads to 503 service unavailable but not all the time. but if you see we have kept it for 90 mins .

cloud:
    gateway:
      discovery:
        locator:
          enabled: true
      httpclient:
        connect-timeout: 90000
        response-timeout: 90s

and also we see 404 service not found errors .

pls do let me know if any.

Thanks, Karnam

Srishtigupta552 commented 1 month ago

Hi @karnamgit did you get any solution for this problem actually I am facing same issue can you provide solution.