spring-cloud / spring-cloud-gateway

An API Gateway built on Spring Framework and Spring Boot providing routing and more.
http://cloud.spring.io
Apache License 2.0
4.54k stars 3.33k forks source link

Performance Regression from 2.2.4 #1956

Closed tony-clarke-amdocs closed 4 years ago

tony-clarke-amdocs commented 4 years ago

Before adopting Spring Cloud Gateway we run a load test to make sure all is good. Recently, when trying to adopt SCG 2.2.4 (and also SCG 2.2.5), this test began to fail. The load test uses Gatling to drive the load/requests.

20 minute duration 72 users/clients Ramp 5 users/sec Pace is 100 ms

The load test runs successfully with Spring Cloud Gateway 2.2.3. Starting with Spring Cloud Gateway 2.2.4, the load test fails with timeouts on the client like the following:

io.gatling.http.client.impl.RequestTimeoutException: Request timeout to apigw-nft.ilfnm108.eaas.amdocs.com/10.238.136.244:443 after 60000 ms

There are no errors in the gateway when the timeouts occur.

We compared the dependent component versions between Spring Cloud Gateway 2.2.3 and 2.2.4. To pinpoint the cause of the regression, we used Spring Cloud Gateway 2.2.3 and one by one upgraded the dependent components until the timeouts occurred during the load test. We found that the regression was introduced in io.projectreactor Dysprosium-SR10, specifically reactor-netty 0.9.10. We concluded this by using Spring Cloud Gateway 2.2.4 and the following versions of reactor-netty:

0.9.8: Exceptions occurred due to the following regression introduced in this version: This release contains a regression related to opened file descriptors, v0.9.9.RELEASE was released immediately after. 0.9.9: successfully ran NFT test with no timeouts 0.9.10: timeouts occur during the load test

spencergibb commented 4 years ago

/cc @violetagg

violetagg commented 4 years ago

@tony-clarke-amdocs Any chance for a repro?

tony-clarke-amdocs commented 4 years ago

@violetagg I will work on one. It's wrapped up in some company specific infrastructure that I need to decouple from to make it standalone.

spring-projects-issues commented 4 years ago

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

tony-clarke-amdocs commented 4 years ago

Will try to get repot once I free up.

spencergibb commented 4 years ago

Thanks @tony-clarke-amdocs we know you're good for it :-)

tony-clarke-amdocs commented 4 years ago

Hi @OlgaMaciaszek @violetagg Please see this repot for a way to reproduce the problem. It is tricky to get the test to fail reliably across all hardware. But so far I have been able to get it to fail on Linux, Windows and Macbook...but each platform requires adjusting the load parameters. I have ran it 6 times on my MacBook. 3 times with 0.9.7 and 3 times with 0.9.10. All runs with 0.9.7 are fine but all runs with 0.9.10 caused a timeout on the client side to occur.

violetagg commented 4 years ago

@tony-clarke-amdocs Thanks! I'm looking at it.

violetagg commented 4 years ago

@tony-clarke-amdocs Will you be able to test the current 0.9.13.BUILD-SNAPSHOT?

tony-clarke-amdocs commented 4 years ago

@violetagg is there something you see that suggests 0.9.13.BUILD-SNAPSHOT has fixed the bug? BTW you can switch the version yourself, just set the version in gateway/pom.xml

tony-clarke-amdocs commented 4 years ago

@violetagg Tested with 0.9.13.BUILD-SNAPSHOT and the issue is still there.

violetagg commented 4 years ago

The issue in Reactor Netty is https://github.com/reactor/reactor-netty/pull/1370

@tony-clarke-amdocs Do you think you can verify this PR https://github.com/reactor/reactor-netty/pull/1371 ? I do not see anymore issues with that fix.

buhtr commented 4 years ago

Tried the same tests as I did in https://github.com/spring-cloud/spring-cloud-gateway/issues/2020 with reactor-netty 0.9.7.RELEASE and 0.9.9.RELEASE the results are merely the same. Hoxton.SR8 still has poor performance compared with Greenwich.SR6

violetagg commented 4 years ago

@buhtr

Tried the same tests as I did in #2020 with reactor-netty 0.9.7.RELEASE and 0.9.9.RELEASE the results are merely the same.

So the PR above makes the results comparable with what you have with Reactor Netty 0.9.7/0.9.9?

buhtr commented 4 years ago

@violetagg

The load test gives the following benchmark:

So there are no any performance improvements using reactor-netty 0.9.7

buhtr commented 4 years ago

The dependencies with reactor-netty 0.9.7

[INFO] Scanning for projects...
[INFO] 
[INFO] ------------------------< com.example:gateway >-------------------------
[INFO] Building gateway 0.0.1-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:3.1.2:tree (default-cli) @ gateway ---
[INFO] com.example:gateway:jar:0.0.1-SNAPSHOT
[INFO] +- org.springframework.cloud:spring-cloud-starter-gateway:jar:2.2.5.RELEASE:compile
[INFO] |  +- org.springframework.cloud:spring-cloud-starter:jar:2.2.5.RELEASE:compile
[INFO] |  |  +- org.springframework.cloud:spring-cloud-context:jar:2.2.5.RELEASE:compile
[INFO] |  |  |  \- org.springframework.security:spring-security-crypto:jar:5.3.5.RELEASE:compile
[INFO] |  |  +- org.springframework.cloud:spring-cloud-commons:jar:2.2.5.RELEASE:compile
[INFO] |  |  \- org.springframework.security:spring-security-rsa:jar:1.0.9.RELEASE:compile
[INFO] |  |     \- org.bouncycastle:bcpkix-jdk15on:jar:1.64:compile
[INFO] |  |        \- org.bouncycastle:bcprov-jdk15on:jar:1.64:compile
[INFO] |  \- org.springframework.cloud:spring-cloud-gateway-core:jar:2.2.5.RELEASE:compile
[INFO] |     +- org.springframework.boot:spring-boot-starter-validation:jar:2.3.5.RELEASE:compile
[INFO] |     |  +- org.glassfish:jakarta.el:jar:3.0.3:compile
[INFO] |     |  \- org.hibernate.validator:hibernate-validator:jar:6.1.6.Final:compile
[INFO] |     |     +- jakarta.validation:jakarta.validation-api:jar:2.0.2:compile
[INFO] |     |     +- org.jboss.logging:jboss-logging:jar:3.4.1.Final:compile
[INFO] |     |     \- com.fasterxml:classmate:jar:1.5.1:compile
[INFO] |     \- io.projectreactor.addons:reactor-extra:jar:3.3.4.RELEASE:compile
[INFO] +- org.springframework.boot:spring-boot-starter-webflux:jar:2.3.5.RELEASE:compile
[INFO] |  +- org.springframework.boot:spring-boot-starter:jar:2.3.5.RELEASE:compile
[INFO] |  |  +- org.springframework.boot:spring-boot:jar:2.3.5.RELEASE:compile
[INFO] |  |  |  \- org.springframework:spring-context:jar:5.2.10.RELEASE:compile
[INFO] |  |  |     +- org.springframework:spring-aop:jar:5.2.10.RELEASE:compile
[INFO] |  |  |     \- org.springframework:spring-expression:jar:5.2.10.RELEASE:compile
[INFO] |  |  +- org.springframework.boot:spring-boot-autoconfigure:jar:2.3.5.RELEASE:compile
[INFO] |  |  +- org.springframework.boot:spring-boot-starter-logging:jar:2.3.5.RELEASE:compile
[INFO] |  |  |  +- ch.qos.logback:logback-classic:jar:1.2.3:compile
[INFO] |  |  |  |  \- ch.qos.logback:logback-core:jar:1.2.3:compile
[INFO] |  |  |  +- org.apache.logging.log4j:log4j-to-slf4j:jar:2.13.3:compile
[INFO] |  |  |  |  \- org.apache.logging.log4j:log4j-api:jar:2.13.3:compile
[INFO] |  |  |  \- org.slf4j:jul-to-slf4j:jar:1.7.30:compile
[INFO] |  |  +- jakarta.annotation:jakarta.annotation-api:jar:1.3.5:compile
[INFO] |  |  \- org.yaml:snakeyaml:jar:1.26:compile
[INFO] |  +- org.springframework.boot:spring-boot-starter-json:jar:2.3.5.RELEASE:compile
[INFO] |  |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.11.3:compile
[INFO] |  |  |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.11.3:compile
[INFO] |  |  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.11.3:compile
[INFO] |  |  +- com.fasterxml.jackson.datatype:jackson-datatype-jdk8:jar:2.11.3:compile
[INFO] |  |  +- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:jar:2.11.3:compile
[INFO] |  |  \- com.fasterxml.jackson.module:jackson-module-parameter-names:jar:2.11.3:compile
[INFO] |  +- org.springframework.boot:spring-boot-starter-reactor-netty:jar:2.3.5.RELEASE:compile
[INFO] |  +- org.springframework:spring-web:jar:5.2.10.RELEASE:compile
[INFO] |  |  \- org.springframework:spring-beans:jar:5.2.10.RELEASE:compile
[INFO] |  +- org.springframework:spring-webflux:jar:5.2.10.RELEASE:compile
[INFO] |  \- org.synchronoss.cloud:nio-multipart-parser:jar:1.1.0:compile
[INFO] |     +- org.slf4j:slf4j-api:jar:1.7.30:compile
[INFO] |     \- org.synchronoss.cloud:nio-stream-storage:jar:1.1.3:compile
[INFO] +- io.projectreactor.netty:reactor-netty:jar:0.9.7.RELEASE:compile
[INFO] |  +- io.netty:netty-codec-http:jar:4.1.53.Final:compile
[INFO] |  |  +- io.netty:netty-common:jar:4.1.53.Final:compile
[INFO] |  |  +- io.netty:netty-buffer:jar:4.1.53.Final:compile
[INFO] |  |  +- io.netty:netty-transport:jar:4.1.53.Final:compile
[INFO] |  |  \- io.netty:netty-codec:jar:4.1.53.Final:compile
[INFO] |  +- io.netty:netty-codec-http2:jar:4.1.53.Final:compile
[INFO] |  +- io.netty:netty-handler:jar:4.1.53.Final:compile
[INFO] |  |  \- io.netty:netty-resolver:jar:4.1.53.Final:compile
[INFO] |  +- io.netty:netty-handler-proxy:jar:4.1.53.Final:compile
[INFO] |  |  \- io.netty:netty-codec-socks:jar:4.1.53.Final:compile
[INFO] |  +- io.netty:netty-transport-native-epoll:jar:linux-x86_64:4.1.53.Final:compile
[INFO] |  |  \- io.netty:netty-transport-native-unix-common:jar:4.1.53.Final:compile
[INFO] |  \- io.projectreactor:reactor-core:jar:3.3.11.RELEASE:compile
[INFO] |     \- org.reactivestreams:reactive-streams:jar:1.0.3:compile
[INFO] +- org.springframework.boot:spring-boot-starter-test:jar:2.3.5.RELEASE:test
[INFO] |  +- org.springframework.boot:spring-boot-test:jar:2.3.5.RELEASE:test
[INFO] |  +- org.springframework.boot:spring-boot-test-autoconfigure:jar:2.3.5.RELEASE:test
[INFO] |  +- com.jayway.jsonpath:json-path:jar:2.4.0:test
[INFO] |  |  \- net.minidev:json-smart:jar:2.3:test
[INFO] |  |     \- net.minidev:accessors-smart:jar:1.2:test
[INFO] |  |        \- org.ow2.asm:asm:jar:5.0.4:test
[INFO] |  +- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.3:test
[INFO] |  |  \- jakarta.activation:jakarta.activation-api:jar:1.2.2:test
[INFO] |  +- org.assertj:assertj-core:jar:3.16.1:test
[INFO] |  +- org.hamcrest:hamcrest:jar:2.2:test
[INFO] |  +- org.junit.jupiter:junit-jupiter:jar:5.6.3:test
[INFO] |  |  +- org.junit.jupiter:junit-jupiter-api:jar:5.6.3:test
[INFO] |  |  |  +- org.opentest4j:opentest4j:jar:1.2.0:test
[INFO] |  |  |  \- org.junit.platform:junit-platform-commons:jar:1.6.3:test
[INFO] |  |  +- org.junit.jupiter:junit-jupiter-params:jar:5.6.3:test
[INFO] |  |  \- org.junit.jupiter:junit-jupiter-engine:jar:5.6.3:test
[INFO] |  +- org.junit.vintage:junit-vintage-engine:jar:5.6.3:test
[INFO] |  |  +- org.apiguardian:apiguardian-api:jar:1.1.0:test
[INFO] |  |  +- org.junit.platform:junit-platform-engine:jar:1.6.3:test
[INFO] |  |  \- junit:junit:jar:4.13.1:test
[INFO] |  +- org.mockito:mockito-core:jar:3.3.3:test
[INFO] |  |  +- net.bytebuddy:byte-buddy:jar:1.10.17:test
[INFO] |  |  +- net.bytebuddy:byte-buddy-agent:jar:1.10.17:test
[INFO] |  |  \- org.objenesis:objenesis:jar:2.6:test
[INFO] |  +- org.mockito:mockito-junit-jupiter:jar:3.3.3:test
[INFO] |  +- org.skyscreamer:jsonassert:jar:1.5.0:test
[INFO] |  |  \- com.vaadin.external.google:android-json:jar:0.0.20131108.vaadin1:test
[INFO] |  +- org.springframework:spring-core:jar:5.2.10.RELEASE:compile
[INFO] |  |  \- org.springframework:spring-jcl:jar:5.2.10.RELEASE:compile
[INFO] |  +- org.springframework:spring-test:jar:5.2.10.RELEASE:test
[INFO] |  \- org.xmlunit:xmlunit-core:jar:2.7.0:test
[INFO] \- io.projectreactor:reactor-test:jar:3.3.11.RELEASE:test
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  1.620 s
[INFO] Finished at: 2020-11-10T12:55:22+03:00
[INFO] ------------------------------------------------------------------------
violetagg commented 4 years ago

@buhtr I think there is some misunderstanding?

This issue is related to a performance regression in Reactor Netty 0.9.10 and the PR above fixes this regression.

buhtr commented 4 years ago

@violetagg hard to say

I've done load testing for a different versions of scg in order to understand its performance. The results show that Greenwich.SR6 with default dependencies is 30% quicker then the latest version of scg. It seems odd to me. As a result i've created https://github.com/spring-cloud/spring-cloud-gateway/issues/2020 but @spencergibb closed it and marked it as duplicated by this task.

According to your answer it seems that these issues are not related. So what about 30% performance degradation is it appropriate for the latest version or is it some kind of a bug ?

violetagg commented 4 years ago

@spencergibb Is it possible that the issue that is reported by @buhtr is a different one?

tony-clarke-amdocs commented 4 years ago

I think it could be a different issue. The problem reported in this issue is that sometimes the response is never received. I will check to see if the PR fixes it soon.

violetagg commented 4 years ago

@tony-clarke-amdocs It will be really helpful if you can test that as Spring Boot releases are scheduled for this Thursday and I still have time to include that fix. If you think that a SNAPSHOT will be easier for testing I can commit that PR and thus make the fix available in the SNAPSHOT.

tony-clarke-amdocs commented 4 years ago

@violetagg Yes, adding this PR to SNAPSHOT will help. Thanks.

violetagg commented 4 years ago

@tony-clarke-amdocs 0.9.14.BUILD-SNAPSHOT is available from the spring repo https://repo.spring.io/snapshot

tony-clarke-amdocs commented 4 years ago

Unfortunately we are still seeing timeouts in our NFT test with 0.9.14-BUILD-SNAPSHOT. Three runs and all had timeouts. Reverted back to SCG 2.3.2 and it passes.

violetagg commented 4 years ago

@tony-clarke-amdocs can you check the jar that the fix is in, just to be sure that it is not a cached snapshot?

tony-clarke-amdocs commented 4 years ago

I am not testing the fix myself, but I agree, that is something I will forward on.

violetagg commented 4 years ago

You can try directly with 0.9.14.RELEASE

tony-clarke-amdocs commented 4 years ago

You were correct about a cached SNAPSHOT. We now are getting clear runs without timeouts : )

violetagg commented 4 years ago

@tony-clarke-amdocs Thanks a lot for testing it and providing the test case - it helped a lot.

tony-clarke-amdocs commented 4 years ago

@spencergibb Any idea when this fix will be made available in SCG? It's blocking us until then.

spencergibb commented 4 years ago

@tony-clarke-amdocs I don't understand. Spring Cloud doesn't manage the version of reactor or reactor netty, that is spring boot. Regardless, you can always adjust the version.

Closing as fixed in reactor-netty

violetagg commented 4 years ago

@spencergibb Any idea when this fix will be made available in SCG? It's blocking us until then.

According to the schedule below Spring Boot 2.3.6 should happen tomorrow and Reactor Dysprosium-SR14 (Reactor Netty 0.9.14.RELEASE) is in. https://github.com/spring-projects/spring-boot/milestone/191