spring-projects / spring-boot

Spring Boot
https://spring.io/projects/spring-boot
Apache License 2.0
74.55k stars 40.54k forks source link

Spring Boot Slow Startup Time in Docker when limiting CPUs #19911

Closed miller79 closed 4 years ago

miller79 commented 4 years ago

My organization is wanting to optimize the usage of our applications within a Kubernetes container by utilizing the minimum appropriate CPU for the applications running. When doing analysis, we have discovered a healthy area that the CPU allows the application to run appropriately. However the startup time for the application takes a huge hit in our environment.

Suggesting to increase CPU strictly for startup time enhancements really isn't an option we want to pursue and looking through the documents and guides we could find, there doesn't appear to be a lot of information on appropriate ways to address this problem effectively.

I would like to throw an issue up to analyze the use case of an application when restricting CPU. I have created a repo that has a bare bones Spring Boot application that we can analyze for this discussion.

We are interested in trying to create a standard document on how our developers can utilize Spring Boot effectively in Kubernetes with as thin of a CPU as possible while still having the startup time performance we would expect an application to have. Currently we are running using the cpu setting of 600m within Kubernetes which is roughly equivalent to running the docker command with --cpus=".6".

Running on my box which is an i7-7600U @ 2.8GHz with 16 GB of RAM, I am seeing roughly 30 seconds startup time with the code in my repository.

A member of our org has also created some benchmark results comparing other frameworks with similar code. Their box they ran the tests on was an i7 process @ 2.6 with 16 GB of RAM as well. The results are available here.

I would like to start a conversation that will hopefully lead to some focused documentation on this particular request. I believe this documentation could help many organizations using Spring Boot that is starting to migrate to Kubernetes. I'm not sure if it would help to also bring in the Spring Cloud Kubernetes team potentially as well but let this be the start of the discussion.

wilkinsona commented 4 years ago

I can see that there's already been some discussion with @bclozel on Gitter about this. Generally speaking, that's a better place to have a broad discussion. We can then open issues for specific things as and when a change is identified.

If you are not using it, one thing that you could do is to exclude Hibernate Validator from a web application. In Spring Boot 2.2 it's included in the web starter. We've just changed this in 2.3.0.M1 as its cost in terms of startup time and memory footprint is too high for it to be paid by default by everyone building a web app.

Your benchmark results are interesting. Thanks for sharing them. I think they show that none of the frameworks perform particularly well in such a constrained environment. For example, 10-15s to start a Graal Native image is orders of magnitude slower than I would expect to see. Startup performance is an on-going area of interest for us and you may be interested in the work that is going on in the Spring Graal Native and Spring Fu projects. We also have some other work going on behind the scenes that we're not quite ready to talk about publicly.

You may also be interested in @dsyer's How Fast is Spring talk. It contains several tips and tricks that may be of interest, particularly in a heavily constrained environment.

dsyer commented 4 years ago

I would try with Webflux and Netty. FWIW I can start a Netty app without validation in about 10 seconds on 400m CPU. With Spring Fu and Netty I can start an app about twice as fast as vanilla Spring Boot with Tomcat. Comparing Spring Boot on Tomcat with no classpath exclusions to some of those other frameworks is not really fair. Also the benchmark data look off to me because the startup time should scale roughly inversely with the cpu count (below about 2 cpus at least). That might be to do with the memory limits - it would be interesting to leave the memory unconstrained just to see if it made a difference, even if you don't do that in production. Also try CDS (J9 is the easiest way to do that, but HotSpot VMs can also be set up with a CDS cache), or just use -noverify for a quick and dirty look at what that might deliver.

miller79 commented 4 years ago

Thanks for the info @wilkinsona and @dsyer . I have updated the repo to reflect what I believe is the optimal settings (outside of JVM settings) and it has improved the startup time quite a bit from the starting project. However the times are still relatively higher than we would like to see. I am still playing with other things to see how much more time we can reduce. These include the following:

I will update with any findings I find but any additional feedback is welcome if there are additional things we could potentially look at.

dsyer commented 4 years ago

Thanks for the update.

Directly importing the autoconfiguration classes that are required

Probably could get you 10 or 20% improvement for simple apps.

Not using spring boot at all and initializing the application context manually

Not really going to help, unless you end up with fewer features (see above).

Using the spring-boot-thin-layout to reduce the number of used dependencies.

Not sure what you mean here. The dependencies are the dependencies. You either use them or you don't - it has nothing to do with the JAR layout.

Playing with Spring Fu and Spring Graal Native projects

Both will have an impact (Graal Native a big impact). Both are limited in scope, and are still experimental. But please do play around and see if it works for you. Feedback more than welcome.

dsyer commented 4 years ago

You should also add (proxyBeanMethods=false) to all your @Configuration and @SpringBootApplication annotations.

miller79 commented 4 years ago

Just wanted to update this issue with a bit more information that I has been discovered. Our organization uses Spring Data JPA in most applications. From the application I have provided as a benchmark, after adding just the following:

pom.xml changes

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>com.microsoft.sqlserver</groupId>
            <artifactId>mssql-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>com.microsoft.azure</groupId>
            <artifactId>adal4j</artifactId>
            <version>1.6.4</version>
        </dependency>

application.yml changes

spring:
  main:
    lazy-initialization: true
  datasource:
    url: Azure SQL Server Jdbc URL
    username: user for the URL
    password: password for the URL

Adding this alone and starting the application I've observed the following behaviors when using the cpu limit of .4.

I have tried this same configuration when using spring-boot-starter-jdbc and have observed that Hikari does not start up. The culprit of the startup seems to be within the HibernateJpaAutoConfiguration as disabling this AutoConfig with the first dependency does not startup Hibernate or Hikari.

What I'm working on next specifically is trying out Tomcat JDBC and DBCP 2 to see if the startup is any faster using these other frameworks.

I have provided the logs below to show what happened when adding what I showed above.

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.2.4.RELEASE)

2020-03-06 17:11:59.778  INFO 1 --- [           main] miller79.Application                     : Starting Application on 348f436e0a54 with PID 1 (/app started by root in /)
2020-03-06 17:11:59.855  INFO 1 --- [           main] miller79.Application                     : No active profile set, falling back to default profiles: default
2020-03-06 17:12:14.154  INFO 1 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2020-03-06 17:12:14.657  INFO 1 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 399ms. Found 0 JPA repository interfaces.
2020-03-06 17:12:16.761  INFO 1 --- [           main] ptablePropertiesBeanFactoryPostProcessor : Post-processing PropertySource instances
2020-03-06 17:12:17.554  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource configurationProperties [org.springframework.boot.context.properties.source.ConfigurationPropertySourcesPropertySource] to AOP Proxy
2020-03-06 17:12:17.555  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource systemProperties [org.springframework.core.env.PropertiesPropertySource] to EncryptableMapPropertySourceWrapper
2020-03-06 17:12:17.555  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource systemEnvironment [org.springframework.boot.env.SystemEnvironmentPropertySourceEnvironmentPostProcessor$OriginAwareSystemEnvironmentPropertySource] to EncryptableMapPropertySourceWrapper
2020-03-06 17:12:17.556  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource random [org.springframework.boot.env.RandomValuePropertySource] to EncryptablePropertySourceWrapper
2020-03-06 17:12:17.556  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource applicationConfig: [classpath:/application.yml] [org.springframework.boot.env.OriginTrackedMapPropertySource] to EncryptableMapPropertySourceWrapper
2020-03-06 17:12:20.871  INFO 1 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration' of type [org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2020-03-06 17:12:29.169  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2020-03-06 17:12:38.857  INFO 1 --- [pool-2-thread-1] com.microsoft.aad.adal4j.WSTrustRequest  : [Correlation ID: null] Sent (null) Correlation Id is not same as received (e46ad1c8-9fa9-4665-bcec-dbf3ed2c0743).
2020-03-06 17:12:40.061  INFO 1 --- [pool-2-thread-1] c.microsoft.aad.adal4j.WSTrustResponse   : Found token of type: urn:oasis:names:tc:SAML:1.0:assertion
2020-03-06 17:12:40.157  INFO 1 --- [pool-2-thread-1] c.m.aad.adal4j.AuthenticationAuthority   : [Correlation ID: 69150f17-0f9f-47fd-88d0-7436dfb8f21f] Instance discovery was successful
2020-03-06 17:12:43.459  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2020-03-06 17:12:43.868  INFO 1 --- [           main] o.hibernate.jpa.internal.util.LogHelper  : HHH000204: Processing PersistenceUnitInfo [name: default]
2020-03-06 17:12:44.273  INFO 1 --- [           main] org.hibernate.Version                    : HHH000412: Hibernate Core {5.4.10.Final}
2020-03-06 17:12:45.276  INFO 1 --- [           main] o.hibernate.annotations.common.Version   : HCANN000001: Hibernate Commons Annotations {5.1.0.Final}
2020-03-06 17:12:46.456  INFO 1 --- [pool-3-thread-1] com.microsoft.aad.adal4j.WSTrustRequest  : [Correlation ID: null] Sent (null) Correlation Id is not same as received (28efd158-5197-4729-9be5-2439caa04935).
2020-03-06 17:12:46.672  INFO 1 --- [pool-3-thread-1] c.microsoft.aad.adal4j.WSTrustResponse   : Found token of type: urn:oasis:names:tc:SAML:1.0:assertion
2020-03-06 17:12:46.756  INFO 1 --- [pool-3-thread-1] c.m.aad.adal4j.AuthenticationAuthority   : [Correlation ID: bbf99753-10b5-4a7f-9f4c-f4c1f334545a] Instance discovery was successful
2020-03-06 17:12:46.872  INFO 1 --- [           main] org.hibernate.dialect.Dialect            : HHH000400: Using dialect: org.hibernate.dialect.SQLServer2012Dialect
2020-03-06 17:12:51.769  INFO 1 --- [pool-4-thread-1] com.microsoft.aad.adal4j.WSTrustRequest  : [Correlation ID: null] Sent (null) Correlation Id is not same as received (3b21bba9-95ee-4dfc-8f01-84713d39f363).
2020-03-06 17:12:52.056  INFO 1 --- [pool-4-thread-1] c.microsoft.aad.adal4j.WSTrustResponse   : Found token of type: urn:oasis:names:tc:SAML:1.0:assertion
2020-03-06 17:12:52.062  INFO 1 --- [           main] o.h.e.t.j.p.i.JtaPlatformInitiator       : HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]
2020-03-06 17:12:52.066  INFO 1 --- [pool-4-thread-1] c.m.aad.adal4j.AuthenticationAuthority   : [Correlation ID: 38d2b119-a38b-48ad-855b-8384578d5fed] Instance discovery was successful
2020-03-06 17:12:52.163  INFO 1 --- [           main] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'
2020-03-06 17:12:55.286  INFO 1 --- [pool-5-thread-1] com.microsoft.aad.adal4j.WSTrustRequest  : [Correlation ID: null] Sent (null) Correlation Id is not same as received (c88139f7-0c18-4caf-8a18-0f3e85deed66).
2020-03-06 17:12:55.356  INFO 1 --- [pool-5-thread-1] c.microsoft.aad.adal4j.WSTrustResponse   : Found token of type: urn:oasis:names:tc:SAML:1.0:assertion
2020-03-06 17:12:55.362  INFO 1 --- [pool-5-thread-1] c.m.aad.adal4j.AuthenticationAuthority   : [Correlation ID: 595b6179-e3a7-4ae9-a93f-aa9dd4ae50d8] Instance discovery was successful
2020-03-06 17:12:56.464  INFO 1 --- [           main] o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port(s): 8080
2020-03-06 17:12:56.571  INFO 1 --- [           main] miller79.Application                     : Started Application in 63.489 seconds (JVM running for 68.467)
dsyer commented 4 years ago

The database connection of implementation won’t make any difference. For JPA you need the additional spring data lazy flag, i.e. spring.data.jpa.repositories.bootstrap-mode=lazy.

miller79 commented 4 years ago

I tried that and did not see much improvement. I didn't add any specific repositories as well so I'm not sure that setting would change a lot from that side. Here are the logs.

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.2.4.RELEASE)

2020-03-06 18:13:59.576  INFO 1 --- [           main] miller79.Application                     : Starting Application on b9ae005ef089 with PID 1 (/app started by root in /)
2020-03-06 18:13:59.660  INFO 1 --- [           main] miller79.Application                     : No active profile set, falling back to default profiles: default
2020-03-06 18:14:13.858  INFO 1 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in LAZY mode.
2020-03-06 18:14:14.161  INFO 1 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 107ms. Found 0 JPA repository interfaces.
2020-03-06 18:14:17.573  INFO 1 --- [           main] ptablePropertiesBeanFactoryPostProcessor : Post-processing PropertySource instances
2020-03-06 18:14:18.772  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource configurationProperties [org.springframework.boot.context.properties.source.ConfigurationPropertySourcesPropertySource] to AOP Proxy
2020-03-06 18:14:18.772  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource systemProperties [org.springframework.core.env.PropertiesPropertySource] to EncryptableMapPropertySourceWrapper
2020-03-06 18:14:18.773  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource systemEnvironment [org.springframework.boot.env.SystemEnvironmentPropertySourceEnvironmentPostProcessor$OriginAwareSystemEnvironmentPropertySource] to EncryptableMapPropertySourceWrapper
2020-03-06 18:14:18.774  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource random [org.springframework.boot.env.RandomValuePropertySource] to EncryptablePropertySourceWrapper
2020-03-06 18:14:18.775  INFO 1 --- [           main] c.u.j.EncryptablePropertySourceConverter : Converting PropertySource applicationConfig: [classpath:/application.yml] [org.springframework.boot.env.OriginTrackedMapPropertySource] to EncryptableMapPropertySourceWrapper
2020-03-06 18:14:20.267  INFO 1 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration' of type [org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2020-03-06 18:14:29.861  INFO 1 --- [           main] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'applicationTaskExecutor'
2020-03-06 18:14:30.059  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2020-03-06 18:14:41.567  INFO 1 --- [pool-2-thread-1] com.microsoft.aad.adal4j.WSTrustRequest  : [Correlation ID: null] Sent (null) Correlation Id is not same as received (32213262-7f30-48c9-a471-dc76dffd84c7).
2020-03-06 18:14:43.059  INFO 1 --- [pool-2-thread-1] c.microsoft.aad.adal4j.WSTrustResponse   : Found token of type: urn:oasis:names:tc:SAML:1.0:assertion
2020-03-06 18:14:43.360  INFO 1 --- [pool-2-thread-1] c.m.aad.adal4j.AuthenticationAuthority   : [Correlation ID: 55df68c4-bcae-42db-bdee-9d2d0ceba0cb] Instance discovery was successful
2020-03-06 18:14:47.275  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2020-03-06 18:14:49.370  INFO 1 --- [         task-1] o.hibernate.jpa.internal.util.LogHelper  : HHH000204: Processing PersistenceUnitInfo [name: default]
2020-03-06 18:14:50.059  INFO 1 --- [pool-3-thread-1] com.microsoft.aad.adal4j.WSTrustRequest  : [Correlation ID: null] Sent (null) Correlation Id is not same as received (e0f32846-8b80-41f9-a593-8624f01331b9).
2020-03-06 18:14:50.460  INFO 1 --- [pool-3-thread-1] c.microsoft.aad.adal4j.WSTrustResponse   : Found token of type: urn:oasis:names:tc:SAML:1.0:assertion
2020-03-06 18:14:50.656  INFO 1 --- [pool-3-thread-1] c.m.aad.adal4j.AuthenticationAuthority   : [Correlation ID: f614bfa4-efc9-423a-ab3d-31f0e2d8aaae] Instance discovery was successful
2020-03-06 18:14:52.059  INFO 1 --- [         task-1] org.hibernate.Version                    : HHH000412: Hibernate Core {5.4.10.Final}
2020-03-06 18:14:54.755  INFO 1 --- [pool-4-thread-1] com.microsoft.aad.adal4j.WSTrustRequest  : [Correlation ID: null] Sent (null) Correlation Id is not same as received (3c4ff189-1cab-4d6e-b0f4-cf61e50d45f4).
2020-03-06 18:14:54.958  INFO 1 --- [pool-4-thread-1] c.microsoft.aad.adal4j.WSTrustResponse   : Found token of type: urn:oasis:names:tc:SAML:1.0:assertion
2020-03-06 18:14:54.969  INFO 1 --- [pool-4-thread-1] c.m.aad.adal4j.AuthenticationAuthority   : [Correlation ID: 9ad0cd58-aadf-442b-a0dd-ec60209f463e] Instance discovery was successful
2020-03-06 18:14:56.558  INFO 1 --- [           main] o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port(s): 8080
2020-03-06 18:14:56.574  INFO 1 --- [           main] miller79.Application                     : Started Application in 61.295 seconds (JVM running for 66.547)
miller79 commented 4 years ago

I have updated the benchmark project to have a benchmark specific to data-jpa here. It uses h2. From the tests including only spring-boot-starter-data-jpa has increased the starting time at .4 CPUs around 20 seconds. The h2 Hikari connection seems to have taken only 2 seconds. It looks like Azure connections just take a bit with the lower CPU adding another 20 seconds from my testing but we can ignore that I believe and focus more on the data jpa piece.

miller79 commented 4 years ago

One more update with this, I have completed a rough draft of the repo in a way that is workable to play with various other configs and have also ran a benchmark within AKS. If you are interested in those numbers as it is currently, they are available here.

@wilkinsona I'm also interested in some of the work you have going behind the scenes as you mentioned. If we can help at all I'd be interested.

Any other suggestions are also welcome if there is anything I may have missed in the benchmark repo from an optimization standpoint.

miller79 commented 4 years ago

I am going to close this issue as I feel this is as tuned as the application can be with the strict CPU limits provided. Looking at the focus on containers happening in 2.3.x Spring Boot and with Build Packs and layers, I believe there is no "issue" per say. I will continue tuning discussions outside of this issue. I will keep the benchmark repository updated as I find anything else out. We are not broken (or even crippled) at this point.

Thanks everyone that has been involved with your suggestions and feedback. Feel free to reach out to me if anyone has anything other information you would like to share or if you have any questions for me.

relgames commented 3 years ago

Hi @miller79 , we also experience this issue. Not sure why it was closed, it's clearly not fixed. Can you reopen? Thanks!!

miller79 commented 3 years ago

@relgames this issue was targeting that there was something within the framework causing this issue. The reality is the issue is a mix of kubernetes, the JVM, and the size of the spring framework. The most straightforward way around this without dropping java is to explore GraalVM with Spring which has growing support (https://github.com/spring-projects-experimental/spring-graalvm-native). Any other solution is going to be sub optimization which will never scale long term.