Spring Data JDBC performance by caching SQL statements generated for derived queries.

spring-projects / spring-data-relational

Spring Data Relational. Home of Spring Data JDBC and Spring Data R2DBC.

https://spring.io/projects/spring-data-jdbc

Apache License 2.0

774 stars 348 forks source link

Spring Data JDBC performance by caching SQL statements generated for derived queries. #1106

Open sergey-morenets opened 3 years ago

sergey-morenets commented 3 years ago

We have a Spring Data JPA (Hibernate) project that we would like to migrate to Spring Data JDBC. The main reasons are simplified configuration and model mapping. And we thought that it would lead to better performance.

However we did some benchmarks (with default settings) using JMH (H2 database) and it turned out that in most cases the performance has decreased. For example, we have the following repository and query method:

public interface ProductRepository 
    extends CrudRepository<Product, Integer> {

    Product findByName(String name);

And the benchmarks showed the following execution time (in ns): Spring Data JDBC - 80899 Spring Data JPA - 14124

So is it predictable? Or we missed something in our configuration/tests?

schauder commented 3 years ago

Please provide a Minimimal Reproducable Example, preferable as a Github repository. Make sure to include the database, either as an in memory database or if that is not possible using Testcontainers.

Currently there are various scenarios where I expect one or the other to be faster.

sergey-morenets commented 3 years ago

Hi @schauder

This is repository with simplified example: https://github.com/sergey-morenets/spring-data-benchmarks

You can open this project in IDE and run of the classes: SpringDataJdbcBenchmarking or SpringDataJpaBenchmarking to execute the benchmarks.

This is simplified project so the absolute results are different but Spring Data JPA is still faster (7016 ns) against Spring Data JDBC (39617 ns).

schauder commented 3 years ago

If I interpret that correctly you don't measure actual loads with JPA but only the lookup in the first level cache, which is probably not what you want.

Invoke EntityManager.clear() between benchmarks.

sergey-morenets commented 3 years ago

Hi @schauder

Thank you for the comment. Now I clear the entityManager 1st level cache at the beginning of the benchmark (I updated the repository):

@Benchmark
public Product springDataJpaQuery() {
        entityManager.clear();
        return productRepository.findByName("phone");
}

However it almost hasn't impacted the performance (execution time is 7479 ns)

GeorgeSalu commented 3 years ago

are there any benchmarks between spring data jdbc and mybatis?

schauder commented 3 years ago

Thanks for the reproducer.

The main difference between the two benchmarks is, that you accidentally kicked out the Hikari connection pool by explicitly constructing the datasource. you can completely remove any Spring Data JDBC configuration and you'll see a significant performance boost.

It seems, we also do not properly cache the results of constructing the SQL statement from the method name, resulting in some significant overhead. You can workaround that by providing an explicit query.
The missing caching is something we'll fix.

schauder commented 3 years ago

Results of modified benchmark.

Benchmark Mode Cnt Score Error Units SpringDataJdbcBenchmarking.springDataJdbcQuery avgt 5 10103.975 ± 1140.990 ns/op SpringDataJpaBenchmarking.springDataJpaQuery avgt 5 8622.891 ± 1886.295 ns/op

You can find the modified benchmark here: https://github.com/schauder/spring-data-benchmarks

schauder commented 3 years ago

In general I would not expect better better performance from Spring Data JDBC compared to JPA implementations in the typical benchmark scenario.

The benefit of Spring Data JDBC is that it is much easier to understand what it is actually doing and therefor easier to use correctly.

This might very well result in better performance of real world applications due to few mistakes made.

sergey-morenets commented 2 years ago

Hi @schauder

I returned to the benchmarks topic and used your project. However I noticed you'd added a @Query annotation for query method (https://github.com/schauder/spring-data-benchmarks/blob/main/src/main/java/demo/jdbc/ProductRepository.java) Was it done intentionally? Because it completely changes the query logic:

public interface ProductRepository 
    extends CrudRepository<Product, Integer> {

    @Query("select * from products")
    Product findByName(String name);    
}

After I've removed this annotation

public interface ProductRepository 
    extends CrudRepository<Product, Integer> {

    Product findByName(String name);    
}

and re-run the benchmarks the results are the following:

Benchmark                                       Mode  Cnt      Score     Error  Units
SpringDataJdbcBenchmarking.springDataJdbcQuery  avgt    5  30283.784 ± 643.013  ns/op
SpringDataJpaBenchmarking.springDataJpaQuery    avgt    5   8669.489 ± 410.739  ns/op

So there's again significant gap between Spring Data JDBC and Spring Data JPA execution time.

schauder commented 2 years ago

Was it done intentionally?

Yes and no. I intentionally added the annotation to demonstrate the effect of not caching the generated query. Changing the query semantics was a mistake on my side.

petromir commented 2 years ago

It seems, we also do not properly cache the results of constructing the SQL statement from the method name, resulting in some significant overhead. You can workaround that by providing an explicit query. The missing caching is something we'll fix.

@schauder Is this something to expect in the upcoming version 3.0?

eduanb commented 1 year ago

Is there any update on this? In Spring Data R2DBC, we've seen about a 10x performance increase by adding the @Query for this query on a very large table (50+ columns), even with a small list of ids (<10) fun findAllByIdInAndSomethingIsTrue(ids: List<Long>): Flow<MyEntity>