Closed — jarimayenburg closed this issue 3 years ago
Closing for now because the cause and reproduction steps aren't quite correct yet
Did some more testing today; the causes above are correct after all. The combination of MySQL and a low reuse_cached_search_results_millis triggers the issue.
Can you attach a thread dump taken when the system is hung?
Reopened #2896.
Here you go
After waiting for a while, Hikari starts giving timeouts:
hapi-fhir-jpaserver-start | 2021-08-17 10:26:05.784 [http-nio-8080-exec-31] INFO fhirtest.access [LoggingInterceptor.java:160] ERROR - GET http://localhost:8080/fhir/Patient
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.652 [search_coord_9] WARN o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:137] SQL Error: 0, SQLState: null
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.652 [search_coord_9] ERROR o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:142] HikariPool-1 - Connection is not available, request timed out after 30000ms.
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.654 [search_coord_14] WARN o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:137] SQL Error: 0, SQLState: null
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.654 [search_coord_14] ERROR o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:142] HikariPool-1 - Connection is not available, request timed out after 30000ms.
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.655 [search_coord_12] WARN o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:137] SQL Error: 0, SQLState: null
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.655 [search_coord_12] ERROR o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:142] HikariPool-1 - Connection is not available, request timed out after 30000ms.
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.656 [search_coord_10] WARN o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:137] SQL Error: 0, SQLState: null
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.656 [search_coord_10] ERROR o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:142] HikariPool-1 - Connection is not available, request timed out after 30000ms.
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.655 [search_coord_19] WARN o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:137] SQL Error: 0, SQLState: null
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.657 [search_coord_19] ERROR o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:142] HikariPool-1 - Connection is not available, request timed out after 30000ms.
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.655 [search_coord_15] WARN o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:137] SQL Error: 0, SQLState: null
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.656 [search_coord_7] WARN o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:137] SQL Error: 0, SQLState: null
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.656 [search_coord_13] WARN o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:137] SQL Error: 0, SQLState: null
hapi-fhir-jpaserver-start | 2021-08-17 10:26:06.657 [search_coord_3] WARN o.h.e.jdbc.spi.SqlExceptionHelper [SqlExceptionHelper.java:137] SQL Error: 0, SQLState: null
For ease of reproduction, here's a minimal patch on the starter project.
diff --git a/docker-compose.yml b/docker-compose.yml
index fcf1f2b..4ea3d1b 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -4,10 +4,15 @@ services:
build: .
container_name: hapi-fhir-jpaserver-start
restart: on-failure
+ environment:
+ SPRING_DATASOURCE_URL: jdbc:mysql://hapi-fhir-mysql:3306/hapi?serverTimezone=Europe/Amsterdam
+ SPRING_DATASOURCE_USERNAME: admin
+ SPRING_DATASOURCE_PASSWORD: admin
+ SPRING_DATASOURCE_DRIVERCLASSNAME: com.mysql.cj.jdbc.Driver
ports:
- "8080:8080"
hapi-fhir-mysql:
- image: mysql:latest
+ image: mysql:5.7
container_name: hapi-fhir-mysql
#https://dev.mysql.com/doc/refman/8.0/en/identifier-case-sensitivity.html
command: --lower_case_table_names=1
diff --git a/src/main/resources/application.yaml b/src/main/resources/application.yaml
index 1ff811a..82df865 100644
--- a/src/main/resources/application.yaml
+++ b/src/main/resources/application.yaml
@@ -108,7 +108,7 @@ hapi:
# max_binary_size: 104857600
# max_page_size: 200
# retain_cached_searches_mins: 60
-# reuse_cached_search_results_millis: 60000
+ reuse_cached_search_results_millis: 0
tester:
home:
name: Local Tester
That thread dump looks to me like threads are blocking waiting for connections from the pool. Can you try increasing your connection pool size?
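In the starter project, the pool size can be raised through Spring Boot's standard HikariCP properties. A minimal sketch in application.yaml (the value 50 is just an illustration; size it against your server thread pool):

```yaml
spring:
  datasource:
    hikari:
      # Upper bound on pooled connections; must exceed the number of
      # threads that can each hold a connection while requesting another.
      maximum-pool-size: 50
```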
Increasing the Hikari connection pool fixes this issue in the starter project, but not in our FHIR server, which is a fork of the starter project with mainly some config changes. In that project, even with a connection pool of size 100, the server stops responding at around 70 simultaneous requests. Let me figure out if I can reproduce that in the starter project with a greater pool size.
FWIW, the issue, at least in the starter project, is that Hibernate has to compensate for the lack of native database sequences in MySQL by using a table to achieve the same thing. This causes the occasional request to require two connections at once, so if your DB connection pool isn't bigger than your server thread pool you get what is effectively a deadlock (it won't show up as a real deadlock in any management tools, but that's basically what it is).
If your real system shows lots of threads blocked waiting on PooledOptimizer, you're hitting the same issue.
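The pool-exhaustion mechanics can be sketched outside HAPI entirely. A minimal Python simulation (all names hypothetical, not HAPI code) in which every request holds one connection and then needs a second one for the table-based ID lookup:

```python
import threading
from queue import Queue, Empty

def run_requests(pool_size, n_threads, timeout=0.5):
    """Simulate n_threads requests that each need TWO connections at
    once: one for the main statement and one for the table-based ID
    generator (as Hibernate uses on MySQL). Returns how many requests
    finished without a pool timeout."""
    pool = Queue()
    for conn in range(pool_size):
        pool.put(conn)                    # fill the pool with connection tokens

    barrier = threading.Barrier(n_threads)
    completed = []                        # list.append is atomic in CPython

    def request():
        first = pool.get()                      # connection for the main statement
        barrier.wait()                          # every request now holds one connection
        try:
            second = pool.get(timeout=timeout)  # connection for the ID lookup
            pool.put(second)
            completed.append(1)
        except Empty:
            pass                                # Hikari's "connection is not available"
        barrier.wait()                          # hold `first` until everyone has tried
        pool.put(first)

    threads = [threading.Thread(target=request) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)

# Pool no bigger than the thread pool: every request stalls and times out.
print(run_requests(pool_size=4, n_threads=4))   # 0
# One spare connection breaks the standoff: all requests finish.
print(run_requests(pool_size=5, n_threads=4))   # 4
```

With pool_size equal to n_threads, every thread holds one connection while waiting for a second that can never arrive, which is exactly the "not a real deadlock, but basically one" described above.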
To be honest, I think we should probably put stronger guidance against using MySQL on the starter project - its performance is lousy compared to Postgres for all kinds of reasons including this one.
Thanks for the info. Our real system does indeed show lots of threads waiting on PooledOptimizer, so it's the same issue.
I just tried Postgres for our project and that's performing like a champ. I'll look into migrating our system to Postgres using something like pgloader.
I agree with the stronger guidance against MySQL and in favor of something like PostgreSQL, especially for production scenarios.
One thing to note: if you migrate your data to PG it will work, but the one thing you'll need to do manually is set the Postgres sequences to the values from MySQL's ID generation tables.
Ok thanks for the heads-up. Do you mean the values in the seq_* tables? Aren't those values copied to the Postgres DB in the migration itself?
Yup, those are the ones. As far as I know those won't auto migrate, only because they are a different kind of thing across the platforms (tables in MySQL vs sequences in Postgres)
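A hedged sketch of what that manual step could look like; the table and sequence names below are hypothetical, so check the actual seq_* names in your own schema before running anything:

```sql
-- On MySQL (before migration): read the generator table's current value.
-- Table name is illustrative; list your own seq_* tables first.
SELECT next_val FROM seq_resource_id;

-- On Postgres (after migration): advance the matching sequence so newly
-- generated IDs don't collide with migrated rows. 12345 stands in for
-- the value read above.
SELECT setval('seq_resource_id', 12345);
```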
@jamesagnew
Thank you for this discussion. I decided to use PostgreSQL instead of MySQL. Are there any versions of PostgreSQL that are currently supported by the JPA server?
Describe the bug
When configuring the JPA server to not use the search cache (reuse_cached_search_results_millis: 0 in the starter project), or to use a search cache with a relatively small timeout (reuse_cached_search_results_millis: 10000), the server stops responding under a certain amount of parallel load.

To Reproduce
Steps to reproduce the behavior:
1. Set reuse_cached_search_results_millis to a value of 10000 or below (I used 0).
2. Fire a number of simultaneous requests at the server. With reuse_cached_search_results_millis set to 0, my instance already stopped responding after fewer than 100 simultaneous requests.

Expected behavior
I'd expect the server to respond more slowly, but never stop processing incoming requests.
Environment:
Additional context
Re-enabling the cache with its default of 60 seconds is not an option for us, because it causes newly created resources not to be found by searches for up to 60 seconds after creation, and we need them to be searchable immediately.