jflamy / owlcms4

Olympic Weightlifting Competition Management System
https://owlcms.github.io/owlcms4/#/index
Other
67 stars 22 forks source link

Ungracefull handling of lost db connection #1133

Closed renemadsen closed 2 weeks ago

renemadsen commented 2 weeks ago

Describe the bug

10:53:14.946 ERROR HHH000099: an assertion failure occurred (this may indicate a bug in Hibernate, but is more likely due to unsafe use of the session): org.hibernate.AssertionFailure: bug adding collection twice 
10:53:14.947 WARN  java.util.concurrent.ExecutionException: org.hibernate.AssertionFailure: bug adding collection twice  
10:54:04.898 WARN  SQL Error: 0, SQLState: null                                                
10:54:04.899 ERROR HikariPool-1 - Connection is not available, request timed out after 50001ms. 
Caught javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
10:54:04.947 WARN  SQL Error: 0, SQLState: null                                                
10:54:04.947 ERROR HikariPool-1 - Connection is not available, request timed out after 50000ms. 
10:54:04.949 WARN  HHH000471: Cannot unset session in a collection because an unexpected session is defined. A persistent collection may only be associated with one session at a time. Collection : [app.owlcms.data.athlete.Athlete.participations#1814530031830547855] 
10:54:04.949 WARN  java.util.concurrent.ExecutionException: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection  
10:54:04.950 WARN  java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Session/EntityManager is closed  
10:54:04.955 WARN  SQL Error: 0, SQLState: null                                                
10:54:04.955 ERROR HikariPool-1 - Connection is not available, request timed out after 50000ms. 
10:54:19.895 WARN  SQL Error: 0, SQLState: null                                                
10:54:19.895 ERROR HikariPool-1 - Connection is not available, request timed out after 50000ms. 
10:54:19.896 WARN  javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection  
10:54:20.051 WARN  java.lang.UnsupportedOperationException: Unexpected message id from the client. Expected sync id: 1104, got 1105. more details logged on DEBUG level.

Leading up to this fatal error, we have multiple warnings/errors like this:

WARN  Connection leak detection triggered for org.postgresql.jdbc.PgConnection@dda560b on thread pool-15-thread-93, stack trace follows 
java.lang.Exception: Apparent connection leak detected
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
    at org.hibernate.hikaricp.internal.HikariCPConnectionProvider.getConnection(HikariCPConnectionProvider.java:77)
    at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
    at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108)
    at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138)
    at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.connection(StatementPreparerImpl.java:50)
    at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$5.doPrepare(StatementPreparerImpl.java:149)
    at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$StatementPreparationTemplate.prepareStatement(StatementPreparerImpl.java:176)
    at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.prepareQueryStatement(StatementPreparerImpl.java:151)
    at org.hibernate.loader.plan.exec.internal.AbstractLoadPlanBasedLoader.prepareQueryStatement(AbstractLoadPlanBasedLoader.java:198)
    at org.hibernate.loader.plan.exec.internal.AbstractLoadPlanBasedLoader.executeQueryStatement(AbstractLoadPlanBasedLoader.java:162)
    at org.hibernate.loader.plan.exec.internal.AbstractLoadPlanBasedLoader.executeLoad(AbstractLoadPlanBasedLoader.java:104)
    at org.hibernate.loader.collection.plan.AbstractLoadPlanBasedCollectionInitializer.initialize(AbstractLoadPlanBasedCollectionInitializer.java:87)
    at org.hibernate.persister.collection.AbstractCollectionPersister.initialize(AbstractCollectionPersister.java:705)
    at org.hibernate.event.internal.DefaultInitializeCollectionEventListener.onInitializeCollection(DefaultInitializeCollectionEventListener.java:76)
    at org.hibernate.event.service.internal.EventListenerGroupImpl.fireEventOnEachListener(EventListenerGroupImpl.java:107)
    at org.hibernate.internal.SessionImpl.initializeCollection(SessionImpl.java:2208)
    at org.hibernate.collection.internal.AbstractPersistentCollection$4.doWork(AbstractPersistentCollection.java:595)
    at org.hibernate.collection.internal.AbstractPersistentCollection.withTemporarySessionIfNeeded(AbstractPersistentCollection.java:264)
    at org.hibernate.collection.internal.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:591)
    at org.hibernate.collection.internal.AbstractPersistentCollection.read(AbstractPersistentCollection.java:149)
    at org.hibernate.collection.internal.PersistentBag.iterator(PersistentBag.java:387)
    at app.owlcms.data.athlete.Athlete.getMainRankings(Athlete.java:2007)
    at jdk.internal.reflect.GeneratedMethodAccessor83.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.hibernate.proxy.pojo.bytebuddy.ByteBuddyInterceptor.intercept(ByteBuddyInterceptor.java:56)
    at org.hibernate.proxy.ProxyConfiguration$InterceptorDispatcher.intercept(ProxyConfiguration.java:95)
    at app.owlcms.data.athlete.Athlete$HibernateProxy$h3yhJICq.getMainRankings(Unknown Source)
    at app.owlcms.displays.scoreboard.Results.getAthleteJson(Results.java:776)
    at app.owlcms.displays.scoreboard.Results.getAthletesJson(Results.java:850)
    at app.owlcms.displays.scoreboard.Results.lambda$computeLeaders$2(Results.java:604)
    at app.owlcms.init.OwlcmsSession.withFop(OwlcmsSession.java:168)
    at app.owlcms.displays.scoreboard.Results.computeLeaders(Results.java:586)
    at app.owlcms.displays.scoreboard.Results.updateDisplay(Results.java:1068)
    at app.owlcms.displays.scoreboard.Results.doUpdate(Results.java:724)
    at app.owlcms.displays.scoreboard.Results.lambda$slaveOrderUpdated$cf0ff060$1(Results.java:545)
    at com.vaadin.flow.component.UI.accessSynchronously(UI.java:496)
    at com.vaadin.flow.component.UI$2.execute(UI.java:571)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at com.vaadin.flow.server.VaadinService.runPendingAccessTasks(VaadinService.java:2092)
    at com.vaadin.flow.server.VaadinSession.unlock(VaadinSession.java:725)
    at com.vaadin.flow.server.VaadinService.ensureAccessQueuePurged(VaadinService.java:2057)
    at com.vaadin.flow.server.VaadinService.accessSession(VaadinService.java:2024)
    at com.vaadin.flow.server.VaadinSession.access(VaadinSession.java:1029)
    at com.vaadin.flow.component.UI.access(UI.java:568)
    at com.vaadin.flow.component.UI.access(UI.java:551)
    at app.owlcms.nui.lifting.UIEventProcessor.uiAccessIgnoreIfSelfOrigin(UIEventProcessor.java:49)
    at app.owlcms.nui.lifting.UIEventProcessor.uiAccess(UIEventProcessor.java:95)
    at app.owlcms.displays.scoreboard.Results.slaveOrderUpdated(Results.java:541)
    at jdk.internal.reflect.GeneratedMethodAccessor682.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:85)
    at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:142)
    at com.google.common.eventbus.Subscriber.lambda$dispatchEvent$0(Subscriber.java:71)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)

owlcms version version 53.0.2

To Reproduce Have the competition run using postgresql and have it run into second group, then this error will come eventually.

Expected behavior Expect the code to recover by it self.

Screenshots N/A

Desktop (please complete the following information): Independet of desktop OS

Additional context Server is Epyc with with more than enough memory. Logs can be provided via mail.

jflamy commented 2 weeks ago

Version 53.1.0 should fix this. Please confirm.

The internal ORM transactions used for lazy loading participations were interfering with each other under even modest concurrency. That is being our control. We now manage (avoid) concurrency on this specific relationship ourselves.

jflamy commented 2 weeks ago

I do not have a database myself where this can be provoked. The error was seen under H2 as well (it's Hibernate that seemed to be getting confused). Whenever possible, provide a database to reproduce with a recipe, and the matching logs.

In the other instance where this was reported I only had the logs to work from.

jflamy commented 2 weeks ago

Regarding the "adding participations in two sessions at the same time" - owlcms systematically opens different transactions with different caches (persistence contexts) to populate the objects. How the same object would be populated twice concurrently in the same persistent context is a priori not possible, i.e. in theory the application code was already doing the right thing to avoid this issue. The newer code goes one step further by removing the need to do the lazy load.

jflamy commented 2 weeks ago

Root cause is #1120