quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.82k stars 2.69k forks source link

Shutdown is hanging #38164

Closed shawkins closed 8 months ago

shawkins commented 10 months ago

Describe the bug

See https://issues.redhat.com/browse/JBTM-3830 and https://github.com/keycloak/keycloak/issues/23786 - for some of tests where the server has a failed start up, the process will fail to exit due to the TransactionReaper shutdown not completing.

Expected behavior

The keycloak server should stop.

Actual behavior

The keycloak server shutdown hangs.

How to Reproduce?

Add a RepeatedTest annotation to https://github.com/keycloak/keycloak/blob/5d3df39f04d64bcfe3acc077de837a57bd838b64/quarkus/tests/integration/src/test/java/org/keycloak/it/cli/dist/FipsDistTest.java#L124 - this only occurs about 10% of the time. In the output when it is hanging the test will capture the thread dump.

Output of uname -a or ver

No response

Output of java -version

No response

Quarkus version or git rev

3.2.9

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

quarkus-bot[bot] commented 10 months ago

/cc @Karm (securepipeline), @jerboaa (securepipeline)

shawkins commented 10 months ago

Unfortunately a simple change to address a possible race condition in narayana did not address this. I was hoping some more eyes could help determine if the remaining element can be forcibly terminated in this case.

mmusgrov commented 10 months ago

@shawkins Can you add some thread dumps so that we can see which threads are waiting?

shawkins commented 10 months ago

@mmusgrov other than the main thread being held at shutdown as shown in https://issues.redhat.com/browse/JBTM-3830 there are no other threads doing work - see the attached for the full thread dump. threaddump.txt

Since no real work has been done at this point by Keycloak, the only thing that should have initiated a transaction is the liquibase schema migrations. Somehow the failed overall startup must leave that as being seen as on-going.

shawkins commented 9 months ago

To follow up the underlying issue is with TransactionReaper.terminate(false) - instead of immediately shutting down, the original timeout of any lingering transactions is still being honored. Thus the shutdown will hang until the transaction(s) times out.

The lingering transaction is not from the liquibase migration as mentioned above, but is from a transaction that is managed by Keycloak's TransactionalSessionHandler - https://github.com/keycloak/keycloak/blob/ed96b13312198569b7e68647eefe44d5ee24c739/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/transaction/TransactionalSessionHandler.java#L27

In the affected test cases the server does not complete a normal startup, and a session that was started does not have the corresponding close method called prior to TransactionReaper.terminate.

So a question on the quarkus/jaxrs side - should there be a guarentee that our CloseSessionHandler is called? https://github.com/keycloak/keycloak/blob/ed96b13312198569b7e68647eefe44d5ee24c739/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/integration/jaxrs/CloseSessionHandler.java#L68

cc @geoand

geoand commented 9 months ago

Not sure I undertstand, where were you hoping closeSession to be called from?

shawkins commented 9 months ago

@geoand both our create and close are expected to be called via the handler chain on QuarkusResteasyReactiveRequestContext.

geoand commented 9 months ago

I admit I have not understood the details of the issue, but I don't see why you expect that would be called when no HTTP requests have been handled and therefore there is no REST layer code involved.

shawkins commented 9 months ago

I admit I have not understood the details of the issue, but I don't see why you expect that would be called when no HTTP requests have been handled and therefore there is no REST layer code involved.

see the correction in https://github.com/quarkusio/quarkus/issues/38164#issuecomment-1915256601 - there is REST session being created which is responsible for the left-over transaction.

geoand commented 9 months ago

there is REST session

Is this a Keycloak thing?

shawkins commented 9 months ago

Is this a Keycloak thing?

Keycloak is defining a CreateSessionHandler [1] and a CloseSessionHandler [2] which get installed as handlers / filters in the QuarkusResteasyReactiveRequestContext [3].

When transactions are cleaned up appropriately I can see paired calls to the create and close methods via their respective handlers - for example:

        at org.keycloak.quarkus.runtime.transaction.TransactionalSessionHandler.close(TransactionalSessionHandler.java:57)
        at org.keycloak.quarkus.runtime.integration.jaxrs.CloseSessionHandler.closeSession(CloseSessionHandler.java:69)
        at org.keycloak.quarkus.runtime.integration.jaxrs.CloseSessionHandler.filter(CloseSessionHandler.java:65)
        at org.jboss.resteasy.reactive.server.handlers.ResourceResponseFilterHandler.handle(ResourceResponseFilterHandler.java:25)
        at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
        at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:147)

The KeycloakSession, which is put into the routing context by the CreateSessionHandler, is implemented by a DefaultKeycloakSession [4] and provides access to the underlying transaction manager.

[1] https://github.com/keycloak/keycloak/blob/5373f3c97a801c000fb0ac6b54b2148b1e6c276b/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/integration/resteasy/CreateSessionHandler.java [2] https://github.com/keycloak/keycloak/blob/5373f3c97a801c000fb0ac6b54b2148b1e6c276b/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/integration/jaxrs/CloseSessionHandler.java [3] https://github.com/quarkusio/quarkus/blob/2b4a129ce2ca85f43fa0b047541bd8da00d0d382/extensions/resteasy-reactive/quarkus-resteasy-reactive/runtime/src/main/java/io/quarkus/resteasy/reactive/server/runtime/QuarkusResteasyReactiveRequestContext.java [4] https://github.com/keycloak/keycloak/blob/5373f3c97a801c000fb0ac6b54b2148b1e6c276b/services/src/main/java/org/keycloak/services/DefaultKeycloakSession.java

geoand commented 9 months ago

Just to clarify things a little:

Handlers and Filters are different things.

The former are a RESTEasy Reactive concept that essentially comprise the basic unit of an HTTP request handling pipeline - extensions can contribute to these as Keycloak does. The latter are a JAX-RS concept that are meant to be called before and after a Resource method has been invoked - these are usually contributed by applications, but nothing stops extensions for doing so as well.

With that it mind, it seems like CloseSessionHandler is incorrectly named, but that's just a minor thing.

If I understand correctly, you start a transaction when an HTTP request is being processed and you want to close it when the processing is complete, correct? What I still don't understand is the condition under which you are not seeing the transaction being closed.

shawkins commented 9 months ago

If I understand correctly, you start a transaction when an HTTP request is being processed and you want to close it when the processing is complete, correct?

Yes that is the expectation.

What I still don't understand is the condition under which you are not seeing the transaction being closed.

That's not clear to me either yet. There are some other stacks that end up in the logs, such as #26604, but even with resolving that another stack pops up - and the orphaned transaction problem persists. I was hoping there was something simple / guarenteed here about how close runs.

shawkins commented 9 months ago

After clearing up the #26604 here's what the log looks like:

2024-01-30 13:23:41,191 INFO  [org.keycloak.broker.provider.AbstractIdentityProviderMapper] (main) Registering class org.keycloak.broker.provider.mappersync.ConfigSyncEventListener
2024-01-30 13:23:43,845 INFO  [org.keycloak.quarkus.runtime.storage.legacy.liquibase.QuarkusJpaUpdaterProvider] (main) Initializing database schema. Using changelog META-INF/jpa-changelog-master.xml

UPDATE SUMMARY
Run:                        117
Previously run:               0
Filtered out:                 0
-------------------------------
Total change sets:          117

2024-01-30 13:23:47,987 INFO  [org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory] (main) Node name: localhost-49123, Site name: null
2024-01-30 13:23:48,099 INFO  [org.keycloak.services] (main) KC-SERVICES0050: Initializing master realm
2024-01-30 13:23:49,023 INFO  [org.infinispan.CLUSTER] (main) ISPN000080: Disconnecting JGroups channel `ISPN`

Then a log of transaction begin via TransactionSessionHandler:

    at org.keycloak.quarkus.runtime.transaction.TransactionalSessionHandler.create(TransactionalSessionHandler.java:39)
    at org.keycloak.quarkus.runtime.integration.resteasy.CreateSessionHandler.handle(CreateSessionHandler.java:42)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)

Then a bunch of repeating errors, but no call to CloseSessionHandler.close:

2024-01-30 13:23:49,181 ERROR [org.keycloak.services.error.KeycloakErrorHandler] (executor-thread-1) Uncaught server error: org.infinispan.commons.IllegalLifecycleStateException: ISPN000323: realms is in 'TERMINATED' state and so it does not accept new invocations. Either restart it or recreate the cache container.
    at org.infinispan.cache.impl.SimpleCacheImpl.getDataContainer(SimpleCacheImpl.java:1307)
    at org.infinispan.cache.impl.SimpleCacheImpl.get(SimpleCacheImpl.java:559)
    at org.infinispan.cache.impl.StatsCollectingCache.get(StatsCollectingCache.java:54)
    at org.infinispan.cache.impl.EncoderCache.get(EncoderCache.java:762)
    at org.keycloak.models.cache.infinispan.CacheManager.get(CacheManager.java:95)
    at org.keycloak.models.cache.infinispan.RealmCacheSession.getRealmByName(RealmCacheSession.java:466)
    at org.keycloak.services.managers.RealmManager.getRealmByName(RealmManager.java:100)
    at org.keycloak.services.resources.RealmsResource.resolveRealmAndUpdateSession(RealmsResource.java:180)
    at org.keycloak.services.resources.RealmsResource.getRealmResource(RealmsResource.java:196)
    at org.keycloak.services.resources.RealmsResource$quarkusrestinvoker$getRealmResource_40080643164dd4bff8adf35fd68260654e9c3eab.invoke(Unknown Source)
    at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)

2024-01-30 13:23:49,192 ERROR [org.keycloak.services.error.KeycloakErrorHandler] (executor-thread-1) Failed to create error page: org.infinispan.commons.IllegalLifecycleStateException: ISPN000323: realms is in 'TERMINATED' state and so it does not accept new invocations. Either restart it or recreate the cache container.
    at org.infinispan.cache.impl.SimpleCacheImpl.getDataContainer(SimpleCacheImpl.java:1307)
    at org.infinispan.cache.impl.SimpleCacheImpl.get(SimpleCacheImpl.java:559)
    at org.infinispan.cache.impl.StatsCollectingCache.get(StatsCollectingCache.java:54)
    at org.infinispan.cache.impl.EncoderCache.get(EncoderCache.java:762)
    at org.keycloak.models.cache.infinispan.CacheManager.get(CacheManager.java:95)
    at org.keycloak.models.cache.infinispan.RealmCacheSession.getRealmByName(RealmCacheSession.java:466)
    at org.keycloak.services.managers.RealmManager.getRealmByName(RealmManager.java:100)
    at org.keycloak.services.error.KeycloakErrorHandler.resolveRealm(KeycloakErrorHandler.java:162)
    at org.keycloak.services.error.KeycloakErrorHandler.getResponse(KeycloakErrorHandler.java:97)
    at org.keycloak.services.error.KeycloakErrorHandler.toResponse(KeycloakErrorHandler.java:67)
    at org.jboss.resteasy.reactive.server.core.RuntimeExceptionMapper.mapException(RuntimeExceptionMapper.java:99)
    at org.jboss.resteasy.reactive.server.core.ResteasyReactiveRequestContext.mapExceptionIfPresent(ResteasyReactiveRequestContext.java:337)
    at org.jboss.resteasy.reactive.server.handlers.ExceptionHandler.handle(ExceptionHandler.java:15)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)

2024-01-30 13:23:49,193 ERROR [org.keycloak.headers.DefaultSecurityHeadersProvider] (executor-thread-1) MediaType not set on path /realms/master/, with response status 500
2024-01-30 13:23:49,197 ERROR [io.quarkus.vertx.http.runtime.QuarkusErrorHandler] (executor-thread-1) HTTP Request to //realms/master/ failed, error id: b0f9c8d9-c005-4338-8ee8-de1068b9b1fd-1: jakarta.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
    at org.keycloak.headers.DefaultSecurityHeadersProvider.addHeaders(DefaultSecurityHeadersProvider.java:75)
    at org.keycloak.services.filters.KeycloakSecurityHeadersFilter.filter(KeycloakSecurityHeadersFilter.java:43)
    at org.jboss.resteasy.reactive.server.handlers.ResourceResponseFilterHandler.handle(ResourceResponseFilterHandler.java:25)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)

2024-01-30 13:23:50,206 ERROR [org.keycloak.services.error.KeycloakErrorHandler] (executor-thread-1) Uncaught server error: java.lang.NullPointerException: Cannot invoke "io.quarkus.arc.ArcContainer.getContexts(java.lang.Class)" because the return value of "io.quarkus.arc.Arc.container()" is null
    at io.quarkus.arc.impl.ClientProxies.getDelegate(ClientProxies.java:27)
    at io.quarkus.vertx.http.runtime.CurrentVertxRequest_ClientProxy.arc$delegate(Unknown Source)
    at io.quarkus.vertx.http.runtime.CurrentVertxRequest_ClientProxy.setOtherHttpContextObject(Unknown Source)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusCurrentRequest.set(QuarkusCurrentRequest.java:34)
    at org.jboss.resteasy.reactive.server.core.CurrentRequestManager.set(CurrentRequestManager.java:12)
    at org.jboss.resteasy.reactive.server.core.ResteasyReactiveRequestContext.handleRequestScopeActivation(ResteasyReactiveRequestContext.java:640)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.handleRequestScopeActivation(QuarkusResteasyReactiveRequestContext.java:39)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.requireCDIRequestScope(AbstractResteasyReactiveContext.java:257)
    at org.jboss.resteasy.reactive.server.handlers.ParameterHandler.handle(ParameterHandler.java:43)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:137)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)

2024-01-30 13:23:50,207 ERROR [io.quarkus.vertx.http.runtime.QuarkusErrorHandler] (executor-thread-1) HTTP Request to //realms/master/ failed, error id: b0f9c8d9-c005-4338-8ee8-de1068b9b1fd-2: java.lang.NullPointerException: Cannot invoke "io.quarkus.arc.ArcContainer.getContexts(java.lang.Class)" because the return value of "io.quarkus.arc.Arc.container()" is null
    at io.quarkus.arc.impl.ClientProxies.getDelegate(ClientProxies.java:27)
    at io.quarkus.vertx.http.runtime.CurrentVertxRequest_ClientProxy.arc$delegate(Unknown Source)
    at io.quarkus.vertx.http.runtime.CurrentVertxRequest_ClientProxy.getOtherHttpContextObject(Unknown Source)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusCurrentRequest.get(QuarkusCurrentRequest.java:21)
    at org.jboss.resteasy.reactive.server.core.CurrentRequestManager.get(CurrentRequestManager.java:8)
    at org.keycloak.quarkus.runtime.integration.resteasy.QuarkusKeycloakContext.getResteasyReactiveRequestContext(QuarkusKeycloakContext.java:68)
    at org.keycloak.quarkus.runtime.integration.resteasy.QuarkusKeycloakContext.createHttpRequest(QuarkusKeycloakContext.java:40)
    at org.keycloak.services.DefaultKeycloakContext.getHttpRequest(DefaultKeycloakContext.java:151)
    at org.keycloak.services.DefaultKeycloakContext.getRequestHeaders(DefaultKeycloakContext.java:96)
    at org.keycloak.services.error.KeycloakErrorHandler.getResponse(KeycloakErrorHandler.java:82)
    at org.keycloak.services.error.KeycloakErrorHandler$1.run(KeycloakErrorHandler.java:62)
    at org.keycloak.services.error.KeycloakErrorHandler$1.run(KeycloakErrorHandler.java:59)
    at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransactionWithResult(KeycloakModelUtils.java:382)
    at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransactionWithResult(KeycloakModelUtils.java:365)
    at org.keycloak.services.error.KeycloakErrorHandler.toResponse(KeycloakErrorHandler.java:58)
    at org.jboss.resteasy.reactive.server.core.RuntimeExceptionMapper.mapException(RuntimeExceptionMapper.java:99)
    at org.jboss.resteasy.reactive.server.core.ResteasyReactiveRequestContext.mapExceptionIfPresent(ResteasyReactiveRequestContext.java:337)
    at org.jboss.resteasy.reactive.server.handlers.ExceptionHandler.handle(ExceptionHandler.java:15)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)
...

The end result is that this transaction will cause the transaction reaper to be stuck at shutdown holding the main thread (until https://issues.redhat.com/browse/JBTM-3830 is addressed):

"main" #1 prio=5 os_prio=0 cpu=12170.97ms elapsed=114.04s tid=0x00007f4924012f80 nid=0x2603f3 in Object.wait()  [0x00007f492a095000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@17.0.1/Native Method)
    - waiting on <no object reference available>
    at java.lang.Object.wait(java.base@17.0.1/Object.java:338)
    at com.arjuna.ats.arjuna.coordinator.TransactionReaper.shutdown(TransactionReaper.java:760)
    - locked <0x00000000e2a1fb80> (a com.arjuna.ats.arjuna.coordinator.TransactionReaper)
    at com.arjuna.ats.arjuna.coordinator.TransactionReaper.terminate(TransactionReaper.java:969)
    - locked <0x00000000e2a23b90> (a java.lang.Class for com.arjuna.ats.arjuna.coordinator.TransactionReaper)
    at io.quarkus.narayana.jta.runtime.NarayanaJtaRecorder.lambda$handleShutdown$3(NarayanaJtaRecorder.java:171)
    at io.quarkus.narayana.jta.runtime.NarayanaJtaRecorder$$Lambda$1062/0x000000080152c488.run(Unknown Source)
    at io.quarkus.runtime.StartupContext.runAllInReverseOrder(StartupContext.java:84)
    at io.quarkus.runtime.StartupContext.close(StartupContext.java:75)
    at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
    at io.quarkus.runtime.Application.start(Application.java:101)
    at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:111)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
    at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:117)
    at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:33)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
    at picocli.CommandLine.execute(CommandLine.java:2170)
    at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:124)
    at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:107)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.1/Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.1/NativeMethodAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.1/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(java.base@17.0.1/Method.java:568)
    at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
    at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)
geoand commented 9 months ago

Is there any simple way for me to reproduce this?

shawkins commented 9 months ago

Other than repeatively running our test case, no I don't have a simple reproducer.

For reference the exception that is expected to cause premature shutdown / abnormal startup is:

java.lang.RuntimeException: Unable to start HTTP server
    at io.quarkus.vertx.http.runtime.VertxHttpRecorder.doServerStart(VertxHttpRecorder.java:849)
    at io.quarkus.vertx.http.runtime.VertxHttpRecorder.startServer(VertxHttpRecorder.java:336)
    at io.quarkus.deployment.steps.VertxHttpProcessor$openSocket189362710.deploy_0(Unknown Source)
    at io.quarkus.deployment.steps.VertxHttpProcessor$openSocket189362710.deploy(Unknown Source)
    at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
    at io.quarkus.runtime.Application.start(Application.java:101)
    at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:111)
    at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
    at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:117)
    at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:33)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
    at picocli.CommandLine.execute(CommandLine.java:2170)
    at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:124)
    at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:107)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
    at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: malformed sequence
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at io.quarkus.vertx.http.runtime.VertxHttpRecorder.doServerStart(VertxHttpRecorder.java:770)
    ... 25 more
Caused by: java.lang.IllegalArgumentException: malformed sequence
    at org.bouncycastle.asn1.bc.ObjectStore.<init>(Unknown Source)
    at org.bouncycastle.asn1.bc.ObjectStore.getInstance(Unknown Source)
    at org.bouncycastle.jcajce.provider.ProvBCFKS$BCFIPSKeyStoreSpi.engineLoad(Unknown Source)
    at java.base/java.security.KeyStore.load(KeyStore.java:1473)
    at io.vertx.core.net.impl.KeyStoreHelper.loadKeyStore(KeyStoreHelper.java:266)
    at io.vertx.core.net.KeyStoreOptionsBase.getHelper(KeyStoreOptionsBase.java:188)
    at io.vertx.core.net.KeyStoreOptionsBase.getKeyManagerFactory(KeyStoreOptionsBase.java:206)
    at io.vertx.core.net.impl.SSLHelper.lambda$build$6(SSLHelper.java:246)
    at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:180)
    at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:277)
    at io.vertx.core.impl.ContextBase.lambda$internalExecuteBlocking$2(ContextBase.java:199)
    at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
    at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1512)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)
geoand commented 9 months ago

But in that case I still don't see how an HTTP request is being handled as the HTTP server has not come up

shawkins commented 9 months ago

@geoand right that is confusing. Trapping this a bit more in the debugger, I can see resteasy reactive logic adding the QuarkusResteasyReactiveRequestContext as a task - at the same time the malformed sequence is being thrown from starting the HTTP server. Then when that task runs, it triggers the creation of the transaction. Stack for adding the task:

        at org.jboss.threads.EnhancedQueueExecutor.execute(EnhancedQueueExecutor.java:799)
        at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:219)
        at org.jboss.resteasy.reactive.server.handlers.RestInitialHandler.beginProcessing(RestInitialHandler.java:48)
        at org.jboss.resteasy.reactive.server.vertx.ResteasyReactiveVertxHandler.handle(ResteasyReactiveVertxHandler.java:23)
        at org.jboss.resteasy.reactive.server.vertx.ResteasyReactiveVertxHandler.handle(ResteasyReactiveVertxHandler.java:10)
        at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1286)
        at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:177)
        at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:144)
        at io.quarkus.vertx.http.runtime.options.HttpServerCommonHandlers$1.handle(HttpServerCommonHandlers.java:58)
        at io.quarkus.vertx.http.runtime.options.HttpServerCommonHandlers$1.handle(HttpServerCommonHandlers.java:36)
        at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1286)
        at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:177)
        at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:144)
        at io.quarkus.resteasy.reactive.server.runtime.ResteasyReactiveRecorder$13.handle(ResteasyReactiveRecorder.java:382)
        at io.quarkus.resteasy.reactive.server.runtime.ResteasyReactiveRecorder$13.handle(ResteasyReactiveRecorder.java:375)
        at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1286)
        at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:177)
        at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:144)
        at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:68)
        at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:37)
        at io.quarkus.vertx.http.runtime.options.HttpServerCommonHandlers$2.handle(HttpServerCommonHandlers.java:82)
        at io.quarkus.vertx.http.runtime.options.HttpServerCommonHandlers$2.handle(HttpServerCommonHandlers.java:65)
        at io.quarkus.vertx.http.runtime.VertxHttpRecorder$1.handle(VertxHttpRecorder.java:177)
        at io.quarkus.vertx.http.runtime.VertxHttpRecorder$1.handle(VertxHttpRecorder.java:153)
...
geoand commented 9 months ago

It seems to me like some HTTP request is trying to be served before everything is up and running

mmusgrov commented 8 months ago

The recent merged narayana component upgrade, pr#39009, included a small change to how we handle shutdown. @shawkins would it be possible for you to retest to see if there are any other issues related to narayana?

shawkins commented 8 months ago

would it be possible for you to retest to see if there are any other issues related to narayana?

Yes, I'll try the scenario again.

I think the cause of the orphaned transaction was identified as well - https://github.com/keycloak/keycloak/pull/27682

shawkins commented 8 months ago

No reoccurances in our test case after 50 runs. Instead I see logs like:

2024-03-11 21:00:02,410 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffffc0a801f3:9ced:65efa90f:1a aborting with 1 threads active!

Which would not have occurred when exhibiting the hanging behavior.

mmusgrov commented 8 months ago

So @geoand and @shawkins can we close this one as fixed by the Narayana component upgrade?

geoand commented 8 months ago

👌