quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.53k stars 2.61k forks source link

Hibernate Reactive hangs on insert #16255

Closed stuartwdouglas closed 3 years ago

stuartwdouglas commented 3 years ago

Describe the bug

Hibernate reactive hangs when inserting, the sequence generator stops returning numbers and requests just hang.

To Reproduce

Take the hibernate reactive quickstart, and remove the unique constraint for the name field (it will still reproduce even with this, this just simplifies things).

Then run wrk -s post.lua http://localhost:8080/fruits

where post.lua is:

wrk.method = "POST"
wrk.body   = "{\"name\": \"fruit\"}"
wrk.headers["Content-Type"] = "application/json"

Everything will hang very quickly, locally I set breapoints in BlockingIdentifierGenerator and I see the following output:

Breakpoint reached at org.hibernate.reactive.id.impl.BlockingIdentifierGenerator.generate(BlockingIdentifierGenerator.java:69)
Breakpoint reached at org.hibernate.reactive.id.impl.BlockingIdentifierGenerator.lambda$generate$1(BlockingIdentifierGenerator.java:76)
Breakpoint reached at org.hibernate.reactive.id.impl.BlockingIdentifierGenerator.generate(BlockingIdentifierGenerator.java:69)
Breakpoint reached at org.hibernate.reactive.id.impl.BlockingIdentifierGenerator.lambda$generate$1(BlockingIdentifierGenerator.java:76)
Breakpoint reached at org.hibernate.reactive.id.impl.BlockingIdentifierGenerator.generate(BlockingIdentifierGenerator.java:69)

Basically the 3rd time generate() is called and attempts to allocate more numbers the call just never returns.

quarkus-bot[bot] commented 3 years ago

/cc @DavideD, @Sanne, @gavinking

stuartwdouglas commented 3 years ago

org.hibernate.reactive.pool.impl.ProxyConnection#withConnection hits the line throw new IllegalStateException( "session is currently connecting to database" );

Because this exception is thrown rather than returning a ComplectionStage that has failed with IllegalStateException everything locks up permanently as the generator CS never completes.

DavideD commented 3 years ago

I will have a look

DavideD commented 3 years ago

I cannot replicate the issue with the latest quickstart in development hibernate-reactive-quickstart (we updated it less than an hour ago).

Could you try again and let me know the -R option you are using with wrk, please?

Sanne commented 3 years ago

I'll let Davide look, but a suggestion: the error message smells of threading issues again.

stuartwdouglas commented 3 years ago

So I had another look and I am not sure if it is that IllegalStateException I mentioned, I saw it hit that when I stepped through in a debugger but when I just set a breakpoint there it does not seem to be hit. It still locks up the the queue just keeps growing indefinitely though.

stuartwdouglas commented 3 years ago

Could this be the cause: https://github.com/hibernate/hibernate-reactive/blob/eceb57100021ea3b72f8ee78580b993349358b5f/hibernate-reactive-core/src/main/java/org/hibernate/reactive/id/impl/BlockingIdentifierGenerator.java#L82

While identifiers are being fetched connections from every event loop thread are added to the list. When the CS completes then they all call generate again from the event loop thread that fetched the identifier, which may be different to the actual event loop thread of the connection. I am pretty sure having multiple threads involved will not end well, although I am not sure if the exact failure mode (assuming this is even correct).

Also the current design does not seem great when you are defaulting to an increment of 1.

wernert75 commented 3 years ago

I had the same problem. The moment more than one query ends in a database insert, the query and the http-request hangs. Is there any solution? In my opinion, it´s a serious problem.

gavinking commented 3 years ago

I am pretty sure having multiple threads involved will not end well, although I am not sure if the exact failure mode (assuming this is even correct).

So the question is why, precisely, is this wrong?

Perhaps we do the work in an incorrect tx context?

gavinking commented 3 years ago

Wait, @stuartwdouglas are you saying that somewhere we are maintaining an association of threads to vert.x connections? Is this to do with the workarounds @Sanne had to do for the issues with the vert.x connection pool (prior to 4.1)?

gavinking commented 3 years ago

(I don't now remember precisely what it was we decided we had to do as a (temporary) solution for that issue.)

stuartwdouglas commented 3 years ago

Every Vert.x connection has an associated thread, which may be different to the other connections threads. In this code when the future is complete it dispatches on the current thread, which will always be a vert.x thread but not necessarily the vert.x thread for the connection that was waiting.

Usually this causes problems in the form of IO issues, where the connections actual thread receives an IO event at the same time this other thread is still working and you now have two threads working on stuff that is designed to be single threaded.

gavinking commented 3 years ago

Right. So this is the thing that's supposed to be fixed with the new connection pool in vert.x 4.1. (I had thought it was coming in 4.0 but Sanne corrected me when we chatted last week.)

So I'm actually struggling to see how we can even in principle implement the behavior we need on 4.0.

stuartwdouglas commented 3 years ago

I guess either dispatch back to the original thread, or have one ID generator per thread, so each thread allocates from its own ID range (which will perform better, but means ID's won't be sequential).

gavinking commented 3 years ago

Well I personally have no idea how to despatch to a specific thread. I guess maybe there is some API for that somewhere.

As for your second suggestion: that was how I wrote it originally (twice). But the thing is it means that in highly concurrent systems you waste a lot of hi values, and everyone hates that. So I changed it.

gavinking commented 3 years ago

Excuse me, that's wrong, I didn't have a generator per thread, I just let the retrieval of hi values happen concurrently. Your suggestion is a little better, perhaps, but kinda hard to implement in HR itself, which doesn't really know anything about threads.

Sanne commented 3 years ago

My suggestion is that we can't have users of HR which aren't already running on the vertx thread, so making "the current thread" and the "io thread" the same.

I've said this before but others said that such restriction is too strong, but I believe we have to as such issues are otherwise going to manifest in many unpredictable ways.

gavinking commented 3 years ago

But isn't this just a temporary issue until vert.x 4.1?

stuartwdouglas commented 3 years ago

'an IO thread' is a kinda useless restriction, it needs to be 'the IO thread' that belongs to the current connection. It also needs to be the same as the HTTP and messaging IO threads. AFAIK I think Vert.x will mostly handle this, so it's only situations like this where it is an issue.

In the HTTP stuff I do with Vert.x you can cast to the underlying connection impl to get the event loop, and I thought Vert.x 4 had a more generic Context concept that would help with this, although it is late and I won't be able to look into the details until tomorrow.

stuartwdouglas commented 3 years ago

Also it is a pretty serious issue, as even a tiny bit of load break HR and then inserts just stop working, and I am not 100% sure that my analysis is correct because I did not spend a lot of time on it. It is definitely hanging in the ID generator under even tiny amounts of load, but I have not verified that my thread theory is correct, it could be something else.

Sanne commented 3 years ago

But isn't this just a temporary issue until vert.x 4.1?

I suspect not, but we can see how it behaves when we have 4.1. @DavideD is going to work on more extensive integration tests, this needs to be bullet proof.

gavinking commented 3 years ago

I might be being dense, but i'm still not convinced that this issue is about threads. I don't think a CompletionStage should really care which thread it's executed on. I think it's an issue with connection affinity. It seems to me that it's the assumption that threads map to connections that is broken. Am I wrong?

The thing that is especially confusing me is that in principle the reactive session is keeping a hard reference to the connection it's using. We're not relying on the thread to obtain our connection. So that's why I keep asking: isn't this a problem with the Vert.x pre-4.1 approach to connection pooling?

I mean who, precisely, is making the assumption that thread=connection here? Quarkus? HR? vert.x? I think it's vert.x to blame, not HR.

But i'm not sure.

stuartwdouglas commented 3 years ago

I just hit this in the debugger again and I don't think this is about threads as such, although if the model is that you can only operate on the IO thread for a given connection I do think this has the potential to cause problems.

I see the following stack trace being the problem when it hits that IllegalStateException:

withConnection:52, ProxyConnection (org.hibernate.reactive.pool.impl)
selectIdentifier:114, ProxyConnection (org.hibernate.reactive.pool.impl)
nextHiValue:45, SequenceReactiveIdentifierGenerator (org.hibernate.reactive.id.impl)
generate:71, BlockingIdentifierGenerator (org.hibernate.reactive.id.impl)
lambda$null$0:82, BlockingIdentifierGenerator (org.hibernate.reactive.id.impl)
accept:-1, 1166568959 (org.hibernate.reactive.id.impl.BlockingIdentifierGenerator$$Lambda$1316)
forEach:1541, ArrayList (java.util)
lambda$generate$1:82, BlockingIdentifierGenerator (org.hibernate.reactive.id.impl)
accept:-1, 898959048 (org.hibernate.reactive.id.impl.BlockingIdentifierGenerator$$Lambda$1155)
tryFire:714, CompletableFuture$UniAccept (java.util.concurrent)
postComplete:506, CompletableFuture (java.util.concurrent)
complete:2073, CompletableFuture (java.util.concurrent)
lambda$toCompletionStage$0:26, Handlers (org.hibernate.reactive.pool.impl)
handle:-1, 561032211 (org.hibernate.reactive.pool.impl.Handlers$$Lambda$802)
onSuccess:124, FutureImpl$3 (io.vertx.core.impl.future)
lambda$emitSuccess$0:54, FutureBase (io.vertx.core.impl.future)
run:-1, 93014226 (io.vertx.core.impl.future.FutureBase$$Lambda$970)
execute:83, EventLoopContext (io.vertx.core.impl)
execute:199, DuplicatedContext (io.vertx.core.impl)
emitSuccess:51, FutureBase (io.vertx.core.impl.future)
tryComplete:179, FutureImpl (io.vertx.core.impl.future)
tryComplete:23, PromiseImpl (io.vertx.core.impl.future)
tryComplete:102, QueryResultBuilder (io.vertx.sqlclient.impl)
tryComplete:35, QueryResultBuilder (io.vertx.sqlclient.impl)
complete:66, Promise (io.vertx.core)
handle:51, Promise (io.vertx.core)
handle:29, Promise (io.vertx.core)
fire:46, CommandResponse (io.vertx.sqlclient.impl.command)
handleMessage:258, SocketConnectionBase (io.vertx.sqlclient.impl)
handleMessage:94, PgSocketConnection (io.vertx.pgclient.impl)
lambda$init$0:96, SocketConnectionBase (io.vertx.sqlclient.impl)
handle:-1, 439263827 (io.vertx.sqlclient.impl.SocketConnectionBase$$Lambda$836)
lambda$new$1:97, NetSocketImpl (io.vertx.core.net.impl)
handle:-1, 211644816 (io.vertx.core.net.impl.NetSocketImpl$$Lambda$830)
handleEvent:240, InboundBuffer (io.vertx.core.streams.impl)
write:130, InboundBuffer (io.vertx.core.streams.impl)
lambda$handleMessage$9:390, NetSocketImpl (io.vertx.core.net.impl)
handle:-1, 53810781 (io.vertx.core.net.impl.NetSocketImpl$$Lambda$841)
emit:52, EventLoopContext (io.vertx.core.impl)
emit:294, ContextImpl (io.vertx.core.impl)
emit:24, EventLoopContext (io.vertx.core.impl)
handleMessage:389, NetSocketImpl (io.vertx.core.net.impl)
read:153, ConnectionBase (io.vertx.core.net.impl)
channelRead:154, VertxHandler (io.vertx.core.net.impl)
invokeChannelRead:379, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelRead:365, AbstractChannelHandlerContext (io.netty.channel)
fireChannelRead:357, AbstractChannelHandlerContext (io.netty.channel)
fireChannelRead:436, CombinedChannelDuplexHandler$DelegatingChannelHandlerContext (io.netty.channel)
lambda$write$0:88, PgEncoder (io.vertx.pgclient.impl.codec)
handle:-1, 2069447266 (io.vertx.pgclient.impl.codec.PgEncoder$$Lambda$839)
handleReadyForQuery:139, PgCommandCodec (io.vertx.pgclient.impl.codec)
decodeReadyForQuery:235, PgDecoder (io.vertx.pgclient.impl.codec)
channelRead:95, PgDecoder (io.vertx.pgclient.impl.codec)
channelRead:251, CombinedChannelDuplexHandler (io.netty.channel)
invokeChannelRead:379, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelRead:365, AbstractChannelHandlerContext (io.netty.channel)
fireChannelRead:357, AbstractChannelHandlerContext (io.netty.channel)
channelRead:1410, DefaultChannelPipeline$HeadContext (io.netty.channel)
invokeChannelRead:379, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelRead:365, AbstractChannelHandlerContext (io.netty.channel)
fireChannelRead:919, DefaultChannelPipeline (io.netty.channel)
read:166, AbstractNioByteChannel$NioByteUnsafe (io.netty.channel.nio)
processSelectedKey:719, NioEventLoop (io.netty.channel.nio)
processSelectedKeysOptimized:655, NioEventLoop (io.netty.channel.nio)
processSelectedKeys:581, NioEventLoop (io.netty.channel.nio)
run:493, NioEventLoop (io.netty.channel.nio)
run:989, SingleThreadEventExecutor$4 (io.netty.util.concurrent)
run:74, ThreadExecutorMap$2 (io.netty.util.internal)
run:30, FastThreadLocalRunnable (io.netty.util.concurrent)
run:834, Thread (java.lang)
stuartwdouglas commented 3 years ago

So it his this because the connection is closed, and the closed connection is still awaiting an ID. Because the calling code does not guard again this throwing (rather than returning a failed CompletionStage) the whole system locks up.

stuartwdouglas commented 3 years ago

The EM is closed because the RESTEasy Reactive request is considered completed, before the ID has been generated, which seems very odd.

stuartwdouglas commented 3 years ago

Ok, the bug is actually ridiculously simple. When the queue is processed 'session' refers to the session that just generated the ID, not the session that is asking for the ID. Because we call result.complete() earlier if this finishes off the request it can close the EM, so you are calling generate with a closed session.

gavinking commented 3 years ago

Haha. OMG. Oooops.

I really did stare at that code for a long time without noticing that. How embarrassing!

gavinking commented 3 years ago

I have opened https://github.com/hibernate/hibernate-reactive/issues/707.

DavideD commented 3 years ago

I really did stare at that code for a long time without noticing that. How embarrassing!

If it can be of any consolation, you are not the only one who looked at that code.

kenfinnigan commented 3 years ago

Not sure if it's the same issue or not, but I had a different situation with the same error.

If I have the following:

Uni<Customer> customerUni = Customer.findById(id);
Uni<List<Order>> customerOrdersUni = orderService.getOrdersForCustomer(id);
return Uni.combine()
    .all().unis(customerUni, customerOrdersUni)
    .combinedWith((customer, orders) -> {
      if (customer != null) {
        customer.orders = orders;
      }
      return customer;
    })
    .onItem().ifNotNull().transform(customer -> Response.ok(customer).build())
    .onItem().ifNull().continueWith(Response.ok().status(Response.Status.NOT_FOUND)::build);

where Customer is a Panache entity and getOrdersForCustomer() uses an Order Panache entity to retrieve orders with the customer id, I see the following stacktrace:

java.lang.IllegalStateException: session is currently connecting to database
    at org.hibernate.reactive.pool.impl.ProxyConnection.withConnection(ProxyConnection.java:52)
    at org.hibernate.reactive.pool.impl.ProxyConnection.selectJdbc(ProxyConnection.java:109)
    at org.hibernate.reactive.loader.ReactiveLoader.executeReactiveQueryStatement(ReactiveLoader.java:129)
    at org.hibernate.reactive.loader.ReactiveLoader.doReactiveQueryAndInitializeNonLazyCollections(ReactiveLoader.java:69)
    at org.hibernate.reactive.loader.CachingReactiveLoader.doReactiveList(CachingReactiveLoader.java:62)
    at org.hibernate.reactive.loader.CachingReactiveLoader.reactiveListIgnoreQueryCache(CachingReactiveLoader.java:80)
    at org.hibernate.reactive.loader.hql.impl.ReactiveQueryLoader.reactiveList(ReactiveQueryLoader.java:129)
    at org.hibernate.reactive.loader.hql.impl.ReactiveQueryLoader.reactiveList(ReactiveQueryLoader.java:95)
    at org.hibernate.reactive.session.impl.ReactiveQueryTranslatorImpl.reactiveList(ReactiveQueryTranslatorImpl.java:139)
    at org.hibernate.reactive.session.impl.ReactiveHQLQueryPlan.performReactiveList(ReactiveHQLQueryPlan.java:114)
    at org.hibernate.reactive.session.impl.ReactiveSessionImpl.lambda$reactiveList$8(ReactiveSessionImpl.java:438)
    at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
    at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
    at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:143)
    at org.hibernate.reactive.session.impl.ReactiveSessionImpl.reactiveList(ReactiveSessionImpl.java:438)
    at org.hibernate.reactive.session.impl.ReactiveQueryImpl.doReactiveList(ReactiveQueryImpl.java:133)
    at org.hibernate.reactive.session.impl.ReactiveQueryImpl.getReactiveResultList(ReactiveQueryImpl.java:109)
    at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
    at io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage.subscribe(UniCreateFromCompletionStage.java:24)
    at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
    at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
    at org.hibernate.reactive.mutiny.impl.MutinySessionFactoryImpl.lambda$new$1(MutinySessionFactoryImpl.java:53)
    at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.subscribe(UniRunSubscribeOn.java:25)
    at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
    at io.smallrye.mutiny.operators.uni.UniAndCombination$UniHandler.subscribe(UniAndCombination.java:203)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at io.smallrye.mutiny.operators.uni.UniAndCombination$AndSupervisor.run(UniAndCombination.java:71)
    at io.smallrye.mutiny.operators.uni.UniAndCombination$AndSupervisor.access$000(UniAndCombination.java:53)
    at io.smallrye.mutiny.operators.uni.UniAndCombination.subscribe(UniAndCombination.java:50)
    at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
    at io.smallrye.mutiny.groups.UniSubscribe.withSubscriber(UniSubscribe.java:50)
    at io.smallrye.mutiny.groups.UniSubscribe.with(UniSubscribe.java:90)
    at org.jboss.resteasy.reactive.server.handlers.UniResponseHandler.handle(UniResponseHandler.java:17)
    at org.jboss.resteasy.reactive.server.handlers.UniResponseHandler.handle(UniResponseHandler.java:8)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:122)
    at org.jboss.resteasy.reactive.server.handlers.RestInitialHandler.beginProcessing(RestInitialHandler.java:47)
    at org.jboss.resteasy.reactive.server.vertx.ResteasyReactiveVertxHandler.handle(ResteasyReactiveVertxHandler.java:17)
    at org.jboss.resteasy.reactive.server.vertx.ResteasyReactiveVertxHandler.handle(ResteasyReactiveVertxHandler.java:7)
    at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1038)
    at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:137)
    at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:132)
    at io.quarkus.vertx.http.runtime.VertxHttpRecorder$4.handle(VertxHttpRecorder.java:338)
    at io.quarkus.vertx.http.runtime.VertxHttpRecorder$4.handle(VertxHttpRecorder.java:316)
    at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1038)
    at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:137)
    at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:132)
    at io.quarkus.vertx.http.runtime.devmode.VertxHttpHotReplacementSetup$3.handle(VertxHttpHotReplacementSetup.java:86)
    at io.quarkus.vertx.http.runtime.devmode.VertxHttpHotReplacementSetup$3.handle(VertxHttpHotReplacementSetup.java:75)
    at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:327)
    at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
    at io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
stuartwdouglas commented 3 years ago

That is likely the same error

stuartwdouglas commented 3 years ago

Actually your code does not seem to be inserting into the DB, so it is likely a different error.

stuartwdouglas commented 3 years ago

I think you might need to wrap your code in a transaction.

kenfinnigan commented 3 years ago

@stuartwdouglas Right, it's purely reading from two different entities.

I changed the Order retrieval to use PgPool directly. I had thought about doing them in a transaction but wanted to avoid that for a pure read. Are we saying that a situation like this must have a transaction even for reads?

DavideD commented 3 years ago

I think this issue my be related to this one: https://github.com/quarkusio/quarkus/issues/14709#issuecomment-819571648

DavideD commented 3 years ago

Shouldn't we close this issue? The fix was released with 2.0.0.Alpha1: https://github.com/quarkusio/quarkus/pull/16588