quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.87k stars 2.71k forks source link

Smallrye GraphQL's parallel execution not compatible with Hibernate reactive #32870

Open eamelink opened 1 year ago

eamelink commented 1 year ago

Describe the bug

When creating a GraphQL API with data retrieved with Hibernate reactive, I get the following exception:

2023-04-24 21:14:48,145 ERROR [io.sma.graphql] (vert.x-eventloop-thread-1) SRGQL012000: Data Fetching Error: java.lang.IllegalStateException: Illegal pop() with non-matching JdbcValuesSourceProcessingState

Note that this is very similar to the problem in #32790, but I think here it's caused by the underlying library graphql-java doing data fetching in parallel (@phillip-kruger correct me if I'm wrong).

Expected behavior

I would expect a straightforward GraphQL query to work without having to think too much about Hibernate Reactive Sessions

Actual behavior

An exception is thrown

How to Reproduce?

Here's a reproducer: https://github.com/eamelink/graphql-and-hibernate-reactive

Output of uname -a or ver

Darwin Eriks-MacBook-Pro.local 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000 arm64

Output of java -version

openjdk version "17.0.6" 2023-01-17 OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10) OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode)

GraalVM version (if different from Java)

No response

Quarkus version or git rev

3.0.0.CR2

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.8.8 (4c87b05d9aedce574290d1acc98575ed5eb6cd39) Maven home: /Users/eamelink/.m2/wrapper/dists/apache-maven-3.8.8-bin/67c30f74/apache-maven-3.8.8 Java version: 17.0.6, vendor: Eclipse Adoptium, runtime: /Users/eamelink/.sdkman/candidates/java/17.0.6-tem Default locale: en_NL, platform encoding: UTF-8 OS name: "mac os x", version: "13.3.1", arch: "aarch64", family: "mac"

Additional information

No response

quarkus-bot[bot] commented 1 year ago

/cc @DavideD (hibernate-reactive), @Ladicek (smallrye), @Sanne (hibernate-reactive), @gavinking (hibernate-reactive), @jmartisk (graphql,smallrye), @phillip-kruger (graphql,smallrye), @radcortez (smallrye)

eamelink commented 1 year ago

But there shouldn't be shared sessions; the GraphQL engine does independent calls to the Service class where the @WithSession annotation is. So they should all have their own session and not conflict.

But if I put my debugger in io.quarkus.hibernate.reactive.panache.common.runtime.SessionOperations, on these lines:

  if (current != null && current.isOpen()) {
      // reactive session exists - reuse this session
      return work.apply(current);
  } else {
      // reactive session does not exist - open a new one and close it when the returned Uni completes
      return getSessionFactory()
              .openSession()
              .invoke(s -> context.putLocal(key, s))
              .chain(s -> work.apply(s))
              .eventually(() -> closeSession());
  }

I see that I get into the else branch and get two invocations of getSessionFactory() from the same vert-x thread with the same vert-x context before two invocations of s -> context.putLocal(key, s)

So maybe just a race condition in this code?

eamelink commented 1 year ago

It appears I can work around it by creating a new duplicated context in each resource method and calling the service on that new context.

But since the whole GraphQL execution is already running on a duplicated context, this creates a 'sibling' context (dixit @cescoffier https://groups.google.com/g/quarkus-dev/c/lBmQkCi_VK0), and we lose other stuff, such as the injected SecurityPrincipal.

Sanne commented 1 year ago

@eamelink many thanks for the report, that's indeed very suspicious.

Sanne commented 1 year ago

@phillip-kruger I've sent a PR to improve the code in Panache Reactive to help spotting code which is creating an illegal chain of operations: #32917

But I suspect, looking into @eamelink 's reproducer, that the mistake is happening somewhere in graphql's integration code. Could you have a look please?

I'm a bit surprised this is happening on a duplicated context, it would seem this is being marked as "safe" when it's not.

DavideD commented 1 year ago

I think it's happening here: https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/execution/AsyncExecutionStrategy.java#L48

GraphQL is trying to load the foo entities for two people in parallel and causes the error we see.

Using the Hibernate Reactive factory directly, seems to work as a workaround:

@ApplicationScoped
public class PeopleService {

    @Inject
    Mutiny.SessionFactory sf;

    public Uni<List<Person>> people() {
        return sf.withTransaction( session -> session
                .createQuery("from Person", Person.class )
                .getResultList() );
    }

    public Uni<List<Foo>> foos(Person person) {
        return sf.withTransaction( session -> session
                .createQuery( "from Foo where personId = :personId", Foo.class )
                .setParameter( "personId", person.id )
                .getResultList() );
    }
}

On a side note, you also need to change the import.sql to:

INSERT into Person (id, name) VALUES (1, 'Erik');
INSERT into Person (id, name) VALUES (2, 'Willem Jan');

INSERT into Foo (id, personId, foo) VALUES(1, 1, 'Eriks Foo');
INSERT into Foo (id, personId, foo) VALUES(2, 1, 'Eriks 2nd Foo');
INSERT into Foo (id, personId, foo) VALUES(3, 1, 'Eriks 3rd Foo');

INSERT into Foo (id, personId, foo) VALUES(4, 2, 'Willems Foo');
INSERT into Foo (id, personId, foo) VALUES(5, 2, 'Willems 2nd Foo');

(You need to specify the id for each Foo)

Sanne commented 1 year ago

@DavideD nice find, right that code seems highly supsicious. @phillip-kruger we'll need to patch that, perhaps it's easier to override with a custom implementation in Quarkus? I'll leave it to you... :)

mkouba commented 1 year ago

Using the Hibernate Reactive factory directly, seems to work as a workaround:

@DavideD Hm, but SessionFactory.withTransaction() does reuse the existing reactive session, or? I don't see a difference compared to @WithTransaction...

DavideD commented 1 year ago

@DavideD Hm, but SessionFactory.withTransaction() does reuse the existing reactive session, or? I don't see a difference compared to @WithTransaction...

Yes, I'm not sure why it works. It might be that it's just a bit faster and doesn't cause the issue.

phillip-kruger commented 1 year ago

Doing requests (read) in parallel is part of the GraphQL Spec (https://spec.graphql.org/June2018/#sec-Query). @jmartisk - do you have capacity to look at this ?

jmartisk commented 1 year ago

Sorry I don't really know how to fix it, this is far beyond my knowledge of the reactive world.

paomian commented 1 year ago

if i transfrom Uni<List<T>> to Multi<T> and use transformToUniAndMerge will case same error. use transformToUniAndConcatenate will avoid this error. it share context in many worker thread I think like https://github.com/quarkusio/quarkus/issues/32790.

phillip-kruger commented 1 year ago

@paomian so you want to do a PR?

paomian commented 1 year ago

@phillip-kruger I don't think I'm good enough to do a PR. I just report a bug.

uPagge commented 1 year ago

I faced the same problem, but when receiving REST requests. Quarkus version 3.0.1

eamelink commented 1 year ago

I wrote a bit on how we work around it: https://blog.lunatech.com/posts/2023-05-19-hibernate-reactive-gotchas

Sanne commented 1 year ago

@eamelink great writeup, thanks! You should join our brainstorming sessions on Hibernate's zulip ;)

We also discussed recently that this limitation needs to be improved on; similarly to what you write, I'm inclined to introduce a queue so to transparently process "parallel" operations in sequence.

The option to open multiple sessions is a no-go IMO; separate transactions - like you mention - are a significant problem, but it gets worse with undefined boundaries for the transactions and we also need to keep in mind resource restrictions, for example what happens when you run out of connections and can you still guarantee liveness properties and healthy progress of the business operations.

The short version is that I agree that just documenting such a limitation doesn't cut it; it was perhaps good enough in the early days as we were going fast with the prototype and we had bigger concerns, such as figuring out if the whole thing was viable, but now that mostly everything else is working as expected we need to return to this.

Thanks for the feedback!

ThomasKluiters commented 1 year ago

@eamelink Thank you so much for that blog-post, this issue was driving me insane.

Though, in your solution there was a slight issue.

Instead of:

   @Override
    public <T> Uni<T> protect(Uni<T> uni) {
        return acquire().replaceWith(uni).eventually(this::release);
    }

You meant:

    @Override
    public <T> Uni<T> protect(Supplier<Uni<T>> inner) {
        return acquire().flatMap(ignored -> inner.get()).eventually(this::release);
    }
elrob commented 1 year ago

This issue got me too. Thanks for the blog post!

It feels like there needs to be something more in the type system to make this more manageable. Sometimes the Unis that need to be protected are futher down the stack and at the point in the code where e.g. a Uni.combine().all( is invoked, it can be difficult to be aware that it requires to protecting.

uPagge commented 1 year ago

Hey, everybody. I am facing the same problem when using two @Source queries at the same time. So, if I call only one @Source query in a GraphQL query, everything works correctly, but if I call two @Source queries at once, I get an error.

gsmet commented 3 months ago

@jmartisk is it still an issue? I know you changed a few things in SmallRye GraphQL since then?

jmartisk commented 3 months ago

I don't recall any changes specifically addressing this