quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.85k stars 2.7k forks source link

SmallRye GraphQL source fields metrics not working #34397

Open akHelix4444 opened 1 year ago

akHelix4444 commented 1 year ago

Describe the bug

Good day to You!

I try use GraphQL and achieve Micrometer metrics for my endpoints. I read Docs and see base metrics implementation in SmallRye GraphQL repository.

But, when i test my sandbox project metrics and events (both EventingService implementation and Observes) behaviour looks strange.

Please, could you help me explain and resolve 3 things? 1) Metrics for @Source fields always 0 it is add on startup with other operations, but then don't calculate because afterDataFetch observer method never execute for Source fields; 2) I create batch @Source operation and in code this method execute once, but in same time beforeDataFetch execute lots of times for each record in my batch list; 3) Because afterDataFetch not execute for @Source fields context not removed from map (for metrics calculate) (Long startTime = startTimes.remove(context);) and map increment size.

Maybe i have a wrong view, but do we have a way for calculate execution metrics for source fields in same event base way like for general operations?

I try this in both default (Blocking) and NonBlocking way.

Expected behavior

I execute query 1:

query findHeroById {
  selectedHero: hero(heroId: "100") {
    __typename
    id
    name
    slug
    stats: statsInfo {
      __typename
      intelligence
      power
      combat
      speed
      strength
      durability
    }
  }
}

and query 2:

query heroesList {
  heroes: heroCollection {
    __typename
    id
    name
    slug
    stats: statsInfo {
      __typename
      intelligence
      power
      combat
      speed
      strength
      durability
    }
  }
}

And metrics should calculate:

mp_graphql_seconds_count{name="statsInfo",source="true",type="QUERY",} 0.0
mp_graphql_seconds_sum{name="statsInfo",source="true",type="QUERY",} 0.0

And afterDataFetch should fire for source fields and i see in logs:

2343534452 - Start fetch data: statsInfo
2343534452 - Complete fetch data: statsInfo

And "metrics map" should not increase size.

Actual behavior

Metrics have zero values:

mp_graphql_seconds_count{name="statsInfo",source="true",type="QUERY",} 0.0
mp_graphql_seconds_sum{name="statsInfo",source="true",type="QUERY",} 0.0

Only beforeDataFetch message in logs.

2343534452 - Start fetch data: statsInfo

And "metrics map" increase size after each execution.

== startTimes map size - 10 ==

How to Reproduce?

Reproducer example: https://github.com/akHelix4444/Quarkus-Playground/tree/r-gql-events-metrics Branch: r-gql-events-metrics

Execute queries:

query findHeroById {
  selectedHero: hero(heroId: "100") {
    __typename
    id
    name
    slug
    stats: statsInfo {
      __typename
      intelligence
      power
      combat
      speed
      strength
      durability
    }
  }
}
query heroesList {
  heroes: heroCollection {
    __typename
    id
    name
    slug
    stats: statsInfo {
      __typename
      intelligence
      power
      combat
      speed
      strength
      durability
    }
  }
}

See logs and metrics values as describe above.

Output of uname -a or ver

No response

Output of java -version

openjdk version "17.0.7" 2023-04-18 OpenJDK Runtime Environment Temurin-17.0.7+7 (build 17.0.7+7) OpenJDK 64-Bit Server VM Temurin-17.0.7+7 (build 17.0.7+7, mixed mode, sharing)

GraalVM version (if different from Java)

No response

Quarkus version or git rev

2.16.7.Final (2.13.8.Final - same behaviour)

Build tool (ie. output of mvnw --version or gradlew --version)

Gradle 8.1.1 Build time: 2023-04-21 12:31:26 UTC Revision: 1cf537a851c635c364a4214885f8b9798051175b Kotlin: 1.8.10 Groovy: 3.0.15 Ant: Apache Ant(TM) version 1.10.11 compiled on July 10 2021 JVM: 17.0.7 (Eclipse Adoptium 17.0.7+7) OS: Windows 11 10.0 amd64

Additional information

No response

quarkus-bot[bot] commented 1 year ago

/cc @Ladicek (smallrye), @ebullient (metrics), @evanchooly (kotlin), @geoand (kotlin), @jmartisk (graphql,metrics,smallrye), @phillip-kruger (graphql,smallrye), @radcortez (smallrye)

akHelix4444 commented 1 year ago

Sorry don't expect so much mentions. And Kotlin wrong in here, i think.

phillip-kruger commented 1 year ago

@jmartisk can you look at this ?

jmartisk commented 1 year ago

@mskacelik is going to have a look

akHelix4444 commented 1 year ago

Oh, could i ask one more thing, in SmallRye GraphQL repository for metrics calculation use duration between beforeDataFetch and afterDataFetch happened. But, errorDataFetch always execute after afterDataFetch and in fact, event about operation complete already happened, but i don't know at this time about exception happened or not.

Now it is looks like:

>>>>> Received beforeDataFetch event
>>>>> Received afterDataFetch event
>>>>> Received errorDataFetch event

And it is not looks like imperative try-catch-finally way to metrics, when i could save start time, fire error metric in catch (and skip finally) or fire complete metric in finally. And, i think, it is not the same with reactive way, because i have both "onComplete" and "onFail" events at same time.

Could you answer, is there a way to solve the problem of programmatically collecting metrics about GraphQL successfully completed queries and queries with exceptions, based on the GraphQL eventing model?

jmartisk commented 1 year ago

We don't make any difference between successful and failed operations for metric purposes. And with the current eventing model, it may not be possible at all. In fact, I think the current eventing model is unsuitable for collecting metrics in general. We will probably need to rethink metrics completely, and collect them by instrumenting data fetchers instead.

akHelix4444 commented 1 year ago

@jmartisk Thank you, for answer!

Oh, in another way i could use Micrometer annotations on GraphQL methods, it is more "static" and not centralize way, but it is have chances to success.

And, maybe, you could give me idea, what do you mean for instrumenting data fetchers? Could i have a way to implementing this by myself in my services for centralize GraphQL methods metrics, or core SmallRye GraphQL functionality refactoring needed for implement it?

jmartisk commented 1 year ago

If you're unhappy with how metrics are calculated right now, which I admit would be quite understandable, feel free to annotate your queries with Micrometer annotations by yourself, yes.

And, maybe, you could give me idea, what do you mean for instrumenting data fetchers? Could i have a way to implementing this by myself in my services for centralize GraphQL methods metrics, or core SmallRye GraphQL functionality refactoring needed for implement it?

Data fetchers are classes located in the SmallRye GraphQL and Quarkus codebases, so you can't do this inside an application. You should be able to use annotations though.

akHelix4444 commented 1 year ago

Thank you @jmartisk !

Will try annotation way.

We have both REST and GraphQL API, and GraphQL API metrics have some tricky parts)

akHelix4444 commented 1 year ago

@jmartisk sorry me, could i ask one more question about using metrics around GraphQL methods.

Could i use CDI Interceptors around GraphQL methods for centralize some metrics, tracing and logging tasks, or this way could have some issues (in reactive way, for example)?

Annotations way work fine, but i think: is there a way for me to add some programmatically logic.

jmartisk commented 1 year ago

I think reactive code (asynchronous return types) will be a problem when using CDI interceptors, these are called right after the method itself returns, not when the asynchronous code finishes.

Ladicek commented 1 year ago

It is possible to write a CDI interceptor that handles asynchronous methods correctly, but it needs work. Here's a simple example, in the real world, it will likely be more complex (e.g. will need to handle Uni in addition to / instead of CompletionStage):

@MyInterceptorBinding
@Interceptor
@Priority(1)
public class MyInterceptor {
    @AroundInvoke
    Object intercept(InvocationContext ctx) {
        if (isAsync(ctx.getMethod())) {
            // do something before proceeding
            CompletableFuture result = new CompletableFuture();
            CompletionStage future = (CompletionStage) ctx.proceed(); // may need `try` / `catch` to handle exceptions
            future.whenComplete((value, error) -> {
                // do something after proceeding
                result.complete(...); // or `completeExceptionally(...)`
            });
            return result;
        } else {
            // do something before proceeding
            Object result = ctx.proceed(); // may need `try` / `catch` to handle exceptions
            // do something after proceeding
            return result;
        }
    }

    boolean isAsync(Method method) {
        return CompletionStage.class.isAssignableFrom(method.getReturnType());
    }
}
akHelix4444 commented 1 year ago

@jmartisk @Ladicek thank you a lot for the answers and the code sample, i understood logic and what i need to be careful about. Will try it in some non-critical places.

jmartisk commented 1 year ago

FYI this will (hopefully) be resolved via https://github.com/smallrye/smallrye-graphql/pull/1875 in Quarkus 3.3

akHelix4444 commented 1 year ago

Thank you a lot @mskacelik and @jmartisk , i will try to check as soon as possible.

We haven't updated to version 3 yet. Tell me, please, this update cannot be taken into account in 2.16.*?

jmartisk commented 1 year ago

@akHelix4444 I'm afraid the internal changes are quite substantial so I'm hesitant to backport them into a stream where we should be really conservative :(

gsmet commented 1 year ago

I confirm it's not something I want to backport.

akHelix4444 commented 1 year ago

Oh, thank you @jmartisk , @gsmet , i understand, it's okay. I just wanted to know in order to understand which version to focus on.