performance issue using nested elements in query

janknobloch commented 3 years ago

Hi all,

im facing serious performance drops using a nested query for a tree like structure using quarkus 1.13.2 and its graph-ql extension:

when using:

navigation {
      name
      children {
        name
        children {
          name
          children {
            name
          }
        }
      }
    }

im able to reach about 200 req/s via k6s load testing, whereas:

using the exact same query but removing the nested query part and replacing it with a single key:value pair as follows:

navigation : objectMapperOfyourChoice.writeValueAsString(navigation);

im able to increase throughput from about roughly 200 req/s to 800 req/s using k6s running 1000 VUs and 1000 Req/s aimed for.

To clarify I was able to pinpoint this issue down to the nested vs. non-nested query part for my use-case. My navigation can be rather large with up to depth = 8, and about 5-10 properties inside each navigation element, however its always computed in the same way no matter how I return it via graphql.

Any clues on this behaviour? As a workaround I'm using the json string response type without nested query for now, but I would love to eliminate this issue and can try to run some more load testing potential fixes in the upcoming days, if that helps.

I assume the nested query element should not have such a big impact on performance if im not mistaken?

Best Jan

phillip-kruger commented 3 years ago

Hi @janknobloch . Can you share your code ? Or a reproducer ? Are you using @Source ? And if so, are you using batch ?

janknobloch commented 3 years ago

hey @phillip-kruger, thanks for your fast reply.

hope to share a reproducer for the issue by the weekend at the latest.

in the meantime, a little more information.

Im not using @Source, at least not that im aware of. Im using @Context to remove some elements not existent in a concrete query (but this should be independent from the issue as the speed drop only occurs when using the query stated).

My Navigation POJO "TreeWrapper" looks like this in the raw form:

CoreNavigationElement {
    String val;
    String val2;
    @Ignore
    String val3;
    List<CoreNavigationElement> children;
}

Basically when I remove the navigation part of the query the speed drop is gone, but the speed drop is also gone if I forcefully set my navigation object to null and keep the query in place.

It looks like that the model transformation done inside graphql-java seems to be very slow when it comes to recursive lookups or tree like structures? / and or maybe my @ignore field has something to do with it.

I'll try to setup a plain project as a reproducer on the weekend but maybe this issue is actually somehow related to :

https://github.com/graphql-java/graphql-java/issues/2185

phillip-kruger commented 3 years ago

Hi @janknobloch . Thanks a reproducer would help. Are you getting data from a database ?

janknobloch commented 3 years ago

Yep, but not directly. Im holding all data needed for the queries inside an in-memory cache using caffeine, do some minimal transformations on them and return them via graphql.

What bugs me is that a JSON string response even for the full navigation object while using an object mapper on the way, seems to be so much faster than than letting graphql "reduce the depth / transform" of the object and returning it.

I try my best to present a minimal reproducer ASAP.

phillip-kruger commented 3 years ago

Thanks. I see there is another fix done in graphql-java, that should be part of 17.0, maybe that fixes it. We will update to 17.0 as soon as it release.

phillip-kruger commented 3 years ago

We can actually confirm above once you have a reproducer. We can do a build of graphql-java local and include that to see if 17.0 will fix that.

janknobloch commented 3 years ago

@phillip-kruger faster than expected:

https://github.com/janknobloch/graphql-performance-reproducer

wrote a small readme.md - the only dependency you should need despite of quarkus to reproduce this is the k6 lib:

https://k6.io/docs/getting-started/installation/

My results are attached:

Navigation via graphql query:

Navigation via Json Wrapper:

I limited the results to 500 req/s as you can see via graphql (using nested navigation) the results time out at about 280 req/s while the other graphql query (using a json wrapper for this navigation) is actually capped at 500.

Best Jan

phillip-kruger commented 3 years ago

So, looking at the code, the json will always be quicker than the navigation. When returning a Object from a graphql query, every field is querable, and thus being called per field when we build the response.

You can test against the latest graphql-java by adding this to your pom (overriding the one from smallrye)

<dependency>
        <groupId>com.graphql-java</groupId>
        <artifactId>graphql-java</artifactId>
        <version>2021-05-03T08-53-12-8f1ec444</version>
    </dependency>

However I am not sure this will solve your issue. Let me know.

janknobloch commented 3 years ago

Just tested, sadly the same behaviour.

While it makes sense of course that the querying of fields takes longer to process, 280 req/s vs 980 req/s Is basically a 70% performance drop using nested objects. Considering that the graphql spec was specifically designed to reduce over-fetching and reducing network traffic I somehow feel this totally contradicts its purpose.?

I haven't looked in the details on how nested objects are created yet (removing non queried fields), but hence the objects on a certain depth level are per-se equal, in my case there should be a maximum number of different objects of size depth to be created (in my case 5). e.g.

child{
attr1
  child{
  attr2
  }
  child{
  att3
  }
}

for the above object there is a maximum of 3 object types to create. (attr1, attr2, and attr3 - respectively) I still ask myself where graphql looses up to 70% of throughput by creating 3 objects types and cloning/copying values into them?

Maybe I see this in a wrong perspective because I don't know the internals yet, maybe you can enlighten me a bit? :-)

phillip-kruger commented 3 years ago

Every object are create by calling DataFetchers that fetch the data. In the case of a pojo, every field is called and added to the response, so in this case using Source and batch might help. But it will always be slower that just returning the json. That is the tradeoff, query-able and slower, or not query-able and faster.

janknobloch commented 3 years ago

Actually a bit more testing shows that is linked to the depth - for whats it worth.

querying only the first child elements keeps performance high (1500 req/s), already adding the second child reduces performance by 50 %. ( 700 req/s) -> so it looks its linear dropping down performance by depth.

I'll read into Source and Batch (whereas Batch seems to be deprecated).

Thanks for your feedback.

phillip-kruger commented 3 years ago

I am talking about this batch : https://quarkus.io/blog/experimental_graphql/#batch

janknobloch commented 3 years ago

So I played around a bit according to your recommendations, using @Source in conjunction with async requests for all child elements I could double the performance up to 620 req/s.

I couldn't see how batches could help me out as my object is already in-memory stored as a tree object.

Still far of from the 1500 req/s but its something.. ;-)

I could not quite find the right entry point in trying to maybe register my own Datafetcher for my given type to play around with performance testing, any feedback would be appreciated.

phillip-kruger commented 3 years ago

Batch might work, as there will be less method calls. Depending on the size of children, this can make a big difference.

janknobloch commented 3 years ago

Hey @phillip-kruger thanks for sticking with me.. ;-)

If you referring to use Source / Batch and returning a list of Childs as a batch like so:

 @Query("fastChild")
    public CompletableFuture<List<Child>> getChilds(@Source Navigation navigation) {
        return CompletableFuture.supplyAsync(() -> navService.getNavigation().getChildren());

    }

    @Query("fastChild")
    public CompletableFuture<List<Child>> getChilds(@Source Child child) {
        return CompletableFuture.supplyAsync(() -> child.getChildren());
    }

thats exactly what I did to increase up to 620 req/s. Is this what you envisioned?

Interesting fact is also that quarkus-graphql 1.5.0 had double the performance than 1.13.2.

Also I still think about if it wouldn't be possible to increase performance dramatically on nested Lists, as we know from the query that on a certain hierarchy level, all children have to have the same data structure as the query determines how the retrieved objects need to be build on a per depth level.

janknobloch commented 3 years ago

So found some time to dig in the codebase and build a local snapshot of 1.0.x and it seems that the FieldDataFetcher -> fieldHelper.transformResponse(); is degrading performance massively. (280 req/s -> 900 req/s).

As I have interpreted the class comments correctly this transformation is only needed on non default getter/setter methods e.g. Source enrichments?

Removing the transformation step and returning: return (T) resultFromMethodCall; directly increases the request rate by more than 3 times.

Looking at java-graphql for the default PropertyDataFechter I couldn't find any of those recursive lookups? https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/schema/PropertyDataFetcher.java

Could you elaborate when those are needed?

Would it be possible to add like an API Annotation / config flag to disable those lookups if you do not want to work with enrichments or model transformations?

Another 50 req/s could be added by fixing duplicate fieldNames due to (java-graphql 16.x issue already posted here) inside the SmallryeContext.javaby filtering the getSelectedFields()using a Set.

Set<SelectedField> fields = new LinkedHashSet(selectionSet.getFields());

to remove duplicates at least until this gets fixed? (Its a workaround anyhow as the DataFetchingEnvironment still contains the duplicates, however not exposed from the SmallryeContext).

Would be nice to see some performance boost aside the async / batch calls (which i hopefully tested properly) as they basically just hiding the underlying issue with the FieldHelper.

Best Jan

phillip-kruger commented 3 years ago

Hi Jan, above is very good information, we can skip the transformation totally if there is nothing to transform, that should be an easy change. Another change that might speed this up is disabling the context if there is no context. I will make changes for both of these then we check again if that is ok with you ?

Also, wrt batch, please try this : (note the List on the input)

@Query("fastChild")
public CompletableFuture<List<List<Child>>> getChilds(@Source List<Child> children) {
// Here loop and gather the results in a list of the same order as the input
}

This should also speed up the response.

Thanks for you help on this, we need to tune this.

janknobloch commented 3 years ago

Hey Phillip, thanks coming back to this.

wrt batching I tried your suggestion and it indeed improved performance up to 800 req/s - which is quite something.

Looking forward to have your suggestions transparent for users without the need of using the aforementioned Source "trick".

Feel free to drop me a message any time if there is something new to explore for me. :-)

Also, sadly I wasn't able to setup this project completely inside eclipse env (maven builds work fine - but force local building on any change - embedding the snapshot in my test project and running again), but it seems eclipse did not want to detect the implementation "src" folders properly. But thats something for another day.

Also I forked the project and added a hotfix for https://github.com/smallrye/smallrye-graphql/issues/713 - if thats of your concern feel free to use it - otherwise just ignore it. :-)

Thanks for your support - have a great evening.

phillip-kruger commented 3 years ago

@janknobloch - I added the first of some PR to try and optimize this. Once all is in and it's in Quarkus, would be great if you can test again. W.r.t you question on PropertyDataFetcher and why we need our own one, we allow transformation on fields, that needs to be done before we return the output or process the input. (See the spec) The PR will cache the transformer so it should help a bit

phillip-kruger commented 3 years ago

As I have interpreted the class comments correctly this transformation is only needed on non default getter/setter methods e.g. Source enrichments?

No it's needed on all fields (properties and methods, recursively when in a List)

janknobloch commented 3 years ago

Sure let me know @phillip-kruger !

stopping recursion whenever possible already looks good - hence all objects in a list - are per query/depth equal there may be even more potential to optimize this. I hope I also find some time to play around with it.

Definitely drop another message here when it’s in 1.20 / or Quarkus. Looking forward to test and present my results.

Have a great weekend !

ybroeker commented 3 years ago

Just tried a few things:

1.0.26 is about 300req/s and 1.0.x with @phillip-kruger's changes integrated speeds it up to 850req/s.

A big bottleneck is currently the conversion of the result to JSON. Currently it is Object -> String -> JsonValue. The step from Object to String with Jsonb and then the parsing afterwards is relatively time-consuming and actually superfluous. Doing the conversion "by hand" speeds it up to about 1100req/s.

A rough prototype is here: https://github.com/ybroeker/smallrye-graphql/tree/perf/toJson

janknobloch commented 3 years ago

Very nice indeed @ybroeker - cool that those minor optimizations lead to such a big impact!

Thanks for also looking into this!

phillip-kruger commented 3 years ago

Very cool @ybroeker !! I have not looked at your prototype yet, but how about a PR?

janknobloch commented 3 years ago

looks like this is fixed since 1.2.0 - however 1.2.x has not been pushed to the quarkus extension yet, if i'm not mistaken? Any plan to push this fix (1.2.x++) through to benchmark within quarkus? @phillip-kruger ? ;-)

t1 commented 3 years ago

@janknobloch Have you tried the 2.0.0.RC? IIRC that one is updated.

phillip-kruger commented 3 years ago

You can actually try 2.0.0.Final (just remove the universe as the platform is not released yet)

janknobloch commented 3 years ago

@t1 thanks for the hint! :-) - I tested with 2.0.0.CR3

Happy to report my reproducer jumped from aprox. 600 req/s to 1200 req/s ;-)

Thanks @phillip-kruger and @ybroeker for pushing this in, looking forward running on Quarkus 2.0.0 final soon.

smallrye / smallrye-graphql

performance issue using nested elements in query #770