graphql-java / graphql-java

GraphQL Java implementation
https://graphql-java.com
MIT License

DataFetchingFieldSelectionSetImpl.computeValuesLazily() is slow #3725

Open danbress opened 1 month ago

danbress commented 1 month ago

I recently captured a few flame graphs of my application and noticed that DataFetchingFieldSelectionSetImpl.computeValuesLazily() shows up in 25% of the flame graphs.

Details: my app is a DGS / Spring GraphQL / Spring Boot app.

In my application code, in the DataFetcher for a top-level Query field, I am calling DataFetchingEnvironment.getSelectionSet().getFields("subField") so that I can read an argument on subField and use it to make a call to the backend to get the right data.

Also, my schema is fairly large, which may be part of the reason.

Libraries:

Java 21.0.3.0.101
Spring Boot 3.3.4
Spring GraphQL 1.3.2
graphql-java 22.1

danbress commented 1 month ago

I captured a heap allocation profile of my app under load, and it seems like most of my application's heap allocations are coming from this stack:

[screenshot: heap allocation flame graph]

So it seems like we are spending a lot of time resizing (growing) these two ImmutableMap.Builders.

Could we provide expected size hints to these two immutable map builders?


        private final ImmutableMap.Builder<ExecutableNormalizedField, MergedField> normalizedFieldToMergedField = ImmutableMap.builder();
        private final ImmutableMap.Builder<ExecutableNormalizedField, QueryDirectives> normalizedFieldToQueryDirectives = ImmutableMap.builder();
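
To make the suggestion concrete, here is a minimal sketch of what a size hint could look like, assuming the expected field count could be passed in (the class name and the expectedFieldCount constructor argument are hypothetical); builderWithExpectedSize is Guava's API for pre-sizing a builder:

import com.google.common.collect.ImmutableMap;
import graphql.execution.MergedField;
import graphql.execution.directives.QueryDirectives;
import graphql.normalized.ExecutableNormalizedField;

class PreSizedBuilders {
    private final ImmutableMap.Builder<ExecutableNormalizedField, MergedField> normalizedFieldToMergedField;
    private final ImmutableMap.Builder<ExecutableNormalizedField, QueryDirectives> normalizedFieldToQueryDirectives;

    // "expectedFieldCount" is a hypothetical hint, e.g. the number of normalized fields
    // in the operation; pre-sizing lets the builders allocate their backing storage once
    // instead of growing repeatedly as entries are added.
    PreSizedBuilders(int expectedFieldCount) {
        this.normalizedFieldToMergedField = ImmutableMap.builderWithExpectedSize(expectedFieldCount);
        this.normalizedFieldToQueryDirectives = ImmutableMap.builderWithExpectedSize(expectedFieldCount);
    }
}
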
danbress commented 1 month ago

To add a little more color about what I mean by "slow":

If I run ENFBenchmark1 on my laptop, I get these results:

Benchmark                           Mode  Cnt      Score      Error  Units
ENFBenchmark1.benchMarkThroughput  thrpt    9  90939.384 ± 1947.089  ops/s
ENFBenchmark1.benchMarkAvgTime      avgt    9      0.011 ±    0.001  ms/op

If I run it with my schema and my query, I get these results:

Benchmark                                          Mode  Cnt   Score   Error  Units
ENFBenchmarkPinotNoVariables.benchMarkThroughput  thrpt    9  51.829 ± 0.463  ops/s
ENFBenchmarkPinotNoVariables.benchMarkAvgTime      avgt    9  19.833 ± 0.664  ms/op

Also if I do this on my schema/query:

System.out.println("fields size:" + executableNormalizedOperation.getFieldToNormalizedField().size());

then I see:

fields size: 21,356

on the schema/query in ENFBenchmark1 I see:

fields size:25
danbress commented 1 month ago

Also, here is a flame graph of running ENFBenchmark with my schema/query on my laptop, captured by IntelliJ's profiler: [screenshot: flame graph]

bbakerman commented 1 month ago

fields size: 21,356

So just to confirm the numbers - you have an operation (aka a query operation say) that has 21,000+ fields in it?

That's a lot if I have that right - it's always going to be slower because that's beyond "sensible".

That said, there might be a factor at play here making it worse than it needs to be. Nominally the DataFetchingFieldSelectionSet is the possible fields under the currently executing field.

So if fieldX had 100 sub fields under it, then logically it's 100 bits of data.

However, a while back we moved towards using ExecutableNormalizedOperation, which is a representation of the whole query, and then we take a subset of that for the current field being executed.

The rationale is that IF someone asked for the subselection in a future field, the ExecutableNormalizedOperation has been computed once and therefore this would be faster in aggregate.

BUT if you have 21,000 fields in an operation, it's going to be a lot of work.

bbakerman commented 1 month ago

ps... thank you for the detailed report

danbress commented 1 month ago

So just to confirm the numbers - you have an operation (aka a query operation say) that has 21,000+ fields in it?

This is a GraphQL query operation in the benchmark. And yes, it has 21,000+ fields in it. That does seem really high to me; I'm trying to reason about why it's that high. There are not 21,000 distinct fields in our schema, so I need to figure out how this number is blowing up to 21k.

That said, there might be a factor at play here making it worse than it needs to be. Nominally the DataFetchingFieldSelectionSet is the possible fields under the currently executing field. So if fieldX had 100 sub fields under it, then logically it's 100 bits of data.

We are asking for the selection set from the "root" query field. So based on what you are saying it sounds like it will contain everything in the query.

How many sub selection fields are inside the field that you want a DataFetchingFieldSelectionSet from?

I'm not totally sure how to answer this. Does "sub selection" include the nested child fields below the field that I want? If so, then the answer is "a lot".

Our schema basically looks like this: Page is our root. A Page has rows. A row has cells.

A cell is an interface with about 30 subclasses. Each subclass has about 5-10 fields.

So there is a lot... but I am also shocked it gets up to 21K.

Let me print out and organize the data in executableNormalizedOperation.getFieldToNormalizedField() so that I can try and make sense of how my query explodes into 21k fields.

Thanks for your help here!

danbress commented 1 month ago

I guess what I'm still confused about and maybe you can help me with:

How come this query/schema parsing is so expensive, when the framework (DGS/Spring GraphQL) is able to parse it and associate the DataFetchers with all the elements of the schema without this expense?

In my eyes they are both doing the same thing (parsing the schema).

I also will say we implement our own graphql.execution.preparsed.PreparsedDocumentProvider that caches PreparsedDocumentEntry by the query string. That saves us the cost of parsing the query, but it seems like we then have to parse it again for the query introspection/lookahead DataFetchingFieldSelectionSetImpl stuff. Is there any way for me to cache the query/schema that gets parsed in DataFetchingFieldSelectionSetImpl?
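
For context, our caching is roughly this sketch (assuming the getDocumentAsync variant of the interface; the class name and the unbounded map are just for illustration - a real cache should be bounded):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

import graphql.ExecutionInput;
import graphql.execution.preparsed.PreparsedDocumentEntry;
import graphql.execution.preparsed.PreparsedDocumentProvider;

// Caches the parsed and validated document, keyed by the raw query string, so
// repeated executions of the same query skip the ANTLR parse and validation.
class QueryStringCachingDocumentProvider implements PreparsedDocumentProvider {
    private final ConcurrentHashMap<String, PreparsedDocumentEntry> cache = new ConcurrentHashMap<>();

    @Override
    public CompletableFuture<PreparsedDocumentEntry> getDocumentAsync(
            ExecutionInput executionInput,
            Function<ExecutionInput, PreparsedDocumentEntry> parseAndValidate) {
        PreparsedDocumentEntry entry =
                cache.computeIfAbsent(executionInput.getQuery(), key -> parseAndValidate.apply(executionInput));
        return CompletableFuture.completedFuture(entry);
    }
}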

bbakerman commented 1 month ago

when the framework (DGS/Spring GraphQL) is able to parse it and associate the DataFetchers with all the elements of the schema without this expense?

Parsing the document is a one-off activity (albeit it does ALL the document at once via ANTLR). This builds out an AST tree which is then re-used over and over.

The normal graphql execution is lazy and breadth first. It works on, say, only the root level fields, invokes just enough data fetchers to get a value per field, and then, if there is a field subselection present, it starts this process again. A DataFetcher never has to ask about field sub-selection, or what possible types those sub-selected fields are, or what arguments they might have, and so on.

So in fact during simple execution, the "processing" of fields and their sub fields slowly unfolds, and if a value is null, for example, then no sub-selection is performed and hence no processing is done.

But when you use DataFetchingFieldSelectionSet to look ahead, it builds up an ExecutableNormalizedOperation (ENO) and that has an ExecutableNormalizedField (ENF) for every possible field. Think of it like a query plan - maybe this field could be a Dog or a Cat or a Bird (it can't know, because it has not executed any values yet) and hence it builds a series of ENFs for each possible value.

The ENO processing is not lazy - it's eager, and hence if the field list is huge, it will build up a very large "possible field model" and this takes time. I suspect if you have a lot of "non concrete" types (interfaces and unions) then this will be even more expensive.

Here is an article on why it is not a simple matter to "model" possible sub fields:

https://www.graphql-java.com/blog/deep-dive-merged-fields

Nominally a DataFetchingFieldSelectionSet is for one field and its sub selection, but because it uses the ENO it does all fields in the operation, with the idea that if you ever asked again, it's already done. One could imagine that this could be changed to only do one "field slice" of sub selection, say.

But if you are asking for this in a root field with 21K possible subfields then really that's the same as doing ALL fields in a document anyway. So I don't think this would save you time.

bbakerman commented 1 month ago

Is there any way for me to cache the query/schema that gets parsed in DataFetchingFieldSelectionSetImpl?

No, because DataFetchingFieldSelectionSet contains live per-query information - e.g. argument values:

    private final ImmutableMap<String, NormalizedInputValue> normalizedArguments;
    private final LinkedHashMap<String, Object> resolvedArguments;

It can't be cached across requests.

bbakerman commented 1 month ago

Let's go back to basics - what is it you want to achieve by having the full field sub selection information during a root field fetch? What problem are you trying to solve here? Maybe there is another way.

danbress commented 1 month ago

This is the problem I'm trying to solve:

My schema largely looks like this: a page is my root, a page has rows, a row has cells.

our clients issue a query like this:

query {
  page {
    sections(first: 10) {
      cells(first: 10) {
        field1
        field2
      }
    }
  }
}

Semantically this means "give me a page with 10 rows, each with 10 cells". For the top level page field resolver, I make a request to a backend service ("the page service") like this: pageService.getPage(numRows = X), where I tell the page service to give me a page with X rows. I want X here to be the value of the first: Int argument on the sections field. In the example I've provided, this would be the number 10.

In order to do that I do the following: env.getSelectionSet().getFields("sections"); then on the field(s) I get back I do int count = (Integer) field.getArguments().getOrDefault("first", defaultCount);
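
In code it is roughly this sketch (the helper class name and defaultCount are made up for the example; the real code lives in our page DataFetcher):

import java.util.List;

import graphql.schema.DataFetchingEnvironment;
import graphql.schema.SelectedField;

class SectionsFirstArg {
    // Pull the "first" argument off the sections field(s) in the selection set;
    // if a client aliases sections more than once, take the max value.
    static int resolve(DataFetchingEnvironment env, int defaultCount) {
        List<SelectedField> sections = env.getSelectionSet().getFields("sections");
        return sections.stream()
                .map(field -> (Integer) field.getArguments().getOrDefault("first", defaultCount))
                .max(Integer::compareTo)
                .orElse(defaultCount);
    }
}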

If there is a more efficient way to achieve this, I'm 100% open to that. Thank you again for helping me work through this

bbakerman commented 1 month ago
cells(first: 10) {
       field1
       field2
    }

but inside your cells you have a massive field explosion - is that right?

I think the problem is that you want to go 1 or 2 fields deep, BUT because DataFetchingFieldSelectionSet can in theory go N levels deep (e.g. env.getSelectionSet().getFields("sections/cells/foo/bars");) it needs to work out all the data.

You could hand write your own "manual" subselection code and avoid all this work. But to do this correctly for ALL clients is tricky.

For example, this is legal:

query {
  page {
    a: sections(first: 10) {
      cells(first: 10) {
        field1
        field2
      }
    }
    b: sections(first: 10) {
      cells(first: 10) {
        field1
        field2
      }
    }
  }
}

and hence to say you can handle ANY client in a legal sense makes "manual" handling tricky.

One can imagine something like this as a future improvement

env.getSelectionSetMaxLevels(2).getFields("sections");

Where you could ask for only a small subset of selections to be traversed, say, for efficiency, in the case where your schema might have a field explosion.

cc: @andimarek

danbress commented 1 month ago

but inside your cells you have a massive field explosion - is that right?

Yes and no. I think where the explosion is coming from is that there are multiple types of rows, and we are calling "cells" on each of these types

e.g.

... on RowTypeA {
  cells {
    ...CellFragment
  }
}
... on RowTypeB {
  cells {
    ...CellFragment
  }
}
... on RowTypeC {
  cells {
    ...CellFragment
  }
}

Underneath the cells there are a few "deep types" that have a fair amount of sub fields, and again more interfaces/inheritance.

So I think the fact that we aren't always querying for the "cells" on the base row type is contributing to the explosion at the higher level. And at the cell level we are doing something similar, where we are accessing fields on each implementing type rather than on the base types.

danbress commented 1 month ago

One can imagine something like this as a future improvement

env.getSelectionSetMaxLevels(2).getFields("sections");

Where you could ask for only a small subset of selections to be traversed, say, for efficiency, in the case where your schema might have a field explosion.

Yeah, I think that this is what I want. I pass a path - in my case it would be "/page/sections" - and ideally it would only have to parse the query from the root page level down to the immediate children and no further. It could know this because the path is only two layers deep starting from the root.

If the client asked for multiple sections via aliasing like you described, we handle that by getting all the section fields and taking the max value of first

danbress commented 1 month ago

OK, now I have something I think works

I see I have access to this: DataFetchingEnvironment.getOperationDefinition().getSelectionSet().getSelections();

I have to do a little bit of traversing to get down to the "sections" field, and then we pass the argument value as a variable reference, so I have to look that up indirectly... but this seems to work and is fast!
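
For anyone following along, the approach is roughly this sketch (the class and method names are made up; it assumes the query spells the page/sections path directly, without fragments in between, and that "first" is either an Int literal or a variable reference):

import graphql.language.Argument;
import graphql.language.Field;
import graphql.language.IntValue;
import graphql.language.VariableReference;
import graphql.schema.DataFetchingEnvironment;

class ManualSectionsLookup {
    static Integer firstArgument(DataFetchingEnvironment env, int defaultCount) {
        // Walk the raw AST: operation -> page -> sections, then read the "first" argument.
        for (var selection : env.getOperationDefinition().getSelectionSet().getSelections()) {
            if (!(selection instanceof Field page) || !"page".equals(page.getName()) || page.getSelectionSet() == null) {
                continue;
            }
            for (var sub : page.getSelectionSet().getSelections()) {
                if (!(sub instanceof Field sections) || !"sections".equals(sections.getName())) {
                    continue;
                }
                for (Argument arg : sections.getArguments()) {
                    if (!"first".equals(arg.getName())) {
                        continue;
                    }
                    var value = arg.getValue();
                    if (value instanceof IntValue intValue) {
                        return intValue.getValue().intValue();
                    }
                    if (value instanceof VariableReference ref) {
                        // The argument was passed as a variable, so resolve it via the variables map.
                        Object v = env.getVariables().get(ref.getName());
                        return v instanceof Number n ? n.intValue() : defaultCount;
                    }
                }
            }
        }
        return defaultCount;
    }
}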

bbakerman commented 1 month ago

I am glad you have something working. GraphQL is tricky because nominally it feels like you just take the AST SelectionSet and you have sub fields, BUT this is not true in the general case.

A simple AST traversal of this is tricky:

{
  foo(id: "123") {
    id
  }
  foo(id: "123") {
    name
  }
  foo(id: "123") {
    id
    name
  }
}

The class graphql.analysis.QueryTraverser might help you since it can traverse AND give you field types as it goes and be told to abort
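
A rough sketch of what that could look like (not from this thread; it assumes the current DataFetchingEnvironment is in scope and just prints each visited field with its schema type):

import graphql.analysis.QueryTraverser;
import graphql.analysis.QueryVisitorFieldEnvironment;
import graphql.analysis.QueryVisitorStub;
import graphql.schema.DataFetchingEnvironment;

class TraversalSketch {
    static void printFields(DataFetchingEnvironment env) {
        QueryTraverser traverser = QueryTraverser.newQueryTraverser()
                .schema(env.getGraphQLSchema())
                .document(env.getDocument())
                .operationName(env.getOperationDefinition().getName())
                .variables(env.getVariables())
                .build();

        // The visitor sees each field together with its schema definition, so it can
        // stop collecting as soon as it has the fields it cares about.
        traverser.visitPreOrder(new QueryVisitorStub() {
            @Override
            public void visitField(QueryVisitorFieldEnvironment fieldEnv) {
                System.out.println(fieldEnv.getField().getName() + " : " + fieldEnv.getFieldDefinition().getType());
            }
        });
    }
}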

As you have seen, you can build a "good enough" solution, but the graphql-java library has to build a "handle all cases" mechanism. The ENO / ENF solution works in a general sense, but when there is a combinatorial explosion of fields because of interfaces it gets expensive. We can look to make it less expensive, say.

Thanks for the detailed explanation and your findings. This has been an excellent issue report.