danbress opened this issue 1 month ago
I captured a heap allocation profile of my app under load, and it seems like most of my application's heap allocations are coming from this stack:
So it seems like we are spending a lot of time resizing (growing) these two ImmutableMap.Builders.
Could we provide expected size hints to these two immutable map builders?
private final ImmutableMap.Builder<ExecutableNormalizedField, MergedField> normalizedFieldToMergedField = ImmutableMap.builder();
private final ImmutableMap.Builder<ExecutableNormalizedField, QueryDirectives> normalizedFieldToQueryDirectives = ImmutableMap.builder();
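For illustration, a size hint could look something like the sketch below. This is not graphql-java's actual code: Guava's `ImmutableMap.builderWithExpectedSize` is the real API for this, but `expectedFieldCount` is a hypothetical value the owner of these builders would have to know (or estimate) up front.

```java
// Sketch only: pre-sizing the builders avoids repeated re-hashing as they grow.
// expectedFieldCount is a hypothetical, caller-supplied estimate.
private final ImmutableMap.Builder<ExecutableNormalizedField, MergedField> normalizedFieldToMergedField =
        ImmutableMap.builderWithExpectedSize(expectedFieldCount);
private final ImmutableMap.Builder<ExecutableNormalizedField, QueryDirectives> normalizedFieldToQueryDirectives =
        ImmutableMap.builderWithExpectedSize(expectedFieldCount);
```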
To add a little more color to what I mean by "slow":
if I run ENFBenchmark1 on my laptop, I get these results:
```
Benchmark                          Mode  Cnt      Score      Error  Units
ENFBenchmark1.benchMarkThroughput  thrpt    9  90939.384 ± 1947.089  ops/s
ENFBenchmark1.benchMarkAvgTime     avgt     9      0.011 ±    0.001  ms/op
```
If I run it with my schema and my query, I get these results:
```
Benchmark                                        Mode  Cnt   Score  Error  Units
ENFBenchmarkPinotNoVariables.benchMarkThroughput thrpt    9  51.829 ± 0.463  ops/s
ENFBenchmarkPinotNoVariables.benchMarkAvgTime    avgt     9  19.833 ± 0.664  ms/op
```
Also if I do this on my schema/query:
System.out.println("fields size:" + executableNormalizedOperation.getFieldToNormalizedField().size());
then I see:
fields size: 21,356
on the schema/query in ENFBenchmark1 I see:
fields size: 25
Also here is a flame graph of running ENFBenchmark with my schema/query on my laptop, captured by IntelliJ's profiler
So just to confirm the numbers - you have an operation (aka a query operation say) that has 21,000+ fields in it?
That's a lot, if I have that right - it's always going to be slower because that's beyond "sensible".
That said, there might be a factor happening here making it worse than it might be. Nominally the DataFetchingFieldSelectionSet
is the possible fields under the currently executing field.
So if fieldX had 100 sub fields under it, then logically it's 100 bits of data.
However, a while back we moved towards using ExecutableNormalizedOperation,
which is a representation of the whole query, and then we take a subset of that for the current field being executed.
The rationale is that IF someone asked for the subselection in a future field, the ExecutableNormalizedOperation
has been computed once and therefore this would be faster in aggregate.
BUT if you have 21,000 fields in an operation, it's going to be a lot of work.
How many sub selection fields are inside the field that you want a DataFetchingFieldSelectionSet from?

ps... thank you for the detailed report
So just to confirm the numbers - you have an operation (aka a query operation say) that has 21,000+ fields in it?
This is a graphql query operation in the benchmark. And yes, it has 21,000+ fields in it. That does seem really high to me; I'm trying to reason about why it's that high. There are not 21,000 distinct fields in our schema, so I need to figure out how this number is blowing up to 21k.
That said, there might be a factor happening here making it worse than it might be. Nominally the DataFetchingFieldSelectionSet is the possible fields under the currently executing field. So if fieldX had 100 sub fields under it, then logically it's 100 bits of data.
We are asking for the selection set from the "root" query field. So based on what you are saying it sounds like it will contain everything in the query.
How many sub selection fields are inside the field that you want a DataFetchingFieldSelectionSet from? I'm not totally sure how to answer this. Does "sub selection" include the nested child fields below the field that I want? If so, then the answer is "a lot".
Our schema basically looks like this: Page is our root. A Page has rows; a row has cells.
A cell is an interface with about 30 subclasses. Each subclass has about 5-10 fields.
So there is a lot... but I am also shocked it gets up to 21K.
Let me print out and organize the data in executableNormalizedOperation.getFieldToNormalizedField()
so that I can try and make sense of how my query explodes into 21k fields.
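Something along these lines might work for that (a sketch, not code from the issue; it assumes the values of getFieldToNormalizedField() are ExecutableNormalizedField instances, and groups them by field name using java.util.stream.Collectors):

```java
// Sketch: bucket the ~21k entries by field name to see which names dominate.
Map<String, Long> countsByFieldName = executableNormalizedOperation
        .getFieldToNormalizedField()
        .values()
        .stream()
        .collect(Collectors.groupingBy(ExecutableNormalizedField::getName, Collectors.counting()));

countsByFieldName.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .limit(20)
        .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
```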
Thanks for your help here!
I guess what I'm still confused about, and maybe you can help me with:
How come this query/schema parsing is so expensive, when the framework (DGS/Spring GraphQL) is able to parse it and associate the DataFetchers to all the elements of the schema without this expense?
In my eyes they are both doing the same thing (parsing the schema).
I will also say we implement our own graphql.execution.preparsed.PreparsedDocumentProvider that caches PreparsedDocumentEntry by the query string. That saves us the cost of parsing the query, but it seems like we then have to parse it again for the query introspection/lookahead DataFetchingFieldSelectionSetImpl stuff. Is there any way for me to cache the query/schema that gets parsed in DataFetchingFieldSelectionSetImpl?
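For reference, that kind of provider is roughly shaped like the sketch below (assumptions: a simple unbounded ConcurrentHashMap keyed by the raw query string, and the synchronous getDocument variant of PreparsedDocumentProvider; newer graphql-java versions may require the getDocumentAsync variant instead):

```java
import graphql.ExecutionInput;
import graphql.execution.preparsed.PreparsedDocumentEntry;
import graphql.execution.preparsed.PreparsedDocumentProvider;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch of a query-string-keyed document cache.
public class CachingDocumentProvider implements PreparsedDocumentProvider {

    private final Map<String, PreparsedDocumentEntry> cache = new ConcurrentHashMap<>();

    @Override
    public PreparsedDocumentEntry getDocument(ExecutionInput executionInput,
                                              Function<ExecutionInput, PreparsedDocumentEntry> parseAndValidate) {
        // An unbounded map is used only to keep the sketch short;
        // a real implementation would use a bounded cache.
        return cache.computeIfAbsent(executionInput.getQuery(),
                key -> parseAndValidate.apply(executionInput));
    }
}
```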
when the framework (DGS/Spring GraphQL) is able to parse it and associate the DataFetchers to all the elements of the schema without this expense?
Parsing the document is a once off activity (albeit it does ALL the document at once via ANTLR). This builds out an AST tree which is then re-used over and over.
The normal graphql execution is lazy and breadth first. It works only on, say, the root-level fields, invokes just enough data fetchers to get a value per field, and then, if there is a field subselection present, it starts this process again. A DataFetcher never has to ask about field sub-selection, or what possible types those sub-selected fields are, or what arguments they might have, and so on.
So in fact during simple execution, the "processing" of fields and their sub fields slowly unfolds, and if a value is null, for example, then no sub-selection is performed and hence no processing is done.
But when you use DataFetchingFieldSelectionSet to look ahead, it builds up an ExecutableNormalizedOperation (ENO) and that has an ExecutableNormalizedField (ENF) list for every possible field. Think of it like a query plan - maybe this field could be a Dog or a Cat or a Bird (it can't know, because it hasn't executed any values yet) and hence it builds a series of ENFs for each possible value.
The ENO processing is not lazy - it's eager, and hence if the field list is huge, it will build up a very large "possible field model" and this takes time. I suspect that if you have lots of "non concrete" types (interfaces and unions) then this will be even more expensive.
Here is an article on why it is not a simple matter to "model" possible sub fields:
https://www.graphql-java.com/blog/deep-dive-merged-fields
Nominally a DataFetchingFieldSelectionSet is for one field and its sub selection, but because it uses the ENO it does all fields in the operation, with the idea that if you ever asked again, it's already done. One could imagine that this could be changed to only do one "field slice" of sub selection, say.
But if you are asking for this in a root field with 21K possible subfields, then really that's the same as doing ALL fields in a document anyway. So I don't think this would save you time.
Is there any way for me to cache the query/schema that gets parsed in DataFetchingFieldSelectionSetImpl?
No, because DataFetchingFieldSelectionSet contains live per-query information - e.g. argument values:
private final ImmutableMap<String, NormalizedInputValue> normalizedArguments;
private final LinkedHashMap<String, Object> resolvedArguments;
It can't be cached across requests.
Let's go back to basics - what is it you want to achieve by having the full field sub selection information during a root field fetch? What problem are you trying to solve here? Maybe there is another way.
This is the problem I'm trying to solve:
My schema largely looks like this: Page is my root. A page has rows; a row has cells.
Our clients issue a query like this:
```graphql
query {
  page {
    sections(first: 10) {
      cells(first: 10) {
        field1
        field2
      }
    }
  }
}
```
Semantically this means "give me a page with 10 rows, each with 10 cells".
For the top level page field resolver, I make a request to a backend service ("the page service"), like this: pageService.getPage(numRows = X); where I tell the page service to give me a page with X rows.
I want X here to be the value of the first: Int argument to the sections field. In the example I've provided, this would be the number 10.
In order to do that I do the following:
env.getSelectionSet().getFields("sections");
then on the field(s) I get back I do
int count = (Integer) field.getArguments().getOrDefault("first", defaultCount);
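Put together, the data fetcher described above might look roughly like this (a sketch only; PageService, Page and the default of 10 are placeholders, not from the issue):

```java
import graphql.schema.DataFetcher;
import graphql.schema.DataFetchingEnvironment;
import graphql.schema.SelectedField;

import java.util.List;

// Sketch of the two steps: look ahead at the "sections" selection, then read its "first" argument.
public class PageDataFetcher implements DataFetcher<Object> {

    private final PageService pageService; // placeholder backend client

    public PageDataFetcher(PageService pageService) {
        this.pageService = pageService;
    }

    @Override
    public Object get(DataFetchingEnvironment env) {
        int defaultCount = 10;
        // Step 1: find the "sections" fields in the sub selection of the current field
        List<SelectedField> sections = env.getSelectionSet().getFields("sections");
        // Step 2: read the "first" argument off the first matching field
        int count = sections.isEmpty()
                ? defaultCount
                : (Integer) sections.get(0).getArguments().getOrDefault("first", defaultCount);
        return pageService.getPage(count);
    }
}
```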
If there is a more efficient way to achieve this, I'm 100% open to that. Thank you again for helping me work through this
```graphql
cells(first: 10) {
  field1
  field2
}
```
but inside your cells you have a massive field explosion - is that right?
I think the problem is that you want to go 1 or 2 fields deep, BUT because DataFetchingFieldSelectionSet can in theory go N levels deep (e.g. env.getSelectionSet().getFields("sections/cells/foo/bars");) it needs to work out all the data.
You could hand write your own "manual" subselection code and avoid all this work. But to do this correctly for ALL clients is tricky - for example, this is legal:
```graphql
query {
  page {
    a: sections(first: 10) {
      cells(first: 10) {
        field1
        field2
      }
    }
    b: sections(first: 10) {
      cells(first: 10) {
        field1
        field2
      }
    }
  }
}
```
and hence to say you can handle ANY client in a legal sense makes "manual" handling tricky.
One can imagine something like this as a future improvement: env.getSelectionSetMaxLevels(2).getFields("sections"); where you could ask for only a small subset of selections to be traversed, say for efficiency, in the case where your schema might have a field explosion.
cc: @andimarek
but inside your cells you have a massive field explosion - is that right?
Yes and no. I think where the explosions are coming from is: there are multiple types of rows, and we are calling "cells" on each of these types, e.g.
```graphql
... on RowTypeA {
  cells {
    ... on CellFragment
  }
}
... on RowTypeB {
  cells {
    ... on CellFragment
  }
}
... on RowTypeC {
  cells {
    ... on CellFragment
  }
}
```
Underneath the cells there are a few "deep types" that have a fair number of sub fields, and again more interfaces/inheritance.
So I think the fact that we aren't always querying for the "cells" on the base row type is contributing to the explosion at the higher level. And also at the cell level we are doing something similar, where we are accessing fields on each implementing type rather than on the base types.
One can imagine something like this as a future improvement: env.getSelectionSetMaxLevels(2).getFields("sections"); where you could ask for only a small subset of selections to be traversed, say for efficiency, in the case where your schema might have a field explosion.
Yeah, I think that this is what I want. I'd pass a path, in my case "/page/sections", and ideally it would only have to parse the query from the root page level down to the immediate children and no further. It could know this because the path is only two layers deep starting from the root.
If the client asked for multiple sections via aliasing like you described, we handle that by getting all the section fields and taking the max value of first.
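That max-of-first handling could look roughly like this (a sketch, not code from the issue):

```java
import graphql.schema.DataFetchingEnvironment;
import graphql.schema.SelectedField;

// Sketch: handle aliased "sections" fields by taking the max of their "first" arguments.
static int resolveRowCount(DataFetchingEnvironment env, int defaultCount) {
    return env.getSelectionSet().getFields("sections").stream()
            .map(SelectedField::getArguments)
            .map(args -> (Integer) args.getOrDefault("first", defaultCount))
            .max(Integer::compareTo)
            .orElse(defaultCount);
}
```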
OK, now I have something I think works.
I see I have access to this: DataFetchingEnvironment.getOperationDefinition().getSelectionSet().getSelections();
I have to do a little bit of traversing to get down to the "sections" field, and then we pass the argument value as a variable reference, so I have to look that up indirectly... but this seems to work and is fast!
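For reference, that kind of AST traversal might look roughly like the sketch below (assumptions: the operation is shaped like query { page { sections(first: ...) ... } }, this runs in the page DataFetcher, and fragments/aliases are ignored for brevity):

```java
import graphql.language.Argument;
import graphql.language.Field;
import graphql.language.IntValue;
import graphql.language.Selection;
import graphql.language.SelectionSet;
import graphql.language.Value;
import graphql.language.VariableReference;
import graphql.schema.DataFetchingEnvironment;

public final class SectionsFirstArgument {

    // Walk the operation AST down to page -> sections and read its "first" argument.
    static int find(DataFetchingEnvironment env, int defaultCount) {
        SelectionSet operationSelections = env.getOperationDefinition().getSelectionSet();
        Field page = findField(operationSelections, "page");
        Field sections = page == null ? null : findField(page.getSelectionSet(), "sections");
        if (sections == null) {
            return defaultCount;
        }
        for (Argument arg : sections.getArguments()) {
            if (!"first".equals(arg.getName())) {
                continue;
            }
            Value<?> value = arg.getValue();
            if (value instanceof IntValue) {
                return ((IntValue) value).getValue().intValue();
            }
            if (value instanceof VariableReference) {
                // the argument was passed as a variable, so resolve it indirectly
                Object resolved = env.getVariables().get(((VariableReference) value).getName());
                if (resolved instanceof Integer) {
                    return (Integer) resolved;
                }
            }
        }
        return defaultCount;
    }

    // Find a directly nested field by name (no fragment or alias handling).
    static Field findField(SelectionSet selectionSet, String name) {
        if (selectionSet == null) {
            return null;
        }
        for (Selection<?> selection : selectionSet.getSelections()) {
            if (selection instanceof Field && name.equals(((Field) selection).getName())) {
                return (Field) selection;
            }
        }
        return null;
    }
}
```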
I am glad you have something working. graphql is tricky because nominally it feels like you can just take the AST SelectionSet and you have the sub fields, BUT this is not true in the general case.
A simple AST traversal of this is tricky:
```graphql
{
  foo(id: "123") {
    id
  }
  foo(id: "123") {
    name
  }
  foo(id: "123") {
    id
    name
  }
}
```
The class graphql.analysis.QueryTraverser might help you, since it can traverse AND give you field types as it goes, and can be told to abort.
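A minimal sketch of what using it could look like (the schema, document and variables are assumed to come from the surrounding execution context):

```java
import graphql.analysis.QueryTraverser;
import graphql.analysis.QueryVisitorFieldEnvironment;
import graphql.analysis.QueryVisitorStub;
import graphql.language.Document;
import graphql.schema.GraphQLSchema;

import java.util.Map;

public final class QueryWalk {

    // Walk the operation with schema type information available at each field.
    static void printFieldsWithTypes(GraphQLSchema schema, Document document, Map<String, Object> variables) {
        QueryTraverser traverser = QueryTraverser.newQueryTraverser()
                .schema(schema)
                .document(document)
                .variables(variables)
                .build();

        traverser.visitPreOrder(new QueryVisitorStub() {
            @Override
            public void visitField(QueryVisitorFieldEnvironment env) {
                // each visited field exposes its AST node and its schema definition
                System.out.println(env.getField().getName() + " : " + env.getFieldDefinition().getType());
            }
        });
    }
}
```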
As you have seen, you can build a "good enough" solution, but the graphql-java library has to build a "handle all cases" mechanism. The ENO / ENF solution works in a general sense, but when there is a combinatorial explosion of fields because of interfaces it gets expensive. We can look to make it less expensive, say.
Thanks for the detailed explanation and your findings. This has been an excellent issue report.
I recently captured a few flame graphs of my application, and noticed that DataFetchingFieldSelectionSetImpl.computeValuesLazily() shows up in 25% of the flame graphs. What is computeValuesLazily doing? When I remove the call to getSelectionSet() in my application code, the application uses 40% as much CPU, so it seems like without this graphql-java is still able to understand the query and execute the DataFetchers.

Details: my app is a DGS/Spring GraphQL/Spring Boot app. In my application code, in the DataFetcher for a top level Query field, I am calling:
DataFetchingEnvironment.getSelectionSet().getFields("subField");
so that I can then get an argument to subField, which I use to make a call to the backend to get the right data. Also my schema is kind of large, which may be why.

libraries: