Memoization of collectSubfields issue (optimization potential)

Cito commented 4 years ago

The memoization of the collectSubfields function in execute.js has a subtle issue that makes the caching less effective than it could be.

The problem is that the WeakMap in memoize3 uses an array as input. If different arrays with the same items arrive as input, these are considered as different inputs, although they would result in the same output. In that case, unnecessary cache entries are created instead of fetching results from the cache.

I have measured that the memoization cache is currently hit 11263 times when running the complete test suite, but potentially could be hit 11427 times. So this is only a small problem when running the test suite, but maybe there are cases where it could be a bigger problem.

A simple way to improve this would be to use the first item of the array instead of the array itself in case the array constains only one item. In fact this is usually the case here (collectSubfields is normally called with only one field node, only in 2 of 12700 cases in the test suite it is called with two nodes).

Maybe the memoize3 function should be moved to execute.js with that optimization? Or maybe it should be even inlined into the collectSubfields method?

I am not sure whether it is worthwile to optimize this. As mentioned, it does not seem to make a big difference when running the test suite, and probably does not make a big difference in practice.

A maybe bigger issue is that the memoization includes the ExecutionContext. If documents are cached, then it would be benefitial if the ExecutionContext.document would be used instead, because the ExecutionContext is built every time a query is executed, i.e. the subfields cache cannot be reused between queries.

Do you think these ideas for optimization are worth pursuing further?

IvanGoncharov commented 4 years ago

@Cito Interesting suggestion and excellent analysis 👍

The current plan is to set up an experimental 16.0.0 after 15.1.0 so we can make bigger experiments. So I want to use this opportunity and experiment with execution plans for the entire query. AFAIK few other GraphQL implementations have this step in their pipelines. But if we learn that it's dead-end then we can cache as much as possible inside execute.

yaacovCR commented 3 weeks ago

note: we have to use the whole context to cache or at least some of it, ie the variables, as collecting subfields is variable dependent.

graphql / graphql-js

Memoization of collectSubfields issue (optimization potential) #2559