Horizontal eval for lazy loading same-type objects

rudiedirkx commented 7 years ago

When I first learned about GraphQL there was a performance chapter. If every relation loads every single relation, it requires many queries to load a few levels of data, even if the actual data is only on the last level. The solution was to lazy load objects: first collect ids (horizontal), and then load all objects at once, and then evaluate the next level.

Just in case. This query:

query {
    user {
        reservations {
            resource {
                sport {
                    name
                }
            }
        }
    }
}

would take 20 queries to find 10 reservations > 10x 1 resource > 10x 1 sport. If it would eval horizontally, it would do 2 queries: load 10 resources, then load 10 sports. (Maybe not 2 levels, but even 1 could be a big difference.)

Is that a thing? It seems like the query is evaled vertically first, so it loads the entire 1 reservation (1 resource + 1 sport), and then the entire 1 next (1 resource + 1 sport) etc.

rudiedirkx commented 7 years ago

Some kind of preview would also be great: after parsing the query, without any eval, ask it which types are being requested. Or on any level, ask which types are on the next level(s). Any idea?

vladar commented 7 years ago

Yes this is a well known N+1 problem. And you are correct - executor uses Depth-First algorithm vs Breadh-First (same as reference implementation).

I posted this comment one day about how to workaround this in the user land.

As for solution in core - there is no such thing yet. But I do understand that we need one. The question is which approach will work best (and if I can find enough time to implement it).

One option is to use breadth-first execution as you suggest. But this is really major change and I never found enough time to experiment with it (I actually tracked some other implementations, like .NET which also had plans for breadth-first execution, but never started too).

Other option is to add rather minor modification to support Deferred (Lazy) values which will enable batching.

Then your resolver would look like this:

'resolve' => function($obj) {
    $this->someBuffer[] = $obj->someId;

    return new GraphQL\Deferred(function() use ($obj) {
        $values = DataLoader::loadAll($this->someBuffer); // must load stuff only once obviously
        return isset($values[$obj->someId]) ? $obj->someId : null;
    });
}

Other suggested syntax:

'buffer' => function($obj) {
    $this->someBuffer[] = $obj>someId;
},
'resolve' => function($obj) {
    $values = DataLoader::loadAll($this->someBuffer); // must load stuff only once obviously
    return isset($values[$obj->someId]) ? $obj->someId : null;
}

(see also this thread for related discussion)

This is actually how JS version works (Facebook created separate project called DataLoader to help implementing this approach).

This way of doing things is probably less powerful than BFS, but it is realsitic in terms of finding time to implement at some point %) Plus it is rather easy to understand.

Anyway, if you find it interesting to experiment with breadth-first execution, I am open to ideas and PRs.

rudiedirkx commented 7 years ago

I don't understand how any of these solutions would work. If current implementation really is depth-first, it would have to resolve the first item before evaluating the second, right? So the buffer would never be used. I can't see how this would be possible with depth-first... The core would have to support it, no?

I like the Deferred solution. It gives a lot of power to the implementor (me). But same problem (or I don't get it): if the executor needs the first object before resolving the second, how can it be deferred?

rudiedirkx commented 7 years ago

The linked issue/comment talks about map. Is that a GraphQL language thing, or a server/implementation thing? (I understand it's been reverted?)

vladar commented 7 years ago

Those solutions are just proposed. They are not implemented yet. Obviously they will require changes to Executor code. Actually it will be a mix of depth-first and breadth-first in such case (JS version does similar thing with promises while still logically working as depth-first algorithm).

At this moment the only existing way to workaround this issue is the one described in this comment

As for map. It was my experiment in this library only, it is not a part of GraphQL spec. The idea was that instead of defining resolve for field you defined map and always worked with list of values vs single value as in resolve.

It allowed to optimize some cases (like yours), but it didn't work for many other situations. Which means it was not general enough to maintain, hence reverted.

rudiedirkx commented 7 years ago

I've found another way to userland it. It's also more response size efficient:

query {
    user {
        reservations(start: "2016-10-14", type: "") {   # query to load 30 reservations
            id
            date
            resource {   # is a proxy object with only its id
                save     # doesn't do a load, but saves id
                # name   # this would do a (single) load for its properties
            }
        }
    }
    saved_resources {   # loads the N resources from the 30 reservations
        id
        name
        more
        expensive {
            stuff
        }
    }
}

Since every saved_resource 146 will be the same for every reservation, there's no point in returning {id name more expensive { stuff } } 30 times.

With a little more polishing, it could even be very generic and easy on the server, defining it. It's a little more work for the client implementing the results, but it's fewer bytes to download... Is this a known 'solution'?

rudiedirkx commented 7 years ago

It would be even better, for the client mostly, if GraphQL could return an assoc list, to key saved_resources by id, for easier access. Does GraphQL have a list-type for that? I can imagine keying by id or date can be useful.

vladar commented 7 years ago

I'd say don't do it %) While it might seem tempting, but you place yourself into very dark territory by doing this.

The reason is that a) it makes query dependent of execution order, b) stateful

For example in JS app order of fields execution is not guaranteed (they might run parallel executors for different fields some day) and you may end up with very weird situations which are very hard to debug.

While it is unlikely that we'll have parallel executors in PHP world, but still - good workaround shouldn't force you to rewrite queries.

Have you tried original workaround proposed? While it is also not perfect, but at least you don't have to rewrite queries like this.

vladar commented 7 years ago

Another option is to analyze nested fields in some parent field (e.g. in reservations and just pre-fetch data required for nested fields resolution).

For example:

'resolve' => function($user, $args, $context, ResolveInfo $info) {
  $nestedFields = $info->getFieldSelection(/* to depth */ 3);
  // analyze $nestedFields here to prefetch data
  if (isset($nestedFields['resource'])) {
    // prefetch by user id
  }
  if (isset($nestedFields['resource']['expensive']) {
    // prefetch even deeper
  }
}

Not sure if it will work for your case with interfaces, but still may be useful.

rudiedirkx commented 7 years ago

I haven't tried your workaround yet =) I didn't understand it before, but I think I do now. Just to be clear: remember the list of reservations when it's queried, and when the first resource is queried, I load all relevant resources by the remembered list of reservations. Right? That's doable and could be made generic-ish and readable-ish.

But I like getFieldSelection() even better! I didn't know that. That's 100% exactly what I meant in https://github.com/webonyx/graphql-php/issues/60#issuecomment-255479106

I'll try both soon.

rudiedirkx commented 7 years ago

I already encountered dark territory btw =) It works decently for to-1 relationships like reservation > resource > sport, but it doesn't at all for reservation > players, because players is a list. To work around that messes up the entire beauty of GraphQL syntax. Actually it does for to-1 relationships too... And all the reasons you layed out!

Thanks for all the quick help!

rudiedirkx commented 7 years ago

Ah crap. getFieldSelection() doesn't return players inside ... on CourtReservation. Any way to get at that from outside the (inline-or-not) fragment?

rudiedirkx commented 7 years ago

foldSelectionSet() could include inline fragments. Something like this:

else if ($selectionAST instanceof InlineFragment) {
    $subfields = $this->foldSelectionSet($selectionAST->selectionSet, $descend - 1);
    foreach ($subfields as $name => $subdefinition) {
        $fields[$selectionAST->typeCondition->name->value . ':' . $name] = $subdefinition;
    }
}

but I'm probably missing complex something in $selectionAST->typeCondition->name->value.

That would make $nestedFields:

Array
(
    [resource] => Array
        (
            [id] => 1
            [name] => 1
        )

    [CourtReservation:players] => Array
        (
            [costs] => 1
            [user] => 1
        )

    [ClassReservation:costs] => 1
    [ClassReservation:user] => Array
        (
            [save] => 1
        )

)

and that works for me. This allows eager loading, which makes a big difference in the number of db queries.

Is there a reason foldSelectionSet() doesn't include inline fragments?

vladar commented 7 years ago

I guess we simply missed that when implemented ResolveInfo. Feel free to add a PR for this or I can add it based on your comment some time later.

vladar commented 7 years ago

As for cases like reservation > players - I bet you'll have to run separate queries for such cases in almost any app. There is just no good (and maintainable) way to write this. At least I am not aware of such tools %)

rudiedirkx commented 7 years ago

In my framework, I can eager load all kind of relationships, to-1 and to-many. The active records know where to fetch what for which ids:

$reservations = someLogic();
// to-1
$resources = ARO::eagerOne($reservations, 'resource'); // Loads and attaches all 'resource' objects: $reservations[0]->resource
$sports = ARO::eagerOne($resources, 'sport'); // And even the next level
// to-many
$players = ARO::eagerMany($reservations, 'players'); // Loads and attaches all 'player' objects: $reservations[0]->players[0]

If I know what to expect (with getFieldSelection), I can use that exact logic to eager load relationships, without any change in existing code. Schema could contain (custom) meta data to auto-eager-load relationships if they're in nestedFields.

We're getting somewhere! Thanks. I'm still trying your original workaround suggestion some time soon.

lordthorzonus commented 7 years ago

@vladar @rudiedirkx I tried yesterday to quickly implement dataloader in PHP with ReactPHP and it works quite nicely : https://github.com/lordthorzonus/php-dataloader (Still quite much a work in progress though ).

I'm interested in trying it out with the package. Basically I would need to edit the Executors executeFields and execute to support to wait all the field resolving stuff and return a promise if I understood the source right..

Then it should be just this at the endpoint :

$response = null;
$eventLoop // get this from somewhere (the same event loop should be provided for the dataprovider).
GraphQL::execute(...)->then(function($result) use (&$response) {
  $response = $result;
});

$eventLoop->run();

echo json_encode($response);

I think I'll have time to play around with this later this week :).

vladar commented 7 years ago

Nice. Feel free to send a PR if you make it work.

Also note that Executor follows closely JS version which already supports promises, so it might make sense to simply follow their solution.

The problem though is how to test this feature with PHPUnit / Travis CI.

ivome commented 7 years ago

@lordthorzonus I like how they did it in the python graphql-core package: You can pass an executor to the execute function that handles the event loop. That would also make it possible to further customize the behavior, add logging, profiling of resolvers, custom exception handling etc.

https://github.com/graphql-python/graphql-core

rudiedirkx commented 7 years ago

Promises in PHP. Makes no sense to me at all.

If #61 gets in, or something like it, the server implemenation can eager load most efficiently with existing architecture. Works for any number of levels down.

Some kind of data loader would be nice too, but faking promises in PHP seems silly.

rudiedirkx commented 7 years ago

So many different opinions and best/good/decent practices! Speed is most important to me. The more awesome and more options, the slower, probably.

chrissm79 commented 7 years ago

@vladar I'm starting to work on a solution using the pre-fetch idea you pitched here and using a central repository to store/retrieve the data in the resolve functions.

However, using ResolveInfo's getFieldSelection helper only provides the name of the fields (which is still very helpful), but they don't include the arguments. So if I have a connection field that needs to just retrieve the first 5 items, I would need to resolve that information from the arguments. Is there a method you recommend to accomplish this with this package? Thanks!!!

vladar commented 7 years ago

@chrissm79 Full-featured support for such things as arguments and directives actually means that you need to visit related ASTs %) It makes little sense to introduce new structures that will basically duplicate existing AST.

So the simple answer - just use ResolveInfo to walk through field AST and get the information you need (maybe using getFieldSelection as a reference for your implementation).

Also note that getFieldSelection will be likely deprecated exactly because real-world look-ahead solutions are complicated (see related discussion in https://github.com/webonyx/graphql-php/pull/61). In short, fragment spreads can by typed and currectly this function doesn't account for that (as well as for inline fragments).

vladar commented 7 years ago

@chrissm79 After some thinking I realized that I may be wrong and convenient look-ahead tool in the core could be useful (at least it is requested often). So I created #65 to discuss possible implementation. Feel free to post your idea / use-cases there.

Yet I doubt that I'll find time for this anytime soon, so any help would be appreciated.

vladar commented 7 years ago

Closing this in favor of #65 and #66

chrissm79 commented 7 years ago

@vladar Really excited about the possibilities here! Adjusting the getFieldSelection to incorporate arguments ended up being pretty painless, I just needed to grab the arguments off the field and include them in the output:

/**
 * Prefetch data.
 *
 * @param  ResolveInfo $info
 * @param  int $depth
 * @return void
 */
public function prefetch(ResolveInfo $info, $depth = 1)
{
    $fields = [];

    /** @var Field $fieldAST */
    foreach ($info->fieldASTs as $fieldAST) {
        $fields = array_merge_recursive($fields, $this->foldSelectionSet($fieldAST->selectionSet, $depth));
    }

    return $fields;
}

/**
 * Fold field selection set.
 *
 * @param  SelectionSet $selectionSet
 * @param  int          $descend
 * @return array
 */
private function foldSelectionSet(SelectionSet $selectionSet, $descend)
{
    $fields = [];

    foreach ($selectionSet->selections as $selectionAST) {
        if ($selectionAST instanceof Field) {
            $fields[$selectionAST->name->value] = $descend > 0 && !empty($selectionAST->selectionSet)
                ? $this->buildOutput($selectionAST, ['children' => $this->foldSelectionSet($selectionAST->selectionSet, $descend - 1)])
                : $this->buildOutput($selectionAST);
        } else if ($selectionAST instanceof FragmentSpread) {
            $spreadName = $selectionAST->name->value;
            if (isset($this->fragments[$spreadName])) {
                /** @var FragmentDefinition $fragment */
                $fragment = $this->fragments[$spreadName];
                $fields += $this->foldSelectionSet($fragment->selectionSet, $descend);
            }
        }
    }

    return $fields;
}

/**
 * Build field output.
 *
 * @param  Field  $field
 * @param  array $data
 * @return array
 */
private function buildOutput(Field $field, $data = [])
{
    $args = [];

    foreach ($field->arguments as $argument) {
        $args[$argument->name->value] = $argument->value->value;
    }

    return array_merge([
        'parent' => isset($data['children']),
        'args' => $args,
    ], $data);
}

webonyx / graphql-php

Horizontal eval for lazy loading same-type objects #60