youshido-php / GraphQL

Pure PHP realization of GraphQL protocol
MIT License
710 stars 106 forks source link

handle deferred resolvers #102

Closed roippi closed 7 years ago

roippi commented 7 years ago

This landed over in webonyx: https://github.com/webonyx/graphql-php/blob/c7688c92499fa2009a235cf2498f5582204ff7bf/docs/data-fetching.md#solving-n1-problem

I think this is a really elegant solution. I was looking into strategies for doing this type of thing and was looking at some sort of analyze-query-via-lookahead optimizations, but this seems much cleaner (and easier to implement).

roippi commented 7 years ago

Happy to take a crack at this if we reach consensus on design.

tpetry commented 7 years ago

Wouldn't using depth-first resolving solve the problem easily by adding something like a resolveMultiple function be a very simple solution?

But the n+1 problem has really to be solved for any realistic production usage.

viniychuk commented 7 years ago

@roippi let's talk in gitter when you can – I don't think we even have this problem to solve.. If I request

{
   post {
     id
     title
     comments {
       id
       title
    }
}

then it's up to my data source to resolve comments in one turn. So I'd have a resolver on the PostField which would return me comments for each post and I wouldn't have a resolver on the comments field at all. So @tpetry idea of resolveMultiple is automatically executed here. Maybe I'm missing important thing here, but so far I don't have this problem on our projects even though they are fairly complex. Maybe you guys want to share some concrete example and I'll provide my implementation that we can discuss.

danydev commented 7 years ago

@viniychuk so, let's say you have Post and Author, every Post has an Author. At some point you need to retrieve Post(s) along with their Author back from GraphQL for listing, how could you write the resolver so that you don't need to hit the database N times to retrieve all the Author(s)?

viniychuk commented 7 years ago

@danydev I feel like we're talking about something different here.. but still. I'll have resolver for the postsField that would get me Posts and their Authors (to simplify, let's say they are simply left joined to it) and then when processor will go over author field for each post it will use the same data from the posts resolver. I'll write an example right now. I should be simple enough to keep it all in one file (I'll substitute DB calls with some fake service).

viniychuk commented 7 years ago

@danydev Here's a quick example.. I'm still under the impression I didn't understand the original question... or wasn't clear enough with the face that you don't have to define resolvers on every field if your higher level resolver can get the data. https://github.com/Youshido/GraphQL/blob/686b710f5e9c3901ceaf52ce83d8bfaaa7427af4/Tests/Performance/NPlusOneTest.php

roippi commented 7 years ago

GraphQL is about letting the client ask for whatever it wants, right? So we're more handling situations like this:

{
  user1: user(id: 1) {
    posts {
      comments {
        id
      }
    }
  }
  user2: user(id: 2) {
   ... sameAsUser1Fragment
  }
  post (id: 1234) {
    comments {
      title
    }
  }
  user3: user(id:3) {
    friends (first: 5) {
      posts {
        comments {
          id
          title
        }
      }
    }
  }
}

How would you optimize this to load all the posts all at once with a single db hit? The query is asking for it two or three different ways. For me, the answer is deferred resolvers:

'resolve' => function(...$args) {
    $ids = Post::extractIdsFrom($args);
    Posts::add($ids);

    return new Deferred(function() use ($ids) {
        Posts::loadAllTheThingsFromTheDatabaseOnce();
        return Posts::get($ids);
    });
}

This doesn't even get into handling some recursive data fetching problems. This is a real life problem I have to solve with an arbitrarily nested category structure.

category (id: 1234) {
  subcategories {
    id
    title
    subcategories {
      id
      title 
      subcategories {
        id
        title
        # I could keep going
      }
    }
  }
}

Eager loading (like you do in your example with authors) is nice when you can do it, but sometimes you can't. So the options are either:

viniychuk commented 7 years ago

Everything you're saying does make sense. On the one hand – I believe that proxying and promises are a bit out of scope of the library and should have different implementation utilizing whatever frameworks people are using in their projects.

On the other one – I'd be eager to solve it in vanilla php, but I'll need to hear more input. Deferred resolvers are the way to go, but can you share in pseudo code idea of implementation of your method:

Posts::loadAllTheThingsFromTheDatabaseOnce();

The reason I'm asking is because we have different criteria for posts. In the first field it would be posts with user_id=1, in the 2nd –  posts with user_id=2, then post with id=1234 and then, finally, posts with user_id IN (select ids of friends of user with id 3 limit 5). Having a bit more complex logic would make it impossible to make a single query to DB and without being able to do so there's not much benefit in having this approach at all.

I'm all in to solve this, but I'll need your help in finding an abstract enough solution that would be still useful considering all possible frameworks and libraries people are using this with.

p.s. In our company we mostly use symfony and its services (which we call resolvers). Those services take responsibility of dealing with data in the most efficient way (e.g. your recursive query would be handled by a service that have flat array of data and handle requests).

p.p.s. While I was writing all this I think I figured how to make the first draft. Will work on it and share in a separate branch. If you have any additional thoughts - let me know.

roippi commented 7 years ago

Cool - I think the promises bit isn't really important, we don't need a third-party dep there. Really all I'm thinking is

$next = $query;
do {
   processQueryAst($next);
} while ($next = popDeferredResolver())

(pseudocode obviously)

Basically in my mind Deferred is just a marker for a resolver that the processor will return to after it's done walking the AST of the non-deferred resolvers. Do/while bit is just because a deferred resolver could itself enqueue one or more deferred resolvers.

Posts::loadAllTheThingsFromTheDatabaseOnce();

This was a contrived example (sorry) but if I were e.g. using redis for a backend I would MGET all of the user ids, or a relational DB I would do some SELECT ... WHERE post.id IN (...) type of jazz. I'm in the process of gluing a number of different technologies together behind graphql including redis, sql server, SOLR, some ad hoc REST services... so you can imagine I need something pretty flexible :)

fritz-gerneth commented 7 years ago

I'm quite in favor for this feature for the same reason @roippi mentioned - independant sub-queries might query for the same data source. I don't think that the lib itself would need any sophisticated Deferred implementation for this. I guess it already would be quite useful to return a closure (or any other callable) from the resolver for later resolution. This certainly would push all responsibility to the user.

A trivial example then would look like this:

public function resolve(...$args) 
{
    $myDataSource->enqueue($args);

    return function () use ($args) {
        // Leave it up to $myDataSource on how to load all enqueued queries at once and return ths ones for this field
        return $myDataSource->fetch($args); 
    };
}

Soley relying on is_callable types has the benefit that everyone can use the framework and style of their choice.

This could be further streamlined by passing the same arguments the resolve function receives to the callback:

public function resolve(...$args) 
{
    $myDataSource->enqueue($args);

    return function (...$args) {
        // Leave it up to $myDataSource on how to load all enqueued queries at once and return ths ones for this field
        return $myDataSource->fetch($args); 
    };
}

It is even possible to return an object implementing __invoke. This in combination with passing resolve the arguments to the callback would allow the resolve to simply return an object. What I kind of would envision on this could look like this:

class MyDeferedDataSource
{
    public function enqueu(...$args) 
    {
        // Enqueue request for later.. 
        return $this;
    }

    public function __invoke(...$args)
    {
        // Load if not loaded
        // Do the actual resolving
        return $result; // Could be a callback again for defered 
    }
}

// In the field-class
public function resolve(...$args) 
{
    return $myDataSource->enqueue($args);
}

The great thing about this would be that it is fully transparent to the field class itself how the data source wants to handle this / can support this. I could easily replace $myDataSource to return the result right away - without having to change the GrapQL field implementation.

akomm commented 7 years ago

@viniychuk you have to fiddle in postField using ResolveInfo to decide, whether to join-select comments or not. This is quite easy on a +1 depth scale, but gets complicated when you allow to query +n depth - which is the point of a "graph" being able to do so. It is about how you can implement resolvers so that they only see their own responsible area (self plus 1 depth for its directly related "children") but not further. I know that data from parent resolver is - when needed - passed to next resolver, but by default that leads to 1+n OR the top most resolver having to care about the whole "possible" graph selection ahead of it.

I did not test the defer approach, but from first impression it seems like it can be chained so that a resolver only has to be aware of itself, not n depth ahead. I'm going to test it.

fubhy commented 7 years ago

@viniychuk Have you had a chance to work on this? If not, do you need help? Can I assist in any way?

viniychuk commented 7 years ago

@fubhy I would actually use a hand on this... it would be helpful if someone could take a crack on the example of the schema structure with deferred resolvers so I would make it work. I tried to do some work on it but because of all the related code and validation it started to make less and less sense.. probably because of the way I use it with services and DI. Again, I get the idea, it sounds great, but when I actually try to implement it – it doesn't do me any good. The only way I see it now is to create an "DeferredReolverInterface" and make people implement it so I can be sure it's something that can guarantee me resolving at some point... I pushed it into the deferred brunch.

Also @roippi.. after closer look at your subcategories example I found it quite challenging to survive, because you don't really know about much about parents inside the deferred resolver before parent data is obtained from the DB..

Anyway guys. I'm in the "let's do this" mood, but I need some outside view on this...

Lewiatan commented 7 years ago

Hi, I've found deferred branch and took it on some tests. It looks like it almost works, deferred are executed correctly, but I've found issue, that when you defer resolving of field it does not include its dependencies. For instance, when I want to load many Users with their Posts and you want to pull Comments for those posts, when Posts are resolved with deferred it do not pull in Comments.

Is this branch still in developement? Do you need any help? Maybe you could explain what was your idea and what have you done so far so I could try to work it out?

viniychuk commented 7 years ago

@Lewiatan could you please make a small test for your case, I'm ready to push all the updates and just want to run it through your scenario as well.

pmelab commented 7 years ago

@viniychuk I created PR #163 that properly evaluates nested deferred resolvers. I added a test case that hopefully illustrates the scenarios (at least the ones we need for the GraphQL Drupal Module). Let me know what you think.

fubhy commented 7 years ago

This can be closed I guess?

viniychuk commented 7 years ago

@fubhy yes.