graph-gophers / graphql-go

GraphQL server with a focus on ease of use
BSD 2-Clause "Simplified" License
4.64k stars 491 forks source link

No way to get requested fields inside resolver #17

Open blasterpistol opened 7 years ago

blasterpistol commented 7 years ago

Problem: Inefficient microservice/database queries. If I need to return 100 orders with their clients, I have to make 101 requests to microservices (1 for batch of orders, 100 for their clients (one per each order)).

Scheme (simplified):

type Query {
  orders: [Order]
}

type Order {
  id: ID
  client: Client
}

type Client {
  id: ID
  name: String
}

Query:

query GetOrders {
  orders {
    client {
      name
    }
  }
}

Resolvers:

func (r *Resolver) Orders() []*orderResolver {
    data, err := Request(orderService, "Order.List", &Args{}) // returns batch
    ...
    return // batch of 100 order resolvers
}

// 100 client resolvers - 100 requests
func (r *orderResolver) Client() *clientResolver {
    data, err := Request(clientService, "Client.Get",&Args{) // returns one 
    ...
    return // a client resolver
}

Instead, I'd like to know (in the order resolver) that the field with clients has been requested. So I could make one batch request at the order resolver level.

Something like this:

func (r *Resolver) Orders(ctx context.Context) []*orderResolver {
    data, err := Request(orderService, "Order.List", &Args{}) // returns batch
    if ctx.Value("fields")... {
          clients, err := Request(clientService, "Client.List",&Args{) // batch
        }
    ...
    return // batch of 100 order resolvers witch clients inside
}

So I get 2 request.

neelance commented 7 years ago

This is definitely a feature we should think about. Is there anything like that in the graphql-js implementation?

I am a bit worried that this can cause people to write bad resolvers. One nice attribute of GraphQL are all those consistency guarantees it provides.

nicksrandall commented 7 years ago

I agree that this could cause people to write bad code resolvers, but it could also help write efficient resolvers. For example, one could use that information to prevent an unnecessary "join" in his/her db query if the user didn't ask for any of those fields. Also, it could give user the ability to log which fields are being requested most frequently and then perform optimizations on those fields (like pulling from redis-cache rather than db).

The last variable passed to graphql resolvers in graphql-js has lots of info about the schema, fields, ect...

see: https://github.com/graphql/graphql-js/blob/master/src/type/definition.js#L489

neelance commented 7 years ago

@nicksrandall Do you know any example code that uses this feature with graphql-js?

nicksrandall commented 7 years ago

@DenisNeustroev Facebook has a recommendation on how to solve the n+1 problem with a technique called DataLoader. I have taken a stab at implementing this technique in go here: https://github.com/nicksrandall/dataloader.

@neelance I am not aware of any examples of people using the schema info in a resolver but I haven't done much research on the subject.

F21 commented 7 years ago

Being able to see what fields were requested would make it super easy to select a specific list of columns when performing an SQL query in a resolver. That means, we don't have to get all fields (SELECT * FROM) which could be more efficient.

mvpmvh commented 7 years ago

I know context can be a divisive topic in go, but reading the graphql spec had me wondering: if a resolver gets 3 args (i.e. obj, args, context), could a server implementation stuff the selection set in there? Off the top of my head I was thinking if the context knew the selection set and knew what in the selection set had already been resolved, it could save on extra requests

tj commented 7 years ago

I have a similar issue, need a way to either defer the resolution so I can produce a query, or some way to access the nested fields of the query to properly form the query instead of simply filtering results a very expensive query (or possibly unknown, in which case it's impossible). Anyone come up with a workaround?

EDIT: I ended up hacking it in. I think ideally there could be an optional preparation pass to the set of resolvers, or some similar set of structs, letting you construct the queries necessary first. Some variant of that haha.

nicksrandall commented 7 years ago

@tj I took at stab at implementing this in #70 . Would you mind sharing what you did as a work around?

neelance commented 7 years ago

I'm still not a fan of exposing all this data to the upper resolvers. They should not care about that.

However, the initial example by @DenisNeustroev definitely shows a need. What about instead of integrating that clients, err := Request(clientService, "Client.List",&Args{...}) into the Orders resolver, we instead add exactly that, a way to process a batch of Client resolvers in one go?

nicksrandall commented 7 years ago

@neelance what do you mean by "upper resolvers"? I think it is reasonable for a resolver to be aware of it's children and thus aware of the fields requested for itself and it's children.

Also, I like the idea of baking in the ability to execute a batch of resolvers in one go. I've been doing a form of that with my dataloader implementation and it has been very helpful for the performance of my service.

tj commented 7 years ago

It's nearly impossible to optimize a query without that information. For example my current use-case is to fetch a series of rows from Postgres, then I have to construct a query for Elasticsearch based on those, however I have to know what metrics the user is asking for, otherwise it's simply impossible to create the query.

checks() {
  name
  url
  metrics {
    time_total {
      min
      max
      avg
    }

    errors {
      sum
    }
  }
}

I'm new to GQL so maybe this is a weird approach, you could maybe do it by passing arguments but it would be super ugly to the point where I wouldn't want to use GQL for it anyway.

If this is an anti-pattern definitely let me know haha, but it seems logical to me. The main thing I like about GQL is this expressiveness vs passing metrics(check_id: ID, select: ['min', 'max']) or whatever that would look like, which also wouldn't work in my case due to nesting.

neelance commented 7 years ago

@tj How much performance difference does it make in your case to fetch some fields vs. all fields? If all fields come from the same table (no JOIN), then the difference might be so small that it would be premature optimization.

tj commented 7 years ago

In my case it would be hugely detrimental, I'd have to fetch every possible permutation of what they can ask for and then filter. They're pretty expensive aggregate queries on large amounts of data as well so it's not like a quick select from Postgres, more like going from 200ms queries to 5s+.

For the SQL use-case I totally agree that selecting specific columns is usually not a huge deal, I almost always select * those anyway.

neelance commented 7 years ago

All right. For that situation you'll probably want to use a custom resolver. That feature will be added soon, I'm just currently busy with adding validations.

tj commented 7 years ago

Cool cool no worries! The context thing will keep me covered for now, just passing the fields to a template to generate the massive Elasticsearch query haha.

tj commented 7 years ago

Ahh hitting cases where custom resolver would be the only way to go now. Doing something like this to provide a histogram(field: ..., interval: ...):

"{{or .Alias .Name}}": {
  "histogram": {
    "field": "result.{{.Args.stat | normalize_stat}}",
    "interval": {{.Args.interval}}
  }
}

which can't be properly mapped unless you have access to the name or alias anyway.

xmirya commented 6 years ago

What about something like https://github.com/neelance/graphql-go/pull/169 ?

abradley2 commented 6 years ago

Having an issue with this as well, I'm eating the performance hit fine in my case though. The bigger issue is not being able to get query arguments in resolvers nested all the way down. I'm looking at the solution here for selectedFields (https://github.com/DealTap/graphql-go/pull/5) and hacking that in to include arguments as well

vladimiroff commented 6 years ago

Now that we have two alternative solutions for this issue I decided to give my two cents about it.

  1. Adding one more optional parameters to resolver methods (#169). In general go doesn't have optional parameters (excluding the case with final variadic parameter), but since this package dynamically resolves methods by name the idea with optional parameters already works great (the first optional context.Context one). The main argument against seems to be "we can't just keep on adding optional parameters for everything we think of". This makes a lot of sense, however could anybody think of something else fundamental that could require an optional parameter? If there are really several other feature requests that makes sense to get fixed with yet another optional parameter, then let's really avoid this solution.

  2. Introducing a function which extracts selected field from the context (DealTap#5). To be fair this is one of the few good reason to use context values for as stated in package's docs:

Use context Values only for request-scoped data that transits processes and APIs, not for passing optional parameters to functions.

However, I argue that this is a good idea giving that Go doesn't have an idiomatic way to pass optional parameters unless dynamic method dispatch is in place. Think of what the user's code will look like? In case 1 the optional parameter is added only where needed with already populated and ready-to-use value. Nothing else get changed. In case 2 instead there will be one repetitive block with something like:

selectedFields, err := selected.GetFieldsFromContext(ctx)
if err != nil {
    // for some reason the context value is missing. We know that one can't
    // send a query with zero selected fields, then we must assume all
    // fields are selected... I guess.
    selectedFields = somehowGetAllFieldsForThisQueryOrHardCodeThem()
}

// Here selectedFields is just what would've been anyways with an optional
// parameter.

The unfortunate part is that there's really no reliable and safe way to make selected.GetFieldsFromContext deduce all possible fields for this query and simply return them without ever giving an error.


TL;DR: I think the solution proposed in #169 is the right way to resolve this issue and should go towards getting it ready to be merged and eventually merging it.

matiasanaya commented 5 years ago

@DenisNeustroev since in your case the upstream resolver (i.e. Orders) knows how to construct the batch query for the downstream resolver (i.e. Client) you can still get only two requests (at most) with the current implementation. See the following pseudo-code:

func newClientGetter(ids []string) func(id string) (*client, error) {
    var once sync.Once
    var data interface{}
    var err error

    // Lazy loads the batch when called
    return func(id string) (*client, error) {
                 // will only ever fire one request
        once.Do(func() {
            // batch
            data, err = Request(clientService, "Client.List", &Args{})
        })
        if err != nil {
            return nil, err
        }

                 // clientFromData(interface{}, string) (*client, error)
        return clientFromData(data, id)
    }
}

func (r *Resolver) Orders() []*orderResolver {
    data, err := Request(orderService, "Order.List", &Args{}) // returns batch

    // Assuming you can get the client IDs from data
        // idsFromData(interface{}) []string
    clientGetter := newClientGetter(idsFromData(data))

         ...

    return // batch of 100 order resolvers witch clientGetter inside
}

// 100 client resolvers - 1 request
func (r *orderResolver) Client() *clientResolver {
    data, err := r.clientGetter(r.clientID)

         ...

    return // a client resolver
}

If N > 0 clients are needed only one Request(clientService, "Client.List", &Args{}) will be fired. If 0 clients are needed no Request(clientService, "Client.List", &Args{}) is ever fired.

I've had success with this approach every time there's a nested batch query where an upstream resolver has enough info to pre-build the query. Hopefully this can help you or @abradley2.

Thoughts?

gbaptista commented 5 years ago

I've found a way around this without having to change the lib itself by creating another lib just for this purpose: gbaptista/requested-fields

I don't like the idea of having to parse the query a second time or need to manually set the hierarchy of resolvers, but it was a simple and quick solution that works.

mgwidmann commented 5 years ago

A use case that is completely missed here is the difference between doing SELECT * and SELECT some_field, another_field can be realized when a row has a particularly heavy blob attached to it. This reason alone can make a dramatic performance improvement.

Additionally, this is common enough to use this information in both the Javascript implementation as well as the Ruby implementation (I've done it myself). With other implementations, they simply provide the AST to the resolver from the point in which the graph is currently being walked. It basically came for free in the other implementations.

The ruby implementation even goes as far as allowing plugins to be able to make changes to the AST by using a copy-and-modify technique to keep immutability of the AST. This allows them to implement custom directives as a form of AST manipulation. https://github.com/rmosolgo/graphql-ruby/blob/master/guides/language_tools/visitor.md

rodrigorodriguescosta commented 5 years ago

I need to decide what to do within resolver based on fields requested by client, how to get the fields list requested by user? why not created one struct with all parameter related to request, including parents fields and inject as parameter on resolver? any update about this? I'd like to use this lib but seems is not maturity enough for production use which reach real situations, is it?

jeanpi commented 4 years ago

I started using this lib a few months ago, but now we're running into this limitation. It's unfortunate. I think I'm going to have to rip it out, which is going to be a bunch of work.

Jmoore1127 commented 4 years ago

We need this too. We are looking to do field usage reporting and metrics in addition to tracing. We'd prefer not to spread the monitoring logic throughout the application. There doesn't seem to be a feasible way to do this currently.

keithmattix commented 4 years ago

I've had #373 open for a few months as a possible solution to this issue. Would someone mind taking a look?

jhelberg commented 3 years ago

373 seems a very non-intrusive implementation and not really risky. The problem to be solved is pretty urgent though. Every time I read that one of the great things about graphql is that you only get what you ask for, I die a little, as the problem is still there at api-implementation which doesn't help at all.

Can I do something to get a solution in? I've looked at #373. Is this the one I should be checking in production and give an OK (or not) on?

andrewmostello commented 3 years ago

Additional vote here for #373. Using the context argument makes it backwards compatible and opt-in, so I think this is the best approach of the ones I've seen submitted.

keithmattix commented 3 years ago

I've closed #373 in favor of #428 which is off of my fork's master branch. Hopefully, if there's any other work maintainers would like to see first, please let me know :)

JohnStarich commented 2 years ago

Hi all. Where do we stand on this today?

I love the way this library works, but I have use cases that necessitate an "eject" button to resolve selected fields from another GraphQL instance (different shard). (My case closely matches this comment.)

From what I've found, these are all the proposals and their statuses:

Edit: I thought I should add: This missing feature is looking like a deal-breaker for my team. Without this flexibility, we hit issues creating a global GraphQL service with multiple, regional database shards. A dataloader helps, but even 2 serial round-trip requests to another region is quite expensive time-wise.

JohnStarich commented 2 years ago

It seems 169 is the furthest along in discussion. I think the idea shows the most promise as well. One way we could support future growth here is to make the new parameter a kind of ResolveParams struct, where new fields can be added later without breaking changes. This could resemble the approach used here: https://pkg.go.dev/github.com/graphql-go/graphql#ResolveParams

Perhaps we start with the selected fields today behind a method on the struct, so it may be lazily constructed. Something like this?

func (o *MyObject) MyField(ctx context.Context, args struct { ID graphql.ID }, info graphql.ResolveParams)

type ResolveParams struct {}

func (p *ResolveParams) SelectedNode() graphql.Type

// alternatively:
type SelectedField struct {
    Name string
    Fields []SelectedField
}
func (p *ResolveParams) SelectedFields() []SelectedField
brian-pickens commented 2 years ago

Unfortunate. I was hoping I could project the selected fields into my mongodb queries. It sounds like discussion on this has stalled since October last year?

pavelnikolov commented 2 years ago

Hi @brian-pickens, The discussion has started a few years ago and is one of the most requested features since the creation of this library. I am working on the feature and I have a working branch. I haven't come up with the final API that I am happy to release. I haven't released it yet because once released people would use it and changing it would be impossible without any braking changes. I am planning to release it soon.

CreatCodeBuild commented 2 years ago

Is there any progress on this? My use case is that I need to implement directives.

func resolver() T {
  var t T = get()
  return t
}

where T is a huge struct that maps to a GraphQL type.

Some fields of this type has directives attached for access control. For example

type T {
  a: Int
  b: Int  @need(role: "admin")
}

Then in the go code, I want to set b to zero value if the requesting user is not an admin. There is no way for me to know that b is queried, unless I implement resolver methods for all fields of T.

But because T is a huge struct with hundreds of fields and nested structs. It will become super verbose to translate all fields to methods.

We need something similar to info in graphql.js

CreatCodeBuild commented 2 years ago

@pavelnikolov Do you have a draft PR?

pavelnikolov commented 1 year ago

@CreatCodeBuild you might be interested in PR #543

camdencheek commented 9 months ago

Hi @pavelnikolov! You mentioned above that you have a working branch with an in-progress implementation of this feature. Any chance there's been progress on an API? I'd be happy to contribute to this feature in any way that would be useful.

Honestly, even just making the branch public would be great. Even if it means I have to integrate a breaking change in the future, I'd rather have something that's close-ish to what gets released. In any case, it'd likely be a smaller breaking change than the fork I'll likely make otherwise.

bmhatfield commented 1 month ago

@pavelnikolov I've been perusing the current state of the various issues, etc, and I was wondering if you'd be willing to share the branch you have? I already run a fork (which I keep up-to-date with the main project), so I am interested in experimenting with your changes.

I know this is a broadly requested feature with endless pestering, but I haven't seen any commentary in 2024 about this so I thought I'd shoot my shot ❤️