Hey @tonyghita, I'm trying to digest Golang (with GraphQL), but the one part of this example I'm having a hard time with is the flow of the loader.

I have a small pet project running (inefficiently) using neelance's graphql-go library, but I have yet to integrate dataloader (albeit, I know it's very necessary). Can you point me in the direction of a good tutorial to help me get my head wrapped around how the loader works, specifically with the architecture you have in place?

My questions mostly surround how the interactions between the resolver layer and the loaders work. I see that the loaders are stored in the context as a request-wide cache, but I guess I'm not seeing where the actual data-layer access is happening? Does that happen in each of the batch functions? I'm new to both the GraphQL environment and Golang, so I do apologize if this is a fairly basic question.
Great question! Dataloader is a fairly simple utility once you grok it, but it's a dense topic. I'll make sure to explain it fully, soon.
but I guess I'm not seeing where the actual data-layer access is happening? Does that happen in each of the batch functions?
Yup! The dataloader instances that live on the request context collect requests for data (batches), then on some interval (I think 16ms by default) execute the batch function. The result of that execution is then available to anyone who has asked, or will later ask, for the same data, for the duration of the request.
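To make that concrete, here's a minimal, self-contained sketch of that pattern, assuming the graph-gophers/dataloader package that the snippets in this thread appear to use (the context key and the placeholder data are mine, not this project's):

package main

import (
	"context"
	"fmt"

	"github.com/graph-gophers/dataloader"
)

type ctxKey string

const planetLoaderKey ctxKey = "planetLoader"

// batchPlanets receives every key collected during the batching window and
// must return one Result per key, in the same order.
func batchPlanets(ctx context.Context, keys dataloader.Keys) []*dataloader.Result {
	results := make([]*dataloader.Result, len(keys))
	for i, key := range keys {
		// Placeholder data access; in the real project this is where the
		// swapi client gets called.
		results[i] = &dataloader.Result{Data: "planet for " + key.String()}
	}
	return results
}

func main() {
	// One loader per request: its cache lives exactly as long as the request.
	loader := dataloader.NewBatchedLoader(batchPlanets)
	ctx := context.WithValue(context.Background(), planetLoaderKey, loader)

	// Load returns a thunk immediately; the batch function runs once the
	// batching window elapses (or the batch capacity is hit).
	thunkA := loader.Load(ctx, dataloader.StringKey("/planets/1"))
	thunkB := loader.Load(ctx, dataloader.StringKey("/planets/2"))

	a, _ := thunkA() // blocks until the batch has executed
	b, _ := thunkB() // served from the same single batch
	fmt.Println(a, b)
}

Both Load calls above end up in a single invocation of batchPlanets, which is the whole trick: resolvers ask for data one key at a time, and the loader coalesces those asks.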
Thanks @tonyghita, I appreciate the response. My last questions on the subject, I promise! I'm (slowly but surely) getting my head around the topic, I think!
Looking through the code, I'm seeing Resolvers call the NewX functions on the loaders, which in turn call to the loaders (which in turn call to the batch functions on that interval).
The one piece of the puzzle that looks out of place is the Prime functions in the loader files. Looking at Facebook's DataLoader documentation, the prime functions are there to essentially seed the cache (with planets, in this case). I see the Resolver calling it (for example, from NewPlanets), but am unsure of why this is necessary. Is this essentially taking the results from a list-resolver call and making them available in case of a singular-resolver call (so, for example, a list of planets is fetched, and then somewhere else in the GraphQL query a single planet is requested that was already fetched through that list)?
And also, seemingly, you have a bit of abstraction occurring through the planetGetter and planetLoader structs?
Looking through the code, I'm seeing Resolvers call the NewX functions on the loaders, which in turn call to the loaders (which in turn call to the batch functions on that interval).
Yup, the New*() functions are used to validate any inputs and load data needed to resolve the type.
I see the Resolver calling [the loader Prime*() function] (for example, from NewPlanets), but am unsure of why this is necessary?
It's not strictly necessary, but I think it's good practice to put the data in the cache if you have it, just in case it's asked for again somewhere else in the query. The goal is to do expensive work (i.e. a network request) only once in any given request.
This becomes more powerful when clients batch queries together (another layer of batching!) in a single HTTP request. Since the loaders have a cache that is shared throughout the lifecycle of the HTTP request, each query in the batch can benefit from that one cache.
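To illustrate the priming idea, here's a sketch only: fetchPlanetPage and the Planet fields are placeholders, not this project's code, though loader.Prime() itself is the real graph-gophers/dataloader method:

// primePlanets seeds the per-request loader cache with planets that were
// already fetched as part of a list, so later singular loads become cache
// hits instead of new network requests.
func primePlanets(ctx context.Context, loader *dataloader.Loader) error {
	planets, err := fetchPlanetPage(ctx) // the one expensive list request
	if err != nil {
		return err
	}
	for _, p := range planets {
		// Prime only inserts if the key is not already cached.
		loader.Prime(ctx, dataloader.StringKey(p.URL), p)
	}
	return nil
}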
And also, seemingly, you have a bit of abstraction occurring through the planetGetter and planetLoader structs?
Yeah, the thought is that this will make it easier to mock service calls in the loader unit tests, since we only have to implement a single mock function, rather than every method the swapi client provides.
The benefit of this is each loader is restricted to knowing only the minimum it needs in order to load its own data.
An alternative implementation would pass the client instance to each loader instead of the interface.
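For illustration, roughly what that narrow-interface pattern looks like (the exact names and method sets here are my approximation of this project's code):

// planetGetter is the minimal slice of the swapi client that the planet
// loader needs. *swapi.Client satisfies it implicitly.
type planetGetter interface {
	Planet(ctx context.Context, url string) (swapi.Planet, error)
}

type planetLoader struct {
	get planetGetter
}

// In tests, a one-method fake can stand in for the whole client.
type fakePlanetGetter func(ctx context.Context, url string) (swapi.Planet, error)

func (f fakePlanetGetter) Planet(ctx context.Context, url string) (swapi.Planet, error) {
	return f(ctx, url)
}

Because the loader only sees planetGetter, a test can construct planetLoader{get: fakePlanetGetter(...)} without touching the network.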
Good questions!
Oh man, it finally just clicked @tonyghita. So your data-layer access actually happens through the getter interface, for example inside the batch call in the planet loader file, here (going through the planetLoader struct to whatever implements the planetGetter interface):
for i, url := range urls {
	go func(i int, url dataloader.Key) {
		defer wg.Done()
		data, err := ldr.get.Planet(ctx, url.String())
		results[i] = &dataloader.Result{Data: data, Error: err}
	}(i, url)
}
Am I on track with that? If so, is there an example of the planetGetter anywhere in this project, or is that left for implementation (obviously dependent on what persistence storage you're using)? I just want to see the example end-to-end, for my own sake.
EDIT: Ahah! Found it, inside the swapi files:
func (c *Client) Planet(ctx context.Context, url string) (Planet, error) {
	// TODO: implement
	return Planet{}, nil
}
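(The connective tissue here is that Go interfaces are satisfied implicitly: *swapi.Client is a valid planetGetter purely because it has a Planet method with the matching signature; the swapi package never mentions the interface. A standard compile-time assertion makes the relationship visible, shown here purely as illustration:)

// fails to compile unless *swapi.Client really satisfies the loader's interface
var _ planetGetter = (*swapi.Client)(nil)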
From an architecture perspective that totally makes sense. I think I was just battling new syntax/semantics that I'm not used to, so I didn't grok it at first.
@tonyghita I really appreciate you taking the time to make this project, and help me through it!
@tonyghita Ok, another question, although this isn't one directly covered by this example. Have you given any thought on how you would architect an interface/implementing-class relationship with resolvers and loaders and all?
Something like the Character->Human/Droid relationship in the graphql-go starwars example, or the classic Animal->Dog/Cat. Looking at graphql-go (formerly neelance/graphql-go), there's a resolver method:
func (r *Resolver) Character(args struct{ ID graphql.ID }) *characterResolver {
	if h := humanData[args.ID]; h != nil {
		return &characterResolver{&humanResolver{h}}
	}
	if d := droidData[args.ID]; d != nil {
		return &characterResolver{&droidResolver{d}}
	}
	return nil
}
This method, without the in-memory maps, would (I think) need to determine the type of the character (maybe by calling a method in the loader package?) and then call an additional function based on what type of character it is. My only concern is that I'm unsure how to successfully batch that initial type-check call.
I hope I'm explaining my question well enough, let me know if it at all doesn't make sense. Thanks again for all the help, and if you're ever in the Dallas area, I owe you a beer at this point :)
This really depends on your backend implementation, but it sounds like you'll need:

1. a way to look up, given a character ID, which concrete type (human or droid) it is, and
2. a loader per concrete type to fetch the actual data once you know what it is.
Yeah, that's kind of the workflow I had in mind. I guess I'm confused on how to leverage the DataLoader framework to make that first step happen, though. Would it simply be creating a new batch function, associated with a loader for, let's say, CharacterType, for the sake of the example? I guess what's got me is that it's technically an attribute on an object, as opposed to an object itself.
Unless this all would happen in the loader functionality for the interface, so that it gets batched properly? Hmmmm
As of now I have the Character resolver function above checking the type and then going through the right sub-type's loader. That seems like almost a deal-breaker, as it would put me back into the scenario of querying for every object, if there's no way to batch this type-check query.
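(For what it's worth, one way to sketch such a "type discriminator" loader. Everything here is hypothetical: the kindGetter interface, the CharacterKinds lookup, and the kind values are invented for illustration; only the dataloader types are real. The point is just that the type check batches exactly like any other load:)

// kindGetter is a hypothetical one-shot lookup: given many character IDs,
// return each one's concrete type in a single backend call.
type kindGetter interface {
	CharacterKinds(ctx context.Context, ids []string) (map[string]string, error)
}

type kindLoader struct {
	get kindGetter
}

// loadBatch resolves every collected ID to its kind ("HUMAN"/"DROID") at
// once, so the Character(id) resolver's type check gets batched too.
func (ldr kindLoader) loadBatch(ctx context.Context, ids dataloader.Keys) []*dataloader.Result {
	kinds, err := ldr.get.CharacterKinds(ctx, ids.Keys())
	results := make([]*dataloader.Result, len(ids))
	for i, id := range ids {
		if err != nil {
			results[i] = &dataloader.Result{Error: err}
			continue
		}
		results[i] = &dataloader.Result{Data: kinds[id.String()]}
	}
	return results
}

The Character resolver would then Load the kind first and dispatch to the human or droid loader, each of which batches independently.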
Note: the more I think about it, in my particular case this is moot, as I'm going to use a NoSQL DB and will store all Characters in the same table. So I don't need separate loading methods at all; I can always pull from the same table and then unmarshal into the correct type based on a type attribute on the Characters. Although if I were going with a relational DB, I think the above is still an interesting issue.
Also @tonyghita, sorry for the multiple questions, but on a second look at the batch function in this example, correct me if I'm wrong, but you're not actually "batching" the fetch of the objects? You're looping through the keys and fetching each one.

For your use case, I'm assuming that's because it's all the swapi supports? But when hitting a DB, for example, wouldn't looping over the keys inside the batch function be counter-productive to the problem DataLoader is trying to solve?
(The below is an example of the load batch that I'm talking about)
func (ldr StarshipLoader) loadBatch(ctx context.Context, urls dataloader.Keys) []*dataloader.Result {
	var (
		n       = len(urls)
		results = make([]*dataloader.Result, n)
		wg      sync.WaitGroup
	)

	wg.Add(n)

	for i, url := range urls {
		go func(i int, url dataloader.Key) {
			defer wg.Done()
			data, err := ldr.get.Starship(ctx, url.String())
			results[i] = &dataloader.Result{Data: data, Error: err}
		}(i, url)
	}

	wg.Wait()

	return results
}
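(For contrast, a rough sketch of what a genuinely batched loadBatch could look like against a SQL backend. This assumes Postgres via database/sql with the lib/pq driver; the db field, the starships table, its columns, and the Starship struct are all invented for the sketch:)

// loadBatch issues one query for the whole batch instead of one request per
// key. The only contract is that results line up 1:1, in order, with ids.
func (ldr StarshipLoader) loadBatch(ctx context.Context, ids dataloader.Keys) []*dataloader.Result {
	rows, err := ldr.db.QueryContext(ctx,
		`SELECT id, name FROM starships WHERE id = ANY($1)`, pq.Array(ids.Keys()))
	if err != nil {
		return errResults(ids, err)
	}
	defer rows.Close()

	byID := make(map[string]Starship, len(ids))
	for rows.Next() {
		var s Starship
		if err := rows.Scan(&s.ID, &s.Name); err != nil {
			return errResults(ids, err)
		}
		byID[s.ID] = s
	}

	results := make([]*dataloader.Result, len(ids))
	for i, id := range ids {
		if s, ok := byID[id.String()]; ok {
			results[i] = &dataloader.Result{Data: s}
		} else {
			results[i] = &dataloader.Result{Error: fmt.Errorf("starship %s not found", id)}
		}
	}
	return results
}

// errResults fails every key in the batch with the same error.
func errResults(ids dataloader.Keys, err error) []*dataloader.Result {
	results := make([]*dataloader.Result, len(ids))
	for i := range ids {
		results[i] = &dataloader.Result{Error: err}
	}
	return results
}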
I know this thread is from several months ago, but since @tonyghita seems pretty busy and there is no documentation about dataloaders (which are quite hard to understand if you are new to GraphQL), I think this could be helpful for newcomers. This is how I got to understand the importance of dataloader.Prime() as well as how dataloaders are connected to GraphQL. So, if anyone is struggling to understand the dataloaders and Prime(): read this.
I logged every request the server makes out to the SWAPI. The results were very enlightening for me when I was struggling to understand how this graphql-dataloader-golang thing works. I tried removing PrimePeople from NewPeople():
func NewPeople(ctx context.Context, args NewPeopleArgs) (*[]*PersonResolver, error) {
	//err := loader.PrimePeople(ctx, args.Page)
	//if err != nil {
	//	return nil, err
	//}

	results, err := loader.LoadPeople(ctx, append(args.URLs, args.Page.URLs()...))
	if err != nil {
		return nil, err
	}
	// [...]
And I tested it with a really simple query:
{
  people {
    name
  }
}
And this is what I got: a separate request to the SWAPI for every single person on the page.
What's happening on the server side? LoadPeople fetches the page of people, but nothing puts the individual Person records into the loader cache. So when GraphQL resolves name for each person in people (one people(name:<name>) resolution per person), every person triggers its own request to the SWAPI. For me, with no GraphQL and Golang experience, it was really hard to figure this one out. Those per-person results do get cached, and that cache will be available during this request, but here we are NOT taking advantage of dataloader and its cache, because by the time the cache is populated, the GraphQL operation and the HTTP request have finished.
So, what's happening on the server side when we don't omit the Prime method, i.e. as the code stands now, with the call to PrimePeople inside resolver/person.go:NewPeople?
func NewPeople(ctx context.Context, args NewPeopleArgs) (*[]*PersonResolver, error) {
	err := loader.PrimePeople(ctx, args.Page)
	if err != nil {
		return nil, err
	}
	// [...]
This time, only the initial page request showed up in my logs; the per-person requests to the SWAPI were gone.
Exactly what @tonyghita has explained here:
It's not strictly necessary, but I think it's good practice to put the data in the cache if you have it, just in case it's asked for again somewhere else in the query. The goal is to do expensive work (i.e. a network request) only once in any given request.
So thanks to this PrimePeople, which is an abstraction over dataloader.Prime(), we have saved 10 requests to the SWAPI.
That's how PrimePeople works, and that's how the GraphQL/dataloader communication is structured in this project. Aside from caching, the dataloader can't do much else for us here, because we're retrieving data from a REST API: even though we're batching things and have several ids/urls/etc. in a single batch, we still have to make one request per endpoint. If you were connecting to a database, you would be able to select multiple ids in a single query (like the SQL sketch earlier in this thread).
Also, I strongly recommend reading this post: https://medium.com/@gajus/using-dataloader-to-batch-requests-c345f4b23433. It's Node.js, but you get the idea of the N+1 problem if you are new to GraphQL.
Hope this helps someone; sorry for my non-native English!