shurcooL / githubv4

Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://docs.github.com/en/graphql).
MIT License
1.11k stars 89 forks source link

Query batching and/or dynamic queries. #17

Open metalmatze opened 7 years ago

metalmatze commented 7 years ago

I want to query multiple repositories at the same time. But I don't want to write the query for a specific number of repositories, but rather create the query at runtime. I currently don't see a way to do that.

{
  justwatchcom_gopass: repository(owner: "justwatchcom", name: "gopass") {
    id
    name
  }
  prometheus_prometheus: repository(owner: "prometheus", name: "prometheus") {
    id
    name
  }
   ... a lot of other dynamic repositories added at runtime
}

As a work around I'm doing these queries one after another, but as it's one of the benefits of doing multiple resource queries at ones we should try to support this.

dmitshur commented 7 years ago

This is a valid use case that githubql should support, but currently doesn't (at least I can't think of a way). Thanks for reporting.

I'll need to think about how to best handle this.

dmitshur commented 7 years ago

One idea I have (that I want to jot down) about how to potentially tackle this is to support map types for dynamic queries such as this one.

So, your query could look something like this:

var q = map[string]interface{}{
    `repository(owner: "octocat", name: "Hello-World")`: struct {
        Description githubql.String
    }{},
    `repository(owner: "justwatchcom", name: "gopass")`: struct {
        ID   githubql.ID
        Name githubql.String
    }{},
    `repository(owner: "prometheus", name: "prometheus")`: struct {
        ID   githubql.ID
        Name githubql.String
    }{},
}

However, it needs to be carefully thought out. It introduces mixing query information within types and values, which I've seen work poorly when I attempted it in the past.

(Just wanted to write it down so I don't forget. But it's possible another solution will be better.)

dmitshur commented 7 years ago

Another solution might be to add a new different method that lets you pass a GraphQL query as a string (which you still have to construct yourself), and returns results as JSON you have to parse yourself, or already parsed into a map[string]interface{}. But this would be a very different API to make queries, so I see it as a last resort, if no better solution can be found.

dmitshur commented 7 years ago

I wanted to post an update here. I also ran into this need recently at:

https://github.com/shurcooL/notifications/blob/b2920e64fbc3c388d5191433dd5493c75f302c14/githubapi/githubapi.go#L92-L96

I will continue to think about the best possible resolution to this, and post updates here if I have any. The above just means it'll be slightly easier for me to evaluate an idea, if I get a new one.

dmitshur commented 6 years ago

There's been yet another place I would've found this handy:


// fetchCommit fetches the specified commit.
func (s *service) fetchCommit(ctx context.Context, repoID int64, sha string) (*commit, error) {
    // TODO: It'd be better to be able to batch and fetch all commits at once (in fetchEvents loop),
    //       rather than making an individual query for each.
    //       See https://github.com/shurcooL/githubql/issues/17.

    // ...
    err := s.clV4.Query(ctx, &q, variables) // Fetch a single commit.
    // ...
}

I'm starting to think that a good way of thinking about this issue might be as "query batching" rather than fully dynamic queries. The idea would be you provide a single query and a array of different variables, and you get a result for that query for each element in the variables array.

This line of thinking might help arrive at a reasonable API that works for most usescases.

osela commented 6 years ago

@dmitshur Any progress on this? I'd love to help. I'm currently skipping this library entirely in these scenarios and using simple POST requests with string queries I construct myself.

It seems to me like many scenarios could be represented as a field of type map[string]SomeType. So the question is about making it easy and clear to provide aliases and parameters? Or is it more fundamental?

dmitshur commented 5 years ago

@osela There's no progress on this issue from me, because I haven't had any free time left over to think about this. If you're looking to solve this, I'd recommend prototyping a solution on your own branch and sharing your updates here.

dmitshur commented 5 years ago

I've made significant progress on this issue this weekend. It turns out it has been possible to perform query batching and/or dynamic queries all along, without any API changes to this package. Read on for details.

Consider the following GraphQL query to fetch multiple GitHub repositories:

{
  go: repository(owner: "golang", name: "go") {
    nameWithOwner
    createdAt
    description
  }
  graphql: repository(owner: "shurcooL", name: "githubv4") {
    nameWithOwner
    createdAt
    description
  }
}

If executed against GitHub GraphQL API v4 It returns a JSON response like:

GraphQL JSON Response
``` { "data": { "go": { "nameWithOwner": "golang/go", "createdAt": "2014-08-19T04:33:40Z", "description": "The Go programming language" }, "graphql": { "nameWithOwner": "shurcooL/githubv4", "createdAt": "2017-05-27T05:05:31Z", "description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)." } } } ```

It's possible to perform that exact query using `githubv4` package like so: ```Go var q struct { Go struct { NameWithOwner string CreatedAt time.Time Description string } `graphql:"go: repository(owner: \"golang\", name: \"go\")"` GitHubV4 struct { NameWithOwner string CreatedAt time.Time Description string } `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"` } err := client.Query(context.Background(), &q, nil) if err != nil { return err } enc := json.NewEncoder(os.Stdout) enc.SetIndent("", "\t") enc.Encode(q) // Output: // { // "Go": { // "NameWithOwner": "golang/go", // "CreatedAt": "2014-08-19T04:33:40Z", // "Description": "The Go programming language" // }, // "GitHubV4": { // "NameWithOwner": "shurcooL/githubv4", // "CreatedAt": "2017-05-27T05:05:31Z", // "Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)." // } // } ``` Of course, the list of repositories can only be adjusted at compile time, since it's a part of the query struct type. However, I got an idea: it's possible to use [`reflect`](https://godoc.org/reflect) package and its [`reflect.StructOf`](https://godoc.org/reflect#StructOf) function to dynamically construct a query struct. For example, the same query as above, created at runtime: ```Go q := reflect.New(reflect.StructOf([]reflect.StructField{ { Name: "Go", Type: reflect.TypeOf(struct { NameWithOwner string CreatedAt time.Time Description string }{}), Tag: `graphql:"go: repository(owner: \"golang\", name: \"go\")"`, }, { Name: "GitHubV4", Type: reflect.TypeOf(struct { NameWithOwner string CreatedAt time.Time Description string }{}), Tag: `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"`, }, })).Elem() err := client.Query(context.Background(), q.Addr().Interface(), nil) if err != nil { return err } enc := json.NewEncoder(os.Stdout) enc.SetIndent("", "\t") enc.Encode(q.Interface()) // Output: // { // "Go": { // "NameWithOwner": "golang/go", // "CreatedAt": "2014-08-19T04:33:40Z", // "Description": "The Go programming language" // }, // "GitHubV4": { // "NameWithOwner": "shurcooL/githubv4", // "CreatedAt": "2017-05-27T05:05:31Z", // "Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)." // } // } ``` As you can see, it works, and produces the same results. Unlike the case above, the struct is generated dynamically, so it's possible to add arbitrary repositories to query at runtime. It's important to note I used the word "possible" at the beginning. The syntax for using `reflect` is more cumbersome compared to declaring a Go type using normal Go code, and this is not necessarily the final solution. But it's a good step forward. I have some ideas for how to wrap this same functionality in a nicer API, to be explored later. I've prototyped this approach in a real codebase where I wanted to perform GraphQL query batching, and it seems to work well. See https://github.com/shurcooL/notifications/commit/926403120bbcbee30699b0ea339c947775dac05f.
osela commented 5 years ago

@dmitshur That's a nice approach, and it allows great flexibility. I don't know what is the nicer API you had in mind, but I'm wondering whether the common use case of query batching (rather than fully dynamic queries) doesn't deserve it's own API.

The major drawback I see in this approach (apart from the cumbersome use of reflect) is that it doesn't allow reuse of simpler queries structs.

Ideally, I would like to take a simple struct that I already use for querying a single repo

type q struct {
    Repository struct {
        Description string
    } `graphql:"repository(owner: $owner, name: $name)"`
}

and use it in a batch query, in a way that closely resembles the single query API. Something like

var batch []*q
variables := []map[string]interface{}{
    {
        "owner": "golang",
        "name":  "go",
    },
    {
        "owner": "shurcooL",
        "name":  "githubv4",
    },
}
err := client.BatchQuery(context.Background(), &batch, variables)

The implementation would probably involve some ugly regex work to rename the arguments and the aliases so they are unique, but it's mostly hidden from the user. I'm still thinking about the best way to handle errors though.

What do you think?

cheshire137 commented 5 years ago

Unlike the case above, the struct is generated dynamically, so it's possible to add arbitrary repositories to query at runtime.

Can you have a dynamic alias for each query, though? Your example hard-codes the tag like Tag:graphql:"go: repository(owner: \"golang\", name: \"go\")"``, and I notice I get an error if I try to generate a tag:

image
paultyng commented 5 years ago

@cheshire137 I think in that case you'd need to just cast to reflect.StructTag, ie: reflect.StructTag(fmt.Sprintf(...))

dmitshur commented 5 years ago

@cheshire137 Yes, that should work if you use the suggestion @paultyng posted above. Please feel free to let me know if there's more to it.

micimize commented 3 years ago

Another solution might be to add a new different method that lets you pass a GraphQL query as a string (which you still have to construct yourself), and returns results as JSON you have to parse yourself, or already parsed into a map[string]interface{}. But this would be a very different API to make queries, so I see it as a last resort, if no better solution can be found.

This still seems like a decent solution to me, and IMO it is not an API minus to expose the internal "untyped" layer.

The "map with tags as fields" approach is very pretty – looks difficult to make work, but it could replace helloworld: repository(owner: "octocat", name: "Hello-World") with "helloworld" in the end result.

rhcarvalho commented 3 years ago

I tried out the idea of using reflect to build dynamic queries from https://github.com/shurcooL/githubv4/issues/17#issuecomment-437646874 for mutations.

Documenting the experience here for posteriority & sharing. TL;DR: don't do it.


I wanted to add several issues to a project at once. With the code below the request is well formed, but if the batch is too large the request times out with some mutations applied, some not.

I also noticed an inconsistency in that issues will be associated with a project (as seen when looking at the issue page on GitHub), but then the search API misses that association. It was not a case of eventual consistency, data was still inconsistent after 12+h.

Code ```go type AddProjectCard struct { CardEdge struct { Node struct { URL githubv4.URI } } } var fields []reflect.StructField vars := make(map[string]interface{}) for i, contentID := range contentIDs { fields = append(fields, reflect.StructField{ Name: fmt.Sprintf("AddProjectCard%d", i), Type: reflect.TypeOf(AddProjectCard{}), Tag: reflect.StructTag(fmt.Sprintf(`graphql:"addProjectCard%d:addProjectCard(input:$input%[1]d)"`, i)), }) vars[fmt.Sprintf("input%d", i)] = githubv4.AddProjectCardInput{ ProjectColumnID: projectColumnID, ContentID: githubv4.NewID(contentID), } } // Work around githubv4.Client.Mutate requiring a variable named "input". fields[0].Tag = reflect.StructTag(strings.Replace(string(fields[0].Tag), "$input0", "$input", 1)) vars["input"] = vars["input0"] delete(vars, "input0") m := reflect.New(reflect.StructOf(fields)).Elem() err := pm.client.Mutate(context.Background(), m.Addr().Interface(), vars["input"], vars) ```

Looking for documentation around batching mutations, I found https://docs.github.com/en/graphql/overview/resource-limitations which talks about the cost of queries, but not about the cost of mutations. I couldn't find anywhere in the GitHub GraphQL API docs that we should always do only one mutation per request, but that seems to be the safest approach. In https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-abuse-rate-limits, for the REST API, there's a guideline:

If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.

That's the closest to "take it slow" I could find. So a time.Ticker to limit throughput and a single mutation per request it is.

patrickdevivo commented 2 years ago

I've been encountering this issue as well (a need for "dynamic" queries where fields are only known at runtime). I recently started this project as a way to help address it, as an alternative to the reflect-based approach. It's still an early project but wanted to share it here in case folks find it useful.