shurcooL / graphql

Package graphql provides a GraphQL client implementation.
MIT License
709 stars 282 forks source link

Support for generating (limited) recursive queries #9

Open warpfork opened 6 years ago

warpfork commented 6 years ago

The setup

First off, just to clear the air -- GraphQL doesn't really support recursion. This is a conscious design decision by the GraphQL developers, and it's one I largely understand and agree with. Recursive queries are a great way to accidentally generate unbounded load on your servers.

However, that's not to say some finite amount of recursion isn't useful. It's possible to come up with situations where a query should use some type A which contains SomeField A[]. The recursion might be limited at some depth -- possibly 1, possibly 2; any finite number -- even 1 is significant.

So, as formulated nicely in an upstream issue in GraphQL, this sort of query for example is sometimes useful:

{
  messages {
    ...CommentsRecursive
  }
}

fragment CommentsRecursive on Message {
  comments {
    ...CommentFields
    comments {
      ...CommentFields
      comments {
        ...CommentFields
      }
    }
  }
}

fragment CommentFields on Comment {
  id
  content
}

I've been experimenting with this, and it seems like it's not a huge problem to do it on the server side with neelance/graphql-go, happily (it just terminates recursion when it runs out of data, and implementing that is Our Problem, but that's fine).

This client library, however, currently gets hung up if we use a single type that contains itself, like this:

type Foobar struct {
    Children []Foobar
}

The current hangup isn't because the library blocks recursive queries, but rather because it's a tad too enthusiastic about recursing... forever. This is the relevant part of the stack trace resulting:

runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow

goroutine 7 [running]:
... some frames elided ....
theproject/vendor/github.com/shurcooL/graphql.writeQuery(0x15b3be0, 0xc42017f180, 0x15be320, 0x139e100, 0x0)
        /theproject/vendor/github.com/shurcooL/graphql/query.go:103 +0x93 fp=0xc4402005c0 sp=0xc4402004c0 pc=0x1303dd3
theproject/vendor/github.com/shurcooL/graphql.writeQuery(0x15b3be0, 0xc42017f180, 0x15be320, 0x13ba360, 0x133be00)
        /theproject/vendor/github.com/shurcooL/graphql/query.go:123 +0x11f fp=0xc4402006c0 sp=0xc4402005c0 pc=0x1303e5f
theproject/vendor/github.com/shurcooL/graphql.writeQuery(0x15b3be0, 0xc42017f180, 0x15be320, 0x1358c40, 0x0)
        /theproject/vendor/github.com/shurcooL/graphql/query.go:100 +0x3fa fp=0xc4402007c0 sp=0xc4402006c0 pc=0x130413a
theproject/vendor/github.com/shurcooL/graphql.writeQuery(0x15b3be0, 0xc42017f180, 0x15be320, 0x13ac720, 0x133be00)
        /theproject/vendor/github.com/shurcooL/graphql/query.go:123 +0x11f fp=0xc4402008c0 sp=0xc4402007c0 pc=0x1303e5f
... loop continues between 100 and 123 ...

Basically, the writeQuery function as currently written is perfectly happy to keep looking up the type of the Foobar struct, looking at the Children field and generating a little more query, then recursing to look up the type of the field, which is of course still Foobar...

(not-so-great) Workarounds

Once solution to this issue is "don't have recursive queries".

If you can get away with that, great. If trying to shoehorn that in to a situation where the types really are correctly represented as recursive, ehm.

We tried doing a couple of copypasta types -- e.g. just a T1, T2, T3, etc, where T1 has a field that's a slice of T2, etc -- but this approach does not come out very elegant: those almost-identical types become visible to more other code in the project than we'd like.

Potential Solutions

The GraphQL upstream opinion seems to be that finite and explicit recursion is fine -- just pick a depth. I'd propose we teach this library how to do that. The main question is how to communicate it to the library, since the structs and tags are where all the communication currently goes on.

My rough proposal would be adding a map to the writeQuery function where it keeps track of where it's visited, and increments the map values on each visit. This map could be keyed by a tuple of {t reflect.Type, fieldOffset int}, which marks the recursion edge itself. This would also correspond nicely with adding more info to the tag on the struct field itself for how much we want to limit recursion.

How to add another parameter to the field tags is a little more interesting. Since the entire quoted section of the tag is current graphQL snippet, perhaps the clearest thing to do would actually be to add a second tag, like so: SomeField TypeXgraphql:"... on foobar",graphql-recursion:3` I'm definitely open to other designs on that though.

There are some limits to this approach, namely that the tags approach leaves us sort of stuck declaring the recursion depths statically at compile time. But that's roughly consistent with the library's approach as a whole, and seems to be a pretty acceptable trade to me.

WDYT?

warpfork commented 6 years ago

Tried out making the patch to the writeQuery method, and this appears to do the trick:

--- a/vendor/github.com/shurcooL/graphql/query.go
+++ b/vendor/github.com/shurcooL/graphql/query.go
@@ -88,16 +88,23 @@ func writeArgumentType(w io.Writer, t reflect.Type, value bool) {
 // E.g., struct{Foo Int, BarBaz *Boolean} -> "{foo,barBaz}".
 func query(v interface{}) string {
        var buf bytes.Buffer
-       writeQuery(&buf, reflect.TypeOf(v), false)
+       writeQuery(&buf, reflect.TypeOf(v), map[edge]int{}, false)
        return buf.String()
 }

+// edge is simply a tuple to key the visitation map that we use to keep
+// writeQuery from recursing without bound on recursive types.
+type edge struct {
+       t  reflect.Type
+       fn int
+}
+
 // writeQuery writes a minified query for t to w.
 // If inline is true, the struct fields of t are inlined into parent struct.
-func writeQuery(w io.Writer, t reflect.Type, inline bool) {
+func writeQuery(w io.Writer, t reflect.Type, visited map[edge]int, inline bool) {
        switch t.Kind() {
        case reflect.Ptr, reflect.Slice:
-               writeQuery(w, t.Elem(), false)
+               writeQuery(w, t.Elem(), visited, false)
        case reflect.Struct:
                // If the type implements json.Unmarshaler, it's a scalar. Don't expand it.
                if reflect.PtrTo(t).Implements(jsonUnmarshaler) {
@@ -111,6 +118,14 @@ func writeQuery(w io.Writer, t reflect.Type, inline bool) {
                                io.WriteString(w, ",")
                        }
                        f := t.Field(i)
+
+                       // Check how many times we've traversed this before (recursion limit).
+                       edge := edge{t, i}
+                       visited[edge]++
+                       if visited[edge] >= 3 { // placeholder hardcoded limit for trying this out
+                               continue
+                       }
+
                        value, ok := f.Tag.Lookup("graphql")
                        inlineField := f.Anonymous && !ok
                        if !inlineField {
@@ -120,7 +135,8 @@ func writeQuery(w io.Writer, t reflect.Type, inline bool) {
                                        io.WriteString(w, ident.ParseMixedCaps(f.Name).ToLowerCamelCase())
                                }
                        }
-                       writeQuery(w, f.Type, inlineField)
+                       writeQuery(w, f.Type, visited, inlineField)
+                       visited[edge]--
                }
                if !inline {
                        io.WriteString(w, "}")
dmitshur commented 6 years ago

Thanks a lot for reporting this, this is very helpful!

First, the fact that you can get a stack overflow when using the library sounds like a bug. It should probably detect that situation and return an error (the only reasonable thing to do without additional information). The only reason it doesn't is because I've never run into this situation during development (and you're the first to report it, so thanks).

Second, a graphql-recursion (or maybe graphql-depth) struct field tag to specify a depth to expand sounds like a very reasonable thing to consider adding. It's orthogonal and only needed when otherwise an infinite recursion would happen.

So it seems likely we can find a good resolution to support queries like this. Thanks for getting it this far.

Before proceeding with any solutions, I want to see what the code using the existing API looks like for the sample query you provided. That way, it's a lot easier to evaluate any potential solutions and pick the best one.

dmitshur commented 6 years ago

So this GraphQL query:

{
  messages {
    ...CommentsRecursive
  }
}

fragment CommentsRecursive on Message {
  comments {
    ...CommentFields
    comments {
      ...CommentFields
      comments {
        ...CommentFields
      }
    }
  }
}

fragment CommentFields on Comment {
  id
  content
}

Can be expressed using existing graphql API like so:

var q struct {
    Messages []struct {
        Comments []struct {
            ID       graphql.ID
            Content  string
            Comments []struct {
                ID       graphql.ID
                Content  string
                Comments []struct {
                    ID      graphql.ID
                    Content string
                }
            }
        }
    }
}
err := client.Query(context.Background(), &q, nil)
if err != nil {
    // Handle error.
}

Whereas, with the proposed approach, it would be:

type comments struct {
    ID       graphql.ID
    Content  string
    Comments []comments
}
var q struct {
    Messages []struct {
        Comments []comments `graphql-depth:"3"`
    }
}
err := client.Query(context.Background(), &q, nil)
if err != nil {
    // Handle error.
}

Does that look right?

eric-am commented 6 years ago

Yup, that's pretty much the shape of it.

The example you give of using nested anonymous structs like that is very interesting and looks a lot cleaner than doing it with named package-level structure -- that approach hadn't occurred to me. I bet it works nicely in many scenarios. Doing any recursive algorithms for handling the data would still require copying back into another structure though I think, or require defining more interfaces (which is in turn impossible with this approach because IIUC one can't attach methods to anonymous structs) -- so tl;dr there's still compelling reasons to want to work with named struct types as well.

nyarly commented 4 years ago

As mentioned on githubv4: the Github GraphQL API has Git Tree objects, with and Entries list of GitObject that might itself be Tree. Roughly speaking, we're representing a repository checkout as a tree of it's directory hierarchy.

On the one hand: a use case in the wild. In the other, whatever solution needs to handle that recursion when it isn't direct, and through a fragment.