shurcooL / githubv4

Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://docs.github.com/en/graphql).
MIT License
1.12k stars 90 forks source link

Union support. #10

Closed dmitshur closed 7 years ago

dmitshur commented 7 years ago

It looks like supporting unions will be an interesting challenge.

From https://developer.github.com/v4/reference/union/:

A union is a type of object representing many objects.

For example, imagine you want to fetch a timeline of an issue. The type of timeline field of Issue object is IssueTimelineConnection, which is a connection type for IssueTimelineItem, a union:

https://developer.github.com/v4/reference/union/issuetimelineitem/

Suppose you want to find all closed/reopened events within an issue. For each event, you want to know who was the actor, and when it happened. You might write a query like this:

query {
  repository(owner: "shurcooL-test", name: "test-repo") {
    issue(number: 3) {
      timeline(first:10) {
        nodes {
          typename: __typename
          ... on ClosedEvent {
            createdAt
            actor {login}
          }
          ... on ReopenedEvent {
            createdAt
            actor {login}
          }
        }
      }
    }
  }
}

Which might give you a response like this:

{
  "data": {
    "repository": {
      "issue": {
        "timeline": {
          "nodes": [
            {
              "typename": "ClosedEvent",
              "createdAt": "2017-06-29T04:12:01Z",
              "actor": {
                "login": "shurcooL-test"
              }
            },
            {
              "typename": "ReopenedEvent",
              "createdAt": "2017-06-29T04:12:06Z",
              "actor": {
                "login": "shurcooL-test"
              }
            }
          ]
        }
      }
    }
  }
}

A Go type that can result in that query is something like this:

type ClosedEvent struct {
    Actor     struct{ Login githubql.String }
    CreatedAt githubql.DateTime
}
type ReopenedEvent struct {
    Actor     struct{ Login githubql.String }
    CreatedAt githubql.DateTime
}
type IssueTimelineItem struct {
    Typename      string `graphql:"typename :__typename"`
    ClosedEvent   `graphql:"... on ClosedEvent"`
    ReopenedEvent `graphql:"... on ReopenedEvent"`
}
var q struct {
    Repository struct {
        Issue struct {
            Timeline struct {
                Nodes []IssueTimelineItem
            } `graphql:"timeline(first: 10)"`
        } `graphql:"issue(number: $issueNumber)"`
    } `graphql:"repository(owner: $repositoryOwner, name: $repositoryName)"`
}

Unfortunately, because IssueTimelineItem embeds both ClosedEvent and ReopenedEvent, according to the current encoding/json unmarshaling rules:

If there are multiple fields at the same level, and that level is the least nested (and would therefore be the nesting level selected by the usual Go rules), the following extra rules apply:

1) Of those fields, if any are JSON-tagged, only tagged fields are considered, even if there are multiple untagged fields that would otherwise conflict.

2) If there is exactly one field (tagged or not according to the first rule), that is selected.

3) Otherwise there are multiple fields, and all are ignored; no error occurs.

The multiple fields are ignored. So, the following JSON value:

{
  "typename": "ClosedEvent",
  "createdAt": "2017-06-29T04:12:01Z",
  "actor": {
    "login": "shurcooL-test"
  }
}

When unmarshaled into IssueTimelineItem, becomes:

IssueTimelineItem{
    Typename: "ClosedEvent",
    ClosedEvent: ClosedEvent{
        Actor: struct{ Login githubql.String }{
            Login: "",
        },
        CreatedAt: githubql.DateTime{},
    },
    ReopenedEvent: ReopenedEvent{
        Actor: struct{ Login githubql.String }{
            Login: "",
        },
        CreatedAt: githubql.DateTime{},
    },
}

Only the value of Typename has correct value. Accessing item.ClosedEvent.CreatedAt and item.ClosedEvent.Actor.Login gives zero values.

This is not a problem if the multiple potential event types don't have common fields.

I came up with the following solution on the user-side, to implement a custom UnmarshalJSON method on the IssueTimelineItem type:

// UnmarshalJSON implements the json.Unmarshaler interface.
func (i *IssueTimelineItem) UnmarshalJSON(data []byte) error {
    // Ignore null, like in the main JSON package.
    if string(data) == "null" {
        return nil
    }
    var top struct {
        Typename string
    }
    err := json.Unmarshal(data, &top)
    if err != nil {
        return err
    }
    *i = IssueTimelineItem{Typename: top.Typename}
    var v interface{}
    switch i.Typename {
    case "ClosedEvent":
        v = &i.ClosedEvent
    case "ReopenedEvent":
        v = &i.ReopenedEvent
    default:
        return nil
    }
    err = json.Unmarshal(data, v)
    return err
}

But this is very verbose, inconvenient, error prone. It also means the custom IssueTimelineItem type must be declared at package level rather than inside a function. I want to find a better solution.

One of the big guns here is making a copy of encoding/json internal to githubql and changing its behavior, but that's something I'd really like to avoid.

dmitshur commented 7 years ago

An update. I mentioned 3 days ago in https://github.com/google/go-github/issues/646#issuecomment-312952417 that:

Next, I plan to work on resolving shurcooL/githubql#10 (improving support for unions), because that's the biggest usability issue right now. I have a plan for how to tackle it.

I've been working on that plan, and I can say today that it's looking quite good. It was a lot of work, but it was worth it. It will resolve this issue in a nice way (no more need to implement UnmarshalJSON method!), but also lead to benefits in other areas that I knew could be improved.

I'll be cleaning up the code, and pushing it out over the next few days.

The gist of the solution is that I implemented custom JSON unmarshaling logic specifically for the GraphQL query input structure. It's aware of the graphql struct tags, and doesn't get thrown off by json tags.

Instead of forking encoding/json or creating an entire JSON parser from scratch, I built it on top of the JSON tokenizer that was added in Go 1.5 (see json.Decoder.Token).

User query ends up looking like this (and nothing more), and it works as one would expect:

var q struct {
    Repository struct {
        Issue struct {
            Timeline struct {
                Nodes []struct {
                    Typename    string `graphql:"__typename"`
                    ClosedEvent struct {
                        Actor     struct{ Login githubql.String }
                        CreatedAt githubql.DateTime
                    } `graphql:"... on ClosedEvent"`
                    ReopenedEvent struct {
                        Actor     struct{ Login githubql.String }
                        CreatedAt githubql.DateTime
                    } `graphql:"... on ReopenedEvent"`
                }
            } `graphql:"timeline(first: 10)"`
        } `graphql:"issue(number: $issueNumber)"`
    } `graphql:"repository(owner: $repositoryOwner, name: $repositoryName)"`
}
dmitshur commented 7 years ago

I've created #15 which will resolve this issue.

dmitshur commented 7 years ago

I've documented the support for inline fragments in the README in commit 48704d75118eb275751b1b9d84d4658c2bece9b3, see:

https://github.com/shurcooL/githubql#inline-fragments