markmichon / gatsby-source-github-gql

Gatsby source plugin to pull in data from the GitHub graphql API

Getting around the query limit of 100 records #4

Open nickserv opened 6 years ago

nickserv commented 6 years ago

Hi, thanks for the handy plugin. I'm transitioning my portfolio site to Gatsby, and for some of my custom project rendering logic to work I need a list of all my repositories. However, since I have over 100, I either miss a few repositories (because I can't make a second query) or hit the record limit error:

Error: Requesting 1000 records on the repositories connection exceeds the first limit of 100 records.: {"response":{"data":null,"errors":[{"message":"Requesting 1000 records on the repositories connection exceeds the first limit of 100 records."}],"status":200},"request":{"query":"\n {\n viewer {\n repositories(first: 1000) {\n edges {\n node {\n description\n homepageUrl\n id\n name\n url\n }\n } \n }\n }\n }\n "}}

Config:

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-source-github-gql',
      options: {
        auth: process.env.GITHUB_TOKEN,
        query: `
          {
            viewer {
              repositories(first: 100) {
                edges {
                  node {
                    id
                  }
                }
              }
            }
          }
        `
      }
    }
  ]
}

I assume this isn't possible yet because of the single GraphQL request limitation, but I'd appreciate any feedback if you think I could do this with pagination or another workaround.

markmichon commented 6 years ago

@nickmccurdy Yeah, it looks like a limitation of GitHub's GraphQL API 😢. You might be able to use the after argument to load 100 at a time and make subsequent queries for the rest. I wasn't able to get it working with the API explorer, though. I'm curious whether the old v3 REST API supports it. In that case a new source plugin would be required, though.
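The after-based approach mentioned above can be sketched roughly as follows. This is a minimal sketch rather than anything the plugin does today: it assumes a caller-supplied `request(query, variables)` function (hypothetical, for example a thin wrapper around graphql-request) and relies on the `pageInfo { hasNextPage endCursor }` fields that GitHub's connections expose.

```javascript
// One page of up to 100 repositories; $cursor is null for the first page.
const REPOSITORIES_PAGE_QUERY = `
  query RepositoriesPage($cursor: String) {
    viewer {
      repositories(first: 100, after: $cursor) {
        pageInfo {
          hasNextPage
          endCursor
        }
        nodes {
          description
          homepageUrl
          id
          name
          url
        }
      }
    }
  }
`

// `request` is a hypothetical function: (query, variables) => Promise<data>.
async function fetchAllRepositories(request) {
  const repositories = []
  let cursor = null
  let hasNextPage = true

  // Keep requesting pages until the API reports there are no more.
  while (hasNextPage) {
    const data = await request(REPOSITORIES_PAGE_QUERY, { cursor })
    const page = data.viewer.repositories
    repositories.push(...page.nodes)
    hasNextPage = page.pageInfo.hasNextPage
    cursor = page.pageInfo.endCursor
  }

  return repositories
}
```

Each loop iteration is a separate request, so this only works if whatever runs it is allowed to send more than one query.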

nickserv commented 6 years ago

You're right, the GitHub GraphQL docs do say there's a limit of 100 records. Besides before and after I also found viewer.repositories.edges.cursor, which could be useful in paginating further requests. Do you think it would be possible for this plugin to follow cursors automatically, or alternatively allow multiple queries dynamically (merging all the nodes into one type)?

For now I'm using a custom plugin that queries the v3 REST API once per repository, based on a data file of repository names to include. So far I'm working on this GraphQL query to make querying repositories by name more convenient. I'd appreciate any ideas you have (the main issue is that I don't know whether I can set the query's name and owner arguments in a fragment or whether I have to repeat them like this). If I can find a way to do it with this plugin, I'd be happy to attempt a PR.

fragment repository on Repository {
  description
  homepageUrl
  id
  name
  url
}

{
  dotfiles: repository(owner: "nickmccurdy", name: "dotfiles") {
    ...repository
  }
  purespec: repository(owner: "nickmccurdy", name: "purespec") {
    ...repository
  }
}

markmichon commented 6 years ago

From what I've come across while fleshing the plugin out further, the dynamic nature of GitHub's GraphQL API makes it difficult to find a nice one-size-fits-all solution for this plugin.

That said, I think your solution could be doable via aliases. For example:

query {
  dotfiles: user(login:"nickmccurdy") {
    repository(name: "dotfiles") {
      id
      name
      url
    }
  }
  purespec: user(login:"nickmccurdy") {
    repository(name: "purespec") {
      id
      name
      url
    }
  }
}

In fact, the named aliases actually return a much nicer, more predictable response shape for parsing (you can see it if you drop the above into the explorer). For the plugin setup, you'd have to take a file input in the settings, build the query from its contents, send it off, and then parse the response. No clue whether the request limits kick in or not, though.
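The "build the query from the contents" step might look something like the sketch below. The function names `toAlias` and `buildRepositoriesQuery` are made up for illustration and are not part of the plugin. It assumes the repository names have already been read from the file into an array, and it reuses the fragment from the earlier comment.

```javascript
// GraphQL aliases must match /[_A-Za-z][_0-9A-Za-z]*/, so characters such as
// "-" and "." in repository names are replaced with "_".
function toAlias(name) {
  const alias = name.replace(/[^_0-9A-Za-z]/g, '_')
  return /^[0-9]/.test(alias) ? `_${alias}` : alias
}

// Builds a single query with one aliased `repository` field per name.
// JSON.stringify is used to quote and escape the string arguments.
function buildRepositoriesQuery(owner, names) {
  const fields = names.map(
    (name) => `
  ${toAlias(name)}: repository(owner: ${JSON.stringify(owner)}, name: ${JSON.stringify(name)}) {
    ...repository
  }`
  )

  return `
fragment repository on Repository {
  description
  homepageUrl
  id
  name
  url
}

{${fields.join('')}
}
`
}
```

Calling `buildRepositoriesQuery('nickmccurdy', ['dotfiles', 'purespec'])` produces the same shape as the hand-written query above. Two names that differ only in punctuation (say `a-b` and `a_b`) would collide on the same alias, so a real implementation would need to handle that case.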

edit: Just reread your post and realized I basically duplicated what you had but without fragments and made it worse. oops! So yes, tldr, doable, but probably not in this plugin's current incarnation.

nickserv commented 6 years ago

Thanks for your advice. I have enough repositories listed that I think it would be easier to have a GraphQL query per repository with a variable (like the one below). After I read the repository list and reformat its data into a list of repository names, I map over each one calling it with graphql-request.

query Repository($name: String!) {
  viewer {
    repository(name: $name) {
      description
      homepageUrl
      id
      name
      url
    }
  }
}
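Mapping over the names with that query could be sketched like this. It is only an outline of the approach described above: `fetchRepositoriesByName` is a hypothetical name, and `client` is assumed to be anything with a `request(query, variables)` method, which is the shape of graphql-request's `GraphQLClient`.

```javascript
const REPOSITORY_QUERY = `
  query Repository($name: String!) {
    viewer {
      repository(name: $name) {
        description
        homepageUrl
        id
        name
        url
      }
    }
  }
`

// `client` is assumed to expose request(query, variables) => Promise<data>.
async function fetchRepositoriesByName(client, names) {
  // One request per repository name, sent concurrently.
  const results = await Promise.all(
    names.map((name) => client.request(REPOSITORY_QUERY, { name }))
  )

  // Defensive: drop entries where the client resolved with a null repository.
  return results
    .map((data) => data.viewer.repository)
    .filter((repository) => repository !== null)
}
```

With graphql-request, the client would be created along the lines of `new GraphQLClient('https://api.github.com/graphql', { headers: { authorization: 'Bearer ' + process.env.GITHUB_TOKEN } })`. Since this sends one HTTP request per repository, a long list may run into GitHub's rate limits.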

While I could continue implementing this in my own plugin, do you think it would be useful if I patched this plugin to take a configuration function that's given a callback for each GraphQL call and merges them into one plugin? I figured this could also help with the multiple query use case unless you have a simpler idea (one alternative I thought of is supporting Apollo Client's @connection directive on GitHub's pagination cursors), but if you think this use case is out of scope for this plugin that's fine.
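To make the configuration-function idea a bit more concrete, here is one possible shape for it. Everything in it is hypothetical: the plugin has no such option today, and `collectQueryResults`, `configure`, and `run` are names invented for this sketch. `request` stands in for whatever the plugin uses internally to send a query.

```javascript
// Hypothetical helper: `configure` is a user-supplied function that receives
// `run(query, variables)` and may call it any number of times. Every result
// is collected so the plugin could later create nodes from all of them.
async function collectQueryResults(configure, request) {
  const results = []

  const run = async (query, variables = {}) => {
    const data = await request(query, variables)
    results.push(data)
    // The data is also returned so the user can follow cursors themselves.
    return data
  }

  await configure(run)
  return results
}

// Possible user-side configuration (names are illustrative only):
//
//   queries: async (run) => {
//     for (const name of ['dotfiles', 'purespec']) {
//       await run(REPOSITORY_QUERY, { name })
//     }
//   }
```

Because `run` returns the data, the same hook would cover both the per-repository case and manual cursor pagination without the plugin needing to know about either.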

markmichon commented 6 years ago

I'd be open to the added functionality, especially if it's configurable by the user. 👍 Your use case is, albeit on a larger scale, basically my initial thought process for the plugin anyway.

I haven't spent time with Apollo yet. I know some Gatsby users have hit problems integrating it, but I believe those are all problems with the live site rather than the build process. That should all be resolved with Gatsby v2 anyway.