Closed NickLiffen closed 2 years ago
Thanks Nick for opening the issue and sharing your solution for pagination. You are definitely not the first asking for it. I haven't looked into it myself yet, but agree that we should provide a solution.
My plan is two create two separate plugins, one for REST and one for GraphQL. And then a plugin that combines the two somehow.
The 1st one already exists within @octokit/rest
: https://github.com/octokit/rest.js/tree/master/plugins/pagination. And I'm currently extracting it into its own plugin: https://github.com/octokit/plugin-paginate-rest.js
I'd be happy to get a repository going for https://github.com/octokit/plugin-paginate-graphql.js and we collaborate there? What would a simple usage look like?
For comparison, here is the usage for the REST paginate plugin
const MyOctokit = Octokit.plugin(paginateRest);
const octokit = new MyOctokit({ auth: "secret123" });
// See https://developer.github.com/v3/issues/#list-issues-for-a-repository
const issues = await octokit.paginate("GET /repos/:owner/:repo/issues", {
owner: "octocat",
repo: "hello-world",
since: "2010-10-01",
per_page: 100
});
Let me add that I would prefer not to add pagination to @octokit/graphql
itself, to keep it simple and its bundle size small. And it works for most use cases. If people want to use pagination with GraphQL with the least overhead possible I would point them to https://github.com/octokit/core.js and then the plugin for graphql pagination.
Hey π Let's create a repository like you said: https://github.com/octokit/plugin-paginate-graphql.js. I am more than happy to extract it out of the @octokit/graphql
module. I would see it trying to keep the same pattern and interaction as REST, so if someone was to use the REST module, the interaction could be similar for GraphQL. I would see the pagination only applying to some requests within GraphQL, as only a subset of calls would need "pagination". Maybe let's create the repository and contribute there? Thoughts?
I can get the repository going in no time, I've made a script which automates most of the setup for me https://github.com/octokit/create-octokit-project.js/
One thing we could maybe discuss here first is how you would use the plugin. I'd assume it would be something like this:
const MyOctokit = Octokit.plugin(paginateGraphql);
const octokit = new MyOctokit({ auth: "secret123" });
const query = `query repositoryIssues($owner:String!,$repo:String!,$since:DateTime!) {
repository(owner:$owner,name:$repo) {
issues(first:100, filterBy:{since:$since}) {
nodes {
number
title
}
}
}
}`;
const {
repository: {
issues: { nodes: issues }
}
} = await octokit.paginate(query, {
owner: "octocat",
repo: "hello-world",
since: "2019-10-01T00:00:00.000Z"
});
Would that work with your code?
If we created a plugin that supports both REST & GraphQL pagination, we could quite easily check for a graphql request by checking if the first argument is a string that does not start with a /
or one of the http methods
If we created a plugin that supports both REST & GraphQL pagination, we could quite easily check for a graphql request by checking if the first argument is a string that does not start with a > / or one of the http methods
^^^ I like that. Due to the fact that GitHub's GraphQL offering isn't mature enough yet to be completely independent of REST for an application, I think that would be nice.
Behind the scenes when they call octokit.paginate
we would look for the cursor within the query response and i think we would make this work π That example looks good to me. It would also give a somewhat similar interface to the REST library.
@gr2m let me know how I can help on this π
Will do! There are a few things I need to take care of first. I'll ping you once I've setup the repository
Here is the plugin scaffold: https://github.com/octokit/plugin-paginate-graphql.js/pull/1
I've invited you to the repository, you can directly edit the code: https://github.com/octokit/plugin-paginate-graphql.js/invitations
Let's continue the discussion there :)
@NickLiffen are you still interested in working on this?
I looked into this a little more. I'm curious how your GraphQL implementation looks like. From my research so far, I'd like to do the following
pageInfo { endCursor hasNextPage }
to the parts they want to paginate. No magical mutating of the query to make it paginate. Log a warning if the query has no pageInfo
nodesI could imagine that 3
might be complicated as it would require complicated string manipulation.
@gr2m I was wondering how you plan on doing nested pagination. Like for example if my query is
{
repository(owner: "octokit", name: "graphql.js") {
primaryLanguage {
name
}
pullRequests(first: 100) {
pageInfo {
endCursor
}
nodes {
labels(first: 100) {
pageInfo {
endCursor
}
nodes {
name
}
}
}
}
}
}
The labels will need to be paginated for every pull request that also requires pagination.
I'm thinking that this will need to be some kind of BFS traversal to paginate every list. You first paginate all of the parent's list and then for each parent you paginate the child's list. In my example query, you would paginate pullRequests
and then iterate through the list of pullRequests
to paginate the labels
for each pull request.
Also, ideally for all the lists that you are paginating for, your queries should only contain the objects that are under that paginated object list. So in my example above: when you paginate through pullRequests, the query to use to paginate through this should not include primaryLanguage
object because that information was already received on the first query of the pagination.
I agree with everything you said. It's complicated.
A query with a lot of nested nodes can very quickly run up to the theoretical node limit of currently 500,000, see https://developer.github.com/v4/guides/resource-limitations/#node-limit
Yet another thing to consider is the way rate limiting works for GraphQL. You can retrieve the cost of a query by requesting rateLimit.cost
, e.g.
query {
viewer {
login
}
rateLimit {
limit
cost
remaining
resetAt
}
}
And another limit is a 10s timeout. If your query is too complex, it might timeout, it's trial & error.
I don't think that it will be possible to implement the ultimate solution that will magically paginate all results for a complex query. Instead, I think GitHub needs to better educate folks how to be strategic about their request. I just saw a great talk by @ReaLoretta about these topics at GitHub's Universe: https://www.youtube.com/watch?v=i5pIszu9MeM Definitely check that out once the recording becomes available! Rea uploader the slides, check out slide 145ff.
In your particular case, you maybe don't want to split your request up in multiple queries. Request only 10 labels as that should cover most cases, and check if there are more than that which you can check in a secondary request.
{
repository(owner: "octokit", name: "graphql.js") {
primaryLanguage {
name
}
pullRequests(first: 100) {
id
pageInfo {
endCursor
hasNextPage
}
nodes {
labels(first: 10) {
pageInfo {
endCursor
hasNextPage
}
nodes {
name
}
}
}
}
}
}
For the cases where a pull request has more than 10 labels, you can use the nodes
root node into which you can pass an array of IDs into, and use that query to request the remaining labels for the selected pull requests (see slide 192).
I'm still thinking about the best balance between educating GraphQL users about best practices and providing the best tools that are not too magical but still helpful. I'm open to suggestions :)
Hey, are there any updates relating to this issue?
No, the answer will probably remain "It's complicated", and better docs, not code
Yeah, it does look rather complicated indeed, especially the sub-pagination issue, that boggles my mind to even think about.
I'll have to use some kind of helper based on the one above for now, then. Thanks!
...disappointed. I'd rather see a simple solution for majority of cases (with limitations). People don't have to use it if it doesn't apply...
I agree @ScottChapman, I just don't know what the scope would be, and what the APIs would look like.
Here is an example for a GraphQL Query that I needed to be paginated yesterday:
const QUERY = `query($cursor:String) {
repository(name:"probot", owner:"probot") {
stargazers(first: 100, after:$cursor) {
pageInfo {
endCursor
hasNextPage
}
nodes {
login
company
}
}
}
}`;
And here is my method to gather all results using cursor pagination
async function fetchStarGazers(octokit, { results, cursor } = { results: [] }) {
const { repository: { stargazers } } = await octokit.graphql(QUERY, { cursor });
results.push(...stargazers.nodes);
if (stargazers.pageInfo.hasNextPage) {
await fetchStarGazers(octokit, { results, cursor: stargazers.pageInfo.endCursor });
}
return results;
}
I built code like this all the time, I'd very much like to generalize it. Some patterns that I observed:
after:$cursor
cursor
query variablepageInfo
node with endCursor
and hasNextPage
child nodesWhat's complicated is that the object path to the paginating node is always different. In the example above it's repository -> stargazers
.
If we do a graphql pagination API, I'd like to make it very explicit, and not too magical, in order to make it easy to understand and debug.
For example something like this
graphqlPaginate(query, variables, { cursorPath: 'repository.stargazers' })
Alternatively it could require a certain name for the cursor variable, such as cursor_repository_stargazers
. The graphqlPaginate
method could look for option variables starting with cursor_
and derive the path to the paginating node from the name, so we could have the same method signature as graphql(query, variables)
.
graphqlPaginate(query, variables)
I did some research in the past about how other GraphQL clients handle this case, in the hopes that some convention would emerge, but no luck. Maybe it's time to come up with one our selves.
Let me know what you think π€
This sounds awesome. I am all for generalizing graphql pagination. I agree with convention for cursor variables makes absolute sense for sub cursors as well.
For me as a newcomer, I'm having a very hard time wrapping my brain around pagination, especially when multiple children on one level (many columns in one project) may have their own nextPages and required cursors (i think?). Documentation also is lacking of examples of how to do this so that I'm now going by resources like:
Having something generalized would be a life safer for me :)
yea, that's rough. I generally try to keep my queries that require pagination to one level if at all possible (sometimes I can accomplish that by using a different query). But I went with the solution above; I have a helper function which takes the query, data variables, and the cursor path. Works great, so far haven't needed anything beyond that. Which makes me think it would be a valuable addition to this project
Update: @laughedelic is currently working on the pagination plugin for GraphQL at https://github.com/octokit/plugin-paginate-graphql.js/pull/1
Just to clarify, I haven't actually done anything yet π Just expressed my interest. Once I have some time to work on it, I will let you know about my progress.
I hope this doesn't give false hopes and doesn't discourage anyone else from working on it π
Hey folks, here is the mentioned video about pagination from GitHub Universe where @ReaLoretta touches on concepts like "nested" pagination and other advanced concepts: https://www.youtube.com/watch?v=i5pIszu9MeM
We have also open-sourced some of the code here https://github.com/toast-ninja/github-analytics-starter.
In production (for Toast), we have more complicated heuristics, but the ultimate strategy is the same. We call it "Double Pass".
I recommend watching that video starting from here youtube.com/watch?v=i5pIszu9MeM&t=719s that covers it in detail. Mentioned repo implements the most complex version of that strategy described in the video, which is Double Pass with batching. Hope it helps.
@kvz @sustained @ScottChapman
it was a great talk, I've been there π that's how I learned about query.nodes
, super neat!
But it's beyond the scope of what I want to build at octokit/plugin-paginate-graphql.js#1, I want to cover the most simple cases with auto pagination similar to what we have for the REST API (octokit/plugin-paginate-rest
) using a few conventions.
I wish GitHub had a top level node similar to nodes
but for lists of any kind, to which I could just pass a cursor and a limit, that would make it much easier to paginate event nested queries. But I want to start with something simple first, and maybe get to that point eventually, and then I can reach out to the team in charge of the GraphQL API to see if that'd be possible
Thanks for sharing! :heart_decoration: I'm planning to use this in a project with some budgetary constraints (but no deadlines), so i'm crossing my fingers that the internet will eventually come up with an abstraction, or/like GitHub adding top level nodes as suggested :innocent:
Passively wondering if there are any updates on this (β€οΈs for everyone), I'm surprised that it requires days of learning for what should be a one-liner, reminds me of everything else react.
Okay, @gr2m I should now have some time to start this (sorry for taking sooo long). I think to set expectations this is not going to work for every use case, and to start with we will need to set out the scope of the library. E.G you could have nested down nextTokens for the following request. @gr2m is there a place maybe where we (any anyone else who is interested) could map out what we would like v1 to be, because I really don't think it is going to cover EVERY use case. Also, I don't think this issue is the right place for that discussion, but maybe a link could be posted within this chat.
Okay, @gr2m I should now have some time to start this (sorry for taking sooo long)
yay! And no worries
is there a place maybe where we (any anyone else who is interested) could map out what we would like v1 to be, because I really don't think it is going to cover EVERY use case. Also, I don't think this issue is the right place for that discussion, but maybe a link could be posted within this chat.
I think this issue here as well as the pull request at https://github.com/octokit/plugin-paginate-graphql.js/pull/1 had plenty of discussion. I'd say the scope is set by now. Let's not supported nested paginations, and let's implement the suggestion with cursor variable names mapping to the query paths, as in the last two paragraphs here:
https://github.com/octokit/graphql.js/issues/61#issuecomment-643392492
So if the query is
query($cursor_repository_stargazers:String) {
repository(name:"probot", owner:"probot") {
stargazers(first: 100, after:$cursor_repository_stargazers) {
pageInfo {
endCursor
hasNextPage
}
nodes {
login
company
}
}
}
}
Then the method should be
octokit.graphqlPaginate(query)
And the cursor for cursor_repository_stargazers
should be read out and set automatically for subsequent requests. If the pageInfo
was not part of the query then we should throw an error instead of trying to inject it.
I would suggest we don't do any alterations to the query, even though we would load the same data for nodes outside of the node that are being paginated.
I hope that GitHub will introduce a root query node specific to pagination which would make this all much easier, but I didn't have the time to lobby much for that. I'd say once we have the simple and naive implementation I described above, I can make the point that it's wasteful and implementing nested pagination would be close to impossible without having a new query root element that is made explicitly for pagination
Okay nice! I like it @gr2m I will get something underway π― Im between jobs but have some time, it won't be in the next few days but expect a PR soon to review π
Hi there :wave: i was wondering if there were any updates on generalizing pagination for github's graphql :)
Hello, are there any updates on this? I can script something for my use case but a plugin similar to what is available for for the rest api would be awesome.
I started the PR but then had a change of job so haven't got much time π’ I haven't got around to looking at it again, I did start something though and drafted a PR but didn't make much progress π’ sorry
@gr2m : I currently also need a GraphQL Pagination, and as this does not seem to have made a lot of progress recently, I'd like to throw my hat into the ring here as well with two very basic POCs I did this last weekend.
I basically thought about two solutions (I named them simple
and complex
for now ;) ), let me explain them first before I compare them.
Note I omitted the
pageInfo
Object in all examples below to not make this issue any longer. I also did not implement an iterator yet in the POC, but that would be trivial in both solutions.
Calling / creating a query for pagination in the simple solution would look like this:
octokit.paginateGraphql((cursor) => `
repository(owner: "octokit", name: "rest.js") {
issues(first: 10, after: ${cursor.create()}) {
nodes {
title
}
}
}
`);
See this test and the current implementation.
The core idea being to expect a function that gets passed a cursor(-factory) so the user then can insert the cursor where necessary. In my current solution, I even create the query-definition for the user so they don't have to take care of it - however, this probably won't work if the user also wants to bring their own variables, so in that case it could looke like this:
With Query Definition
octokit.paginateGraphql((cursor) => {
const issueCursor = cursor.create();
return `
query paginate(${issueCursor}: String, organisation: String!) {
repository(owner: $organisation, name: "rest.js") {
issues(first: 10, after: ${issueCursor}) {
nodes {
title
}
}
}
}
`
}, {
organisation: 'octotkit'
});
This would even enable us to support nested pagination, as we could let the user pass a object-path out of which we could calculate the depth of a cursor as well as know where to find the pageInfo
in the returned query:
Nested Pagination
octokit.paginateGraphql((cursor) => {
const issueCursor = cursor.create('repository.issues');
const commentsCursor = cursor.create('repository.isusues.comments');
return `
query paginate($organisation: String, ${issueCursor}: String, ${commentsCursor}: String) {
repository(owner: "octokit", name: "rest.js") {
issues(first: 10, after: ${issueCursor}) {
nodes {
comments(first:10, after:${commentsCursor}) {
nodes {
body
}
}
title
}
}
}
}
`
}, {
organisation: 'octotkit'
});
The Complex solution actually has a simpler interface as it uses the graphql library and its parse-method to turn the query into inspectable AST.
This would allow the user to pass a 'normal' query:
octokit.paginateGraphQl(`
query paginate($cursor1: String) {
repository(owner: "octokit", name: "rest.js") {
issues(first: 10, after: $cursor1) {
nodes {
title
}
}
}
}
`);
See this test and the current implementation
The way this works is that I check the parsed AST for parameters with the name after
(later we could also add before
) and where a variable is passed into to deduct the cursor-variable-name as well as the position of the paginated resources in the query (e.g. repository.issues
). With both infos, I can then easily paginate.
Thinking further, this solution would also easily allow for
totalCount
of each paginated resource.Both solution do not require any string-manipulation so can be very stable. Also, in both solutions, the query itself is still pretty easy read- and understandable.
GraphQl
- so not sure if this would run in the browser (I did not find another query parser yet)If it wasn't for the Browser-Compatability, I'd clearly go for the complex solution, so maybe this could even be two plugins - one for al environments and one just for Backends / Scripts.
What do you think about either?
Thanks for pushing this forward @davelosert!
I like the pros of the simple approach, and I don't think it has a more complex interface: creating cursors explicitly makes it less "magical". But the "complex" approach also looks good π
I wanted to check if it would work on Deno since you mentioned your doubts about the graphql
dependency. It actually mentions in the readme that it should work in the browser.
Here's my Deno test: https://gist.github.com/laughedelic/55a288f3780b11d4da3526c23cc5142e (uses GITHUB_TOKEN
env var for auth). It seems to work fine, returns more than 10 issues titles π
I wanted to try the simple approach as well, but got confused: it doesn't seem to recurse on the results π€
@laughedelic : First of all thank you for the feedback. ππ»
I agree that the simple approach is less magical - and that that might even be a good thing.
I like that the complex solution would work in the browser. That changes things, even though I'd still have to check what it does to the bundle size π
. But since i only use the parse
(and later maybe the stringify
) method, it might not even be that bad.
And oh yeah, sorry, I didn't actually finish the simple approach's implementation. Once I was sure it would work, I was too curious about the other solution so switched over there. But I will try to finish it up later today, it shouldn't take too long.
I'd also go ahead and try to draft an actual PR for the octokit/plugin-paginate-graphql.js repository later this week.
Just wanted to pop in and say thanks for moving this forward π
I honestly prefer the simple solution, less magic + no dependency.
@jetersen : Yes I agree, at least for an 'official' or maybe later even 'built in' solution (like paginate-rest in the octokit.js lib), the simple approach is the better one.
We could still think about providing an additional plugin with the GraphQL-Solution that would enable for some more advanced features (as described above) - but IMHO, those are for use-cases that generally apply more to Scripts- or Backends (e.g. where one wants to pull out all Data of a specific resource in an efficient manner), so that might even make sense to split it up.
As a user, to me there is value in some magic here, the simple approach seems fairly unwieldy to operate and not a big leap in dx forward compared to how we currently work around not having a pagination abstraction in this dev kit.
Thank you so much for taking the time to drive this forward β€οΈ I'm excited to see one of these approaches land as a plugin, and once the plugin is battle-tested and we have more feedback, we could look at including it in Octokit π
I'm torn on whether I prefer the simple or the complex solution. I do think that being able to solve the nested pagination problem would be really really valuable, so I err towards the complex solution, even though it forces us to add a dependency.
But I'm no expert. I'd love to hear the thoughts of our wonderful @octokit/js-community and @gr2m!
Just add, I am not against having the complex solution cause nested pagination is a complex problem and if we can remove the struggle. Thats great!
Isn't nested pagination also solved in the simple solution? You just need to define cursors explicitly.
simple approach seems fairly unwieldy to operate and not a big leap in dx forward compared to how we currently work around not having a pagination abstraction in this dev kit
@kvz The proposed solution requires you to simply say explicitly that you want to paginate on a given cursor in your query, nothing else. Currently, we have to implement the actual pagination logic, running queries multiple times, passing cursors between them, combining results, etc. So I think it actually is a big leap in DX.
Isn't nested pagination also solved in the simple solution? You just need to define cursors explicitly.
You're right actually - my bad. I misread π±π
Given that, I would favour the simple solution - it solves nested pagination (which is the big prize) and I do the think the developer experience is a big leap forward.
I have to apologize: I completely mixed up 'nested' and 'parallel' pagination in my head while writing down the solution and just realized, while I tried to make my actual nested example above work, that it ain't that easy.
I missed that you would not only need one nested cursor, but create one for every nested return value, which in turn requires us to alter the query. This is not really doable with the simple approach, at least not in a clean way.
I'd suggest to implement the simple solution for non-nested, flat pagination for now for three reasons:
In the long term, I can then try to implement pagination with the complex solution like described below.
Note This is only a 'draft out of my head' and requires further testing and thinking
In general, to get nested pagination working, the best approach is probably what @kevo1ution suggested: a BFS. I think this might be possible with the GraphQL
-Library-Approach as I can easily alter queries or extract subqueries with it.
It would probably look something like this:
nodes
-query to get the next level of objects, e.g. for issues.comments
:query {
nodes(ids: ["id1", "id2", "id3", ...] {
... on IssueComment {
body
}
}
}
Warning I am unsure how many Ids I can pass here at maximum - my guess is 100.
To construct this secondary query out of a main query, we would also require the __typename
to be present.
As @gr2m already stated, this can easily generate queries and requests exceeding the rate limits, so it probably makes sense to only implement this kind of pagination if we also have some sort of cost-calculation upfront.
Overall, we have to ask ourselves if supporting nested pagination is worth the cost of implementing / maintaining it, given what @gr2m also stated: It is not really recommended and it might be better to educate users on thinking about their queries (e.g. by throwing an Error with a link to educational resources once a users tries nested pagination).
@davelosert if you want a good query to test with :) Look at the one we have over at release drafter.
https://github.com/release-drafter/release-drafter/blob/06a49bf28488e030d35ca2ac6dbf7f408a481779/lib/commits.js#L32 it could be quickly modified to have nested pagination as well :)
Like going over an org's repos commit history :)
I know of one good nested one over here that generates permission report for a org. https://github.com/jenkins-infra/infra-reports/blob/cc1ae0eac18122840a90b98557d5029864871164/permissions-report/permissions-report.rb#L107
@davelosert Thanks again for taking the time to look into this!
I'm supportive of a plugin that implements the simple solution - I do think that any automatic pagination (even un-nested) is a significant DX improvement. It certainly has been for me when working the REST API!
We can look at making the plugin "default" once it has seen some action.
I wonder if there would be any way to warn people (or error!) if they try to do nested pagination with this approach - just to avoid any surprises. I think that'd be good from an ergonomic perspective.
I would love to test out non-nested pagination π The code cleanup alone π
@timrogers : Sure, let's see if I can get this into a publishing state during the weekend. Shall I just go ahead and publish it under my own account for now or do you still want me to provide it as a PR to the / within the octokit/plugin-paginate-graphql.js repository?
@jetersen : Thank you so much for the test queries - those complex real use-case will come in handy when testing! π
Happy for you to make a PR to the Octokit repository, if that sounds good to you β¨ I think this functionality is βcoreβ enough that it makes sense for it to be part of Octokit.
We often use https://github.com/octokit/create-octokit-project.js for setting up new projects as itβs a decent scaffold.
Thank you so much again for your work on this β€οΈ
Get Outlook for iOShttps://aka.ms/o0ukef
From: David Losert @.> Sent: Friday, August 26, 2022 11:59:43 AM To: octokit/graphql.js @.> Cc: Tim Rogers @.>; Mention @.> Subject: Re: [octokit/graphql.js] Pagination/Next Cursor Automation (#61)
@timrogershttps://github.com/timrogers : Sure, let's see if I can get this into a publishing state during the weekend. Shall I just go ahead and publish it under my own account for now or do you still want me to provide it as a PR to the / within the octokit/plugin-paginate-graphql.jshttps://github.com/octokit/plugin-paginate-graphql.js repository?
@jetersenhttps://github.com/jetersen : Thank you so much for the test queries - those complex real use-case will come in handy when testing! π
β Reply to this email directly, view it on GitHubhttps://github.com/octokit/graphql.js/issues/61#issuecomment-1228302813, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAA4LJRWDUZVL5KOEEZQNQLV3CIQ7ANCNFSM4JAPKEFA. You are receiving this because you were mentioned.Message ID: @.***>
We have https://github.com/octokit/plugin-paginate-graphql.js now π₯³
Hey,
Understand that pagination in GraphQL isn't the same as pagination within REST. One thing that we had to do when using this library is to build our own helper function around how to handle retires/pagination when returning a subset of data which is large.
It would be great if this library could handle some of that an abstract that behind the scenes so we wouldn't have to build our own helper functions which lie on top of this library.
The function we built looked something like this:
We would be happy to contribute back to this package, but first, want to understand your thoughts/motivate towards this feature/use case?