mickhansen / graphql-sequelize

GraphQL & Relay for MySQL & Postgres via Sequelize
MIT License
1.9k stars 172 forks source link

Proposal for using windowed queries instead of OFFSET, without breaking hasPreviousPage/hasNextPage #598

Closed jedwards1211 closed 4 years ago

jedwards1211 commented 6 years ago

I'd like to make a PR to restore the windowed query strategy for the sake of performance. I've been thinking about another user's proposal (I lost the link to it) to fetch the items before/after the requested range to determine if there's a previous/next page.

Cursors could be a stringified JSON array of values for all order by attributes (including the primary key(s) if they are not explicitly requested).

So we could get a query like the following:

query {
  Users(first: 10, orderBy: [FIRST_NAME, LAST_NAME], after: "[\"Andy\",\"Edwards\",23]") {
    pageInfo {
      hasPreviousPage
      hasNextPage
    }
  }
}

We would basically want to fetch this, and correct me if there is a more efficient way to do the windowing:

Users.findAll({
  limit: 11, // include one more item to determine if hasNextPage
  where: {
    [Op.or]: {
      {firstName: {[Op.gt]: "Andy"}},
      {firstName: "Andy", lastName: {[Op.gt]: "Edwards"}},
      {firstName: "Andy", lastName: "Edwards", id: {[Op.gt]: 23}},
    }
  }
})

To determine hasPreviousPage, we would need to look for the first item in the opposite direction in a second query. At first I thought we could just see if the item at the after cursor exists, but then I realized that item could have been deleted, in which case we need to see if there's something before it (which could be UNIONed with the first query using this strategy, and omitted if hasPreviousPage is not requested):

Users.findAll({
  limit: 1, // include one more item to determine if hasNextPage
  where: {
    [Op.or]: {
      {firstName: {[Op.lt]: "Andy"}},
      {firstName: "Andy", lastName: {[Op.lt]: "Edwards"}},
      {firstName: "Andy", lastName: "Edwards", id: {[Op.lte]: 23}},
    }
  }
})

@mickhansen what do you think?

mickhansen commented 6 years ago

@jedwards1211 I'd be very open to that

jedwards1211 commented 6 years ago

@mickhansen if you care to look, I have untested work in progress here: https://github.com/jedwards1211/graphql-sequelize/blob/fix-connection-pagination/src/relay.js

Notes

jedwards1211 commented 6 years ago

Also I haven't implemented the second part of the query to look back in the opposite direction so that I can determine hasPreviousPage. I'm running into a roadblock there, because

jedwards1211 commented 6 years ago

@mickhansen I'm getting closer on my branch! I got windowing working, and got it to make a second query with LIMIT 1 to determine hasPreviousPage/hasNextPage. Here is what it looks like in action:

SELECT "id", "createdAt", "updatedAt" FROM "users" AS "user" 
WHERE "user"."id" = 1;

SELECT "createdAt", "id", "name", "completed", "otherDate", "updatedAt", "userId", "projectId" FROM "tasks" AS "task"
WHERE ("task"."userId" = 1) ORDER BY "task"."createdAt" DESC, "task"."id" ASC LIMIT 4;

SELECT "id", "createdAt", "updatedAt" FROM "users" AS "user"
WHERE "user"."id" = 1;

SELECT "createdAt", "id", "name", "completed", "otherDate", "updatedAt", "userId", "projectId" FROM "tasks" AS "task" 
WHERE ("task"."userId" = 1 AND (("task"."createdAt" < '2015-11-17 09:23:45.000 +00:00' OR ("task"."createdAt" = '2015-11-17 09:23:45.000 +00:00' AND "task"."id" > 7)))) ORDER BY "task"."createdAt" DESC, "task"."id" ASC LIMIT 4;

SELECT "createdAt", "id", "name", "completed", "otherDate", "updatedAt", "userId", "projectId" FROM "tasks" AS "task" 
WHERE ("task"."userId" = 1 AND ((("task"."createdAt" = '2015-11-17 09:23:45.000 +00:00' AND "task"."id" = 7) OR "task"."createdAt" > '2015-11-17 09:23:45.000 +00:00' OR ("task"."createdAt" = '2015-11-17 09:23:45.000 +00:00' AND "task"."id" < 7)))) ORDER BY "task"."createdAt" ASC, "task"."id" DESC LIMIT 1;

SELECT "id", "createdAt", "updatedAt" FROM "users" AS "user" 
WHERE "user"."id" = 1;

SELECT "createdAt", "id", "name", "completed", "otherDate", "updatedAt", "userId", "projectId" FROM "tasks" AS "task" 
WHERE ("task"."userId" = 1 AND (("task"."createdAt" < '2015-11-17 09:23:30.000 +00:00' OR ("task"."createdAt" = '2015-11-17 09:23:30.000 +00:00' AND "task"."id" > 4)))) ORDER BY "task"."createdAt" DESC, "task"."id" ASC LIMIT 4;

SELECT "createdAt", "id", "name", "completed", "otherDate", "updatedAt", "userId", "projectId" FROM "tasks" AS "task" 
WHERE ("task"."userId" = 1 AND ((("task"."createdAt" = '2015-11-17 09:23:30.000 +00:00' AND "task"."id" = 4) OR "task"."createdAt" > '2015-11-17 09:23:30.000 +00:00' OR ("task"."createdAt" = '2015-11-17 09:23:30.000 +00:00' AND "task"."id" < 4)))) ORDER BY "task"."createdAt" ASC, "task"."id" DESC LIMIT 1;
      ✓ should support in-query slicing and pagination with first and orderBy

I noticed in the Relay Cursor Connections Specification that the logic for hasPreviousPage says (note the "If the server can efficiently" part):

HasPreviousPage(allEdges, before, after, first, last):
  If last is set:
    Let edges be the result of calling ApplyCursorsToEdges(allEdges, before, after).
    If edges contains more than last elements return true, otherwise false.
  If after is set:
    If the server can efficiently determine that elements exist prior to after, return true.
  Return false.

(likewise for hasNextPage)

This means we don't have to do a second query looking in the opposite direction to determine if there's a previous/next page for sure. We could add an option to sequelizeConnection or to the GraphQL args to turn that extra query on or off, so that devs can avoid that extra work if they're not going to use that information. I'm also planning to automatically turn off that query if they didn't request the hasPreviousPage or hasNextPage field (whichever is in the backwards direction) in their GraphQL query.

jedwards1211 commented 6 years ago

Okay I am almost done, but I hit one more snag I will need to fix: I will need to revert back to using offset and the old cursors when options.order is Sequelize.literal or function, or there is one in orderBy, as when the tests do the following. There's no reasonable way to use $gt/$lt to window the query for this:

ORDER BY
  CASE
    WHEN completed = true THEN "createdAt"
    ELSE "otherDate" End DESC, "task"."id" ASC LIMIT 10
stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.