mickhansen / graphql-sequelize

GraphQL & Relay for MySQL & Postgres via Sequelize
MIT License
1.9k stars 172 forks source link

Use windowed queries instead of OFFSET for connections when possible (and without breaking hasNextPage/hasPreviousPage) #607

Closed jedwards1211 closed 4 years ago

jedwards1211 commented 6 years ago

Great success!

Details

Breaking Changes

jedwards1211 commented 6 years ago

@mickhansen I checked the build failures, it was more of those pesky Postgres race conditions

jedwards1211 commented 6 years ago

@janmeier I realized last night it would be possible to preserve the signature of resolveEdge. However, rebuilding the list of orderAttributes from the args for each edge would be wasteful. However I could use reselect's createSelector to avoid recomputing orderAttributes when the args are the same as the previous call. Do you think that would be worth avoiding a breaking change?

jedwards1211 commented 6 years ago

Also glad you got the Borat reference 😄

codecov-io commented 6 years ago

Codecov Report

Merging #607 into master will increase coverage by 3.55%. The diff coverage is 99.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #607      +/-   ##
==========================================
+ Coverage   91.48%   95.03%   +3.55%     
==========================================
  Files          13       12       -1     
  Lines         411      423      +12     
==========================================
+ Hits          376      402      +26     
+ Misses         35       21      -14
Impacted Files Coverage Δ
src/relay.js 98.79% <99.09%> (+1.69%) :arrow_up:
src/typeMapper.js 97.05% <0%> (-0.24%) :arrow_down:
src/types/dateType.js
src/replaceWhereOperators.js 100% <0%> (+6.66%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6420da3...8d22374. Read the comment docs.

janmeier commented 6 years ago

Do you think that would be worth avoiding a breaking change?

The signature for resolveEdge is not formalized anywhere, so I wouldn't worry too much about it (still a major version change, but I don't think it will affect many people). I just checked through our codebase, and we only use resolveEgde(row) in mutations to resolve a newly added edge, so the change wouldn't affect us. I can't really think of other cases where you want to use resolveEdge in your own application?

jedwards1211 commented 6 years ago

@janmeier I can't think of other cases either, but I hadn't looked into how it was being used in the mutations in detail (though I did have to fix that test). By "your codebase", do you mean in this project, or are you also using it in production projects at Snaplytics?

jedwards1211 commented 6 years ago

We might want to accomplish this using window functions for some databases: https://use-the-index-luke.com/sql/partial-results/window-functions

jedwards1211 commented 6 years ago

transient postgres failures again

janmeier commented 6 years ago

By "your codebase", do you mean in this project, or are you also using it in production projects at Snaplytics?

Production at snaplytics :) (https://graphql-7.snaplytics.io/explorer)

jedwards1211 commented 6 years ago

Okay @mickhansen I made the ordering change we were discussing in #590, fixed the tests, and added a test that pages forward twice and then backward twice: https://github.com/mickhansen/graphql-sequelize/pull/607/files#diff-f784af1a4e8d657979912a0b92ef61b9R1016

New Behavior

query: tasks(first: 3, orderBy: [LATEST])
result edges:
  T-5s
  T-10s
  T-15s
startCursor: T-5s
endCursor: T-15s

query: tasks(first: 3, after: T-15s, orderBy: [LATEST])
result edges:
  T-20s
  T-25s
  T-30s
startCursor: T-20s
endCursor: T-30s

query: tasks(first: 3, after: T-30s, orderBy: [LATEST])
result edges:
  T-35s
  T-40s
  T-45s
startCursor: T-35s
endCursor: T-45s

query: tasks(last: 3, before: T-35s, orderBy: [LATEST])
result edges:
  T-20s
  T-25s
  T-30s
startCursor: T-20s
endCursor: T-30s

query: tasks(last: 3, before: T-20s, orderBy: [LATEST])
result edges:
  T-5s
  T-10s
  T-15s
startCursor: T-5s
endCursor: T-15s

Current Behavior

query: tasks(first: 3, orderBy: [LATEST])
result edges:
  T-5s
  T-10s
  T-15s
startCursor: T-5s
endCursor: T-15s

query: tasks(first: 3, after: T-15s, orderBy: [LATEST])
result edges:
  T-20s
  T-25s
  T-30s
startCursor: T-20s
endCursor: T-30s

query: tasks(first: 3, after: T-30s, orderBy: [LATEST])
result edges:
  T-35s
  T-40s
  T-45s
startCursor: T-35s
endCursor: T-45s

query: tasks(last: 3, before: T-35s, orderBy: [LATEST])
result edges: (BACKWARDS)
  T-30s
  T-25s
  T-20s
startCursor: T-30s
endCursor: T-20s (BACKWARDS)

query: tasks(last: 3, before: T-20s, orderBy: [LATEST])
result edges: (BACKWARDS)
  T-15s
  T-10s
  T-5s
startCursor: T-15s
endCursor: T-5s (BACKWARDS)
mickhansen commented 6 years ago

Windowed queries are used except when at least one orderBy attribute is not a string (i.e. custom function or Sequelize.literal)

This worries me a little bit as this could mean subtle bugs. Aren't window queries generally able to perform the same logic as an order by?

mickhansen commented 6 years ago

For first queries, hasNextPage is determined by using limit: first + 1 and seeing if the result nodes.length > first. If hasPreviousPage is selected in the GraphQL, it will perform a second query in the opposite direction with limit: 1 to determine the answer hasPreviousPage.

This is to counter the result set changing i suppose? Because otherwise the result of the window toatl query - current first/last argument would show if there are more pages.

jedwards1211 commented 6 years ago

@mickhansen I'm still learning the vocabulary about this because I was wrong to call these "window functions". One term I've heard for what this PR does is "keyset pagination". Using window functions could also be helpful for connections but as they're more exotic and less consistent among SQLs I've avoided them here.

Regarding falling back to offset: It seems like CASE clauses work in ORDER BY but not in place of an operand in an inequality If there were a bug caused by falling back to offset here, that same bug would already occur with the current code

What do you mean by "window total query"? I fetch in the opposite direction because, given first: 3, after: <id: 2>, the first SQL query is WHERE id > 2 ORDER BY id ASC LIMIT 4 and the results look like:

0
1 ? is there a way to select this element in the same query?
3 <-- startCursor
4 <
6 <-- endCursor
9 <-- included in SQL result but not the GraphQL result; indicates there's another page 
10
12

I can't think of a way to include WHERE id <= 2 ORDER BY id DESC LIMIT 1 (which we need to determine if the previous page exists) in the same query unless we use a UNION ALL.

mickhansen commented 6 years ago

Ah yeah i didn't look at the code closely yet just your description, if you aren't actually using WINDOW ignore what i said about the window total (windows in postgres lets you get the entire row count even if you have defined offset or limit)

mickhansen commented 6 years ago

Do you need to include the previous element? If there is a cursor there must be a previous page, atleast from the thought that you can only get cursors onces you don't the first query.

mickhansen commented 6 years ago

@jedwards1211 I'll review this more thoroughly tomorrow. I'd still like to see the ordering change in another PR so that can be discussed on its own with all interested parties and contributors :)

jedwards1211 commented 6 years ago

If there is a cursor there must be a previous page, atleast from the thought that you can only get cursors onces you don't the first query.

Often that will be the case, but keep in mind that all elements <= a cursor could get updated or deleted after that cursor is created. That's why we have to make a query if we want to get the correct answer for hasPreviousPage

I've tried to think about this very carefully, paging is not as simple as it sounds :)

mickhansen commented 6 years ago

Often that will be the case, but keep in mind that all elements <= a cursor could get updated or deleted after that cursor is created. That's why we have to make a query if we want to get the correct answer

True enough, i'd always careful introducing something that decreases performance though. At the very least we need to figure out how to perform that query in parallel with the "real" one (if you haven't already implemented that)

jedwards1211 commented 6 years ago

@mickhansen ah that's true, I should do it in parallel, I'll make that change when I get a chance. I already avoid doing the second query if the relevant hasNextPage/hasPreviousPage isn't selected in the graphql query, but I should also add an option to turn off the extra query regardless (since the spec allows us to return false for performance reasons). Or would you rather disable the second query by default and only turn it on if the programmer explicitly asks to do so?

jedwards1211 commented 6 years ago

Btw it would be cool to implement pagination using window functions in databases that support it, though according to this article it doesn't perform efficiently in Postgres. Apparently it's a valid approach with MSSQL but right now we're not testing against MSSQL so it would be quite a bit of work to implement and test that.

jedwards1211 commented 6 years ago

@mickhansen

True enough, i'd always careful introducing something that decreases performance though.

Yeah but compared to the performance of an OFFSET query?

jedwards1211 commented 6 years ago

@mickhansen Okay, this now performs the secondary hasNextPage/hasPreviousPage query in parallel, and only if those fields are requested. In addition I merged in changes from master. Is there anything else you need or can we merge this?

mickhansen commented 5 years ago

@jedwards1211 It looks solid, i need to do a full review and been kinda swamped. I'll try and set time aside to go through it with @janmeier tomorrow at our offices :)

mickhansen commented 5 years ago

The tests were using before to replace order contents with a Sequelize.literal. This will not work anymore; instead users can put their custom functions or literals directly in the values for the corresponding orderBy enum.

This specifically breaks modifying options.order in the before hook right? That's likely not ideal, we could check that options.order was modified in the hook and if it was revert to offset based logic (like you do with your string check).

jedwards1211 commented 5 years ago

@mickhansen that's a good point, I'll implement that. Last night I made it handle queries like

{
  things(first: 1000, after: "lkasdiniweraskdjf") {
    pageInfo {
      endCursor
      hasNextPage
    }
  }
}

by using OFFSET 999 LIMIT 2 (in conjunction with the non-offset based WHERE condition from after) so that it fetches the minimum information necessary to fulfill that query. I realized people may want to do a query like this if their UI supports scrolling way ahead, and they want to skip over everything the user scrolled past and fetch only the rows they scrolled to.

jedwards1211 commented 5 years ago

@mickhansen actually I think we should just throw an error saying "modifying order isn't supported" if they change the order in before and the cursor they passed in is not in offset form. We wouldn't be able to return a proper cursor; an attribute-based cursor might be wrong, but we wouldn't know the starting offset to add for returning an offset-based cursor. Are there any cases where you think modifying the order in before is really necessary?

jedwards1211 commented 5 years ago

@mickhansen come to think of it, I'd like to add more logic to throw errors if they passed an invalid cursor.

jedwards1211 commented 5 years ago

@mickhansen I just realized there's currently a mistake: I'm passing after to the nodes resolver as well as calling it with the entire resolved connection.

I think I should accept a separate afterNodes argument. Which makes me wonder if we should change before to beforeNodes since it's run before the nodes resolver, which may be called twice, rather than before the entire operation.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.