mickhansen / graphql-sequelize

GraphQL & Relay for MySQL & Postgres via Sequelize
MIT License
1.9k stars 172 forks source link

Connection queries SELECT full_count slow on large tables. #521

Closed borderBite closed 4 years ago

borderBite commented 7 years ago

Queries generated for Connections count the total number of matching records. Here's an example:

SELECT [ID] AS [id], COUNT(1) OVER() AS [full_count] 
FROM [dbo].[projects] AS [projects] 
ORDER BY [projects].[id] ASC OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;

For large tables, this can take a VERY long time. I'm using MSSQL but this may also apply to other dbs.

I created connection query using sequelizeConnection().

const ProjectsConnection = sequelizeConnection({
  name: 'Project',
  nodeType: ProjectType,
  target: Project,
});

const ProjectsConnectionQuery = {
  type: ProjectsConnection.connectionType,
  args: ProjectsConnection.connectionArgs,
  resolve: ProjectsConnection.resolve,
};

I'm guessing full_count is used to compute hasNextPage and hasPreviousPage in pageInfo. Is this the case? If full_count isn't used directly to return the total page count or total record count there must be a more efficient solution.

Alternatively, we could expand the query window to include one record before the window or one record after the window. If the extra record is returned, we know there is another page.

mickhansen commented 7 years ago

Alternatively, we could expand the query window to include one record before the window or one record after the window. If the extra record is returned, we know there is another page.

Good idea, had a similar thought.

ryanmcclure4 commented 4 years ago

I'm currently running into serious performance issues because of this. On some of my connections with tables that contain very large datasets, the COUNT(*) OVER() AS "full_count" makes the query extremely slow. I see this issue was created a couple years ago, but I'm using the latest version of the package and it still seems to be using the same method. Is there any alternative or workaround at the moment?

mickhansen commented 4 years ago

@ryanmcclure4 That's odd, we have COUNT() OVER() queries running on tables containing hundres of millions of rows with little issue, but our where statements are generally pretty specific.

mickhansen commented 4 years ago

No alternative or workaround at the moment, need to implement the idea that @nslocum had

Sicria commented 4 years ago

Also had issues with this and decided to disable that part of the query via the before. It's for sure hacky but removes the problem and has reduced my queries from ~14000ms to ~200ms.

(findOptions, args, context) => ({
  ...findOptions,
  attributes: [
    ...(findOptions.attributes || []).filter(x => !(x.length === 2 && x[1] === 'full_count')),
    [context.sequelize.literal(0), 'full_count'],
  ],
}),
intellix commented 4 years ago

Getting a list of the last 50 users (out of 4 million) created in a table and it's taking quite a bit of time. I put it into pgMustard to see what it had to say about my query and it pretty much pointed to this same issue.

The DB has about 40gb RAM and it says:

Screenshot 2020-03-02 at 18 27 24

Will try the above workaround :)

mickhansen commented 4 years ago

@intellix It's odd, i've used the exact same library on a row that had close to a billion rows.

But yeah will probably need to refactor it

intellix commented 4 years ago

I've created a little patch to achieve this. I'm not 100% but the results look good to me. I'm essentially adding +1 to the limit passed in and then slicing away the last result and using the length as a count. It's not the full_count but I have those in separate resolvers anyway

Screenshot 2020-03-28 at 00 00 38 Screenshot 2020-03-28 at 00 00 43

graphql-sequelize+9.3.5.patch

diff --git a/node_modules/graphql-sequelize/lib/relay.js b/node_modules/graphql-sequelize/lib/relay.js
index 85ff592..88302a0 100644
--- a/node_modules/graphql-sequelize/lib/relay.js
+++ b/node_modules/graphql-sequelize/lib/relay.js
@@ -264,11 +264,8 @@ function createConnectionResolver(_ref2) {
       }

       if (options.limit && !options.attributes.some(attribute => attribute.length === 2 && attribute[1] === 'full_count')) {
-        if (model.sequelize.dialect.name === 'postgres') {
-          options.attributes.push([model.sequelize.literal('COUNT(*) OVER()'), 'full_count']);
-        } else if (model.sequelize.dialect.name === 'mssql' || model.sequelize.dialect.name === 'sqlite') {
-          options.attributes.push([model.sequelize.literal('COUNT(1) OVER()'), 'full_count']);
-        }
+        // Fetch an extra to check for more pages
+        options.limit += 1;
       }

       options.where = argsToWhere(args);
@@ -300,9 +297,14 @@ function createConnectionResolver(_ref2) {
           return resolveEdge(value, idx, cursor, args, source);
         });

+        let fullCount = values.length;
+
+        if (fullCount > args.first || fullCount > args.last) {
+          edges = edges.slice(0, -1);
+        }
+
         let firstEdge = edges[0];
         let lastEdge = edges[edges.length - 1];
-        let fullCount = values[0] && (values[0].dataValues || values[0]).full_count && parseInt((values[0].dataValues || values[0]).full_count, 10);

         if (!values[0]) {
           fullCount = 0;
@@ -336,7 +338,7 @@ function createConnectionResolver(_ref2) {
             index = 0;
           }

-          hasNextPage = index + 1 + count <= fullCount;
+          hasNextPage = count < fullCount;
           hasPreviousPage = index - count >= 0;

           if (args.last) {
stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

intellix commented 4 years ago

I guess this one should be re-opened. We're ok with the patch but other people are gonna get stung :)