mickhansen / graphql-sequelize

GraphQL & Relay for MySQL & Postgres via Sequelize
MIT License
1.9k stars 172 forks source link

Query results are duplicated when identical queries with a relationship model & limit argument are batched #541

Closed marklawlor closed 5 years ago

marklawlor commented 6 years ago

Graphql-Sequelize is duplicating query results from identical queries when the query has both a where and limit argument.

A "real world" situation where this issue occurs is if you are using GraphQL subscriptions. If two or more people are viewing the same page of an application, it is very likely they both have identical subscription queries.

When a subscription event fires, all registered subscriptions will both be resolved and batched together, causing the issue to be fairly reproducible.

The number of results returned will be duplicated N times, where N is the number of registered clients for the subscription

We have also seen the issue occur with client-side query generators (eg Apollo Client) that batch requests together when a component uses an alias

Example query

query {
  users {
    id
    tasks (limit: 1) {
      id
    }
    test: tasks (limit: 1) {
      id
    }
  }
}

See #542 for a failing test

janmeier commented 6 years ago

This is not something we have seen in our usage of the library and I do believe we have quite a few places where we have duplicated fields. Will have to look some more into this

Which sequelize version are you using?

marklawlor commented 6 years ago

The code example in my first comment was generated with the latest version of all libraries

mickhansen commented 6 years ago

If you can PR a failing test i'd love to take a look @marklawlor.

janmeier commented 6 years ago

Latest being sequelize 4? We only test against v3, and at snaplytics we use v3 as well

marklawlor commented 6 years ago

Tested both Sequelize 3 and 4.

PR #542 is up with a breaking test

marklawlor commented 6 years ago

I updated the PR with a second test which is similar, but passing.

Maybe the issue only occurs with Relationships + Limits?

mickhansen commented 6 years ago

@marklawlor Well the test you added that is passing is the one using relationships? So it would stand to reason that it's primarily a issue on root queries?

Most users don't have root relationships, usually people use viewer or something similar

marklawlor commented 6 years ago

The failure is occurring in issue#541 - Relationships which uses resolver(User.Tasks). So I believe it may only affect relationships.

Sorry for the confusion. I committed the Singular test as I realised that originally reported issue (where + limit) was slightly incorrect

mickhansen commented 6 years ago

@marklawlor that doesn't really make sense as that's the test you added last, you said the second test was passing.

Please limit the PR to the failing test.

mickhansen commented 6 years ago

Okay obviously i'm bad at reading commits, sorry about that @marklawlor, it does appear related to associations yes, which makes sense.

Sounds like it might be related to dataloader

marklawlor commented 6 years ago

Yep, it was fairly easy to replicate with just dataloader now that I understand it only occurs with associations

https://github.com/mickhansen/dataloader-sequelize/issues/44

intellix commented 6 years ago

Just noticed this one as well. For another use-case. I'm adding default limits to all connections to make sure someone can't request a million rows if they don't use a pagination argument, like so:

export function defaultLimit(findOptions, args, context) {
  if (!findOptions.limit) {
    findOptions.limit = 250;
  }
  return findOptions;
}

export const UserWalletConnectionType = createConnection({
  name: 'UserWallet',
  nodeType: WalletType,
  target: () => User.associations.wallets || Wallet,
  before: defaultLimit,
});

Originally this was giving me issues cause I force a limit. If I remove the defaultLimit on connections it can still be replicated with connection args:

{
  messages(first: 3) {
    edges {
      node {
        body
        user {
          name
          wallets(first: 2) {
            edges {
              node {
                name
              }
            }
          }
        }
      }
    }
  }
}

This gives me 6 wallets instead of 2 and a query like so:

SELECT "Wallet".* FROM (
    (SELECT "id", "name" FROM "Wallet" AS "Wallet" WHERE "Wallet"."userId" = 1 ORDER BY "Wallet"."id" ASC LIMIT 2) UNION ALL 
    (SELECT "id", "name" FROM "Wallet" AS "Wallet" WHERE "Wallet"."userId" = 1 ORDER BY "Wallet"."id" ASC LIMIT 2) UNION ALL 
    (SELECT "id", "name" FROM "Wallet" AS "Wallet" WHERE "Wallet"."userId" = 1 ORDER BY "Wallet"."id" ASC LIMIT 2)) AS "Wallet";

Running the above with 250 messages (my default limit) returns a response of 3mb and I'm scared of abuse in production :|

Any idea of where to look for fixing this in dataloader?

mickhansen commented 6 years ago

Somehow the cache key calculated must be differing, although i'm not quite sure how that's possible.

mickhansen commented 6 years ago

@intellix can you verify whether or not this happens for just "old-style" dataloader or also context based dataloader?

intellix commented 6 years ago

@mickhansen I believe it's the new context-based dataloader. I'm creating 1x global one at the top of my schema file after all of my Sequelize models/associations etc. Here's an example of my large schema.ts file:

import { Achievement } from './sequelize/achievement.model';
import { AchievementType } from './types/achievement.type';

typeMapper.mapType((type: DataTypeAbstract) => {
  // ISO Dates for all Sequelize.Date fields
  if (type instanceof Sequelize.DATE) {
    return SequelizeDateType;
  }
  if (type instanceof Sequelize.DECIMAL) {
    return DecimalType;
  }
  if (type instanceof Sequelize.BIGINT) {
    return GraphQLString;
  }
  return false;
});

// Globally enable dataloader
const dataloaderContext = createContext(sequelize);
resolver.contextToOptions = { [EXPECTED_OPTIONS_KEY]: dataloaderContext };

// Map Node GraphQL Types to Sequelize Models
nodeTypeMapper.mapTypes({
  [Achievement.name]: AchievementType,
});

const Query = new GraphQLObjectType({
  name: 'Query',
  fields: () => ({ ... }),
});

export const schema = new GraphQLSchema({ query: Query });
mickhansen commented 6 years ago

@intellix The easiest fix is to apply uniq(keys) to the places in data loader where we used groupedLimit. I tried to do just that but a bunch of tests are failing, possibly unrelated but i don't currently have time to go through it all.