graphile / graphile-engine

Monorepo home of graphile-build, graphile-build-pg, graphile-utils, postgraphile-core and graphql-parse-resolve-info. Build a high-performance easily-extensible GraphQL schema by combining plugins!
https://www.graphile.org/

Pagination cursors not working for tables with at least one bigint component #811

Closed dmrenie closed 1 year ago

dmrenie commented 2 years ago

Summary

Cursors encoding primary keys with at least one bigint component suffer from a rounding error, presumably caused by encoding the bigint as a JavaScript number rather than as a string.

Steps to reproduce

Create a table with a bigint primary key and values that cannot be represented precisely as a JavaScript number (5673028755079817001 is rounded down to 5673028755079817000):

createdb bigint_bug
echo "CREATE TABLE test (id bigint PRIMARY KEY);
INSERT INTO test (id) VALUES (5673028755079817001);
INSERT INTO test (id) VALUES (5673028755079817002);
INSERT INTO test (id) VALUES (5673028755079817003);" | psql -d bigint_bug
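
Before even involving postgraphile, the rounding can be reproduced in plain Node: the value exceeds Number.MAX_SAFE_INTEGER, so converting it to a JS number silently snaps to the nearest representable double.

```javascript
// 5673028755079817001 is larger than Number.MAX_SAFE_INTEGER (2^53 - 1 =
// 9007199254740991), so the nearest representable double is used instead
// and the trailing 1 is silently lost.
const asNumber = Number("5673028755079817001");
console.log(asNumber.toString()); // "5673028755079817000"
console.log(Number.isSafeInteger(asNumber)); // false
```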

Run postgraphile and query via the GraphiQL interface: npx postgraphile -c bigint_bug --enhance-graphiql

The request for the second page of results returns the same result as the initial GraphQL query:

query MyQuery {
  allTests(first: 1) {
    pageInfo {
      endCursor
      hasNextPage
    }
    nodes {
      id
    }
  }
}

returns

{
  "data": {
    "allTests": {
      "pageInfo": {
        "endCursor": "WyJwcmltYXJ5X2tleV9hc2MiLFs1NjczMDI4NzU1MDc5ODE3MDAwXV0=",
        "hasNextPage": true
      },
      "nodes": [
        {
          "id": "5673028755079817001"
        }
      ]
    }
  }
}
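
The cursor appears to be base64-encoded JSON, so decoding the endCursor makes the precision loss visible: the encoded key is the rounded value, not the row's actual id of 5673028755079817001.

```javascript
// Decoding the endCursor from the response above shows the primary key was
// serialized as a (rounded) JS number rather than as a string.
const cursor = "WyJwcmltYXJ5X2tleV9hc2MiLFs1NjczMDI4NzU1MDc5ODE3MDAwXV0=";
const decoded = Buffer.from(cursor, "base64").toString("utf8");
console.log(decoded); // ["primary_key_asc",[5673028755079817000]]
```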

And then using the endCursor:

query MyQuery {
  allTests(
    first: 1
    after: "WyJwcmltYXJ5X2tleV9hc2MiLFs1NjczMDI4NzU1MDc5ODE3MDAwXV0="
  ) {
    pageInfo {
      endCursor
      hasNextPage
    }
    nodes {
      id
    }
  }
}

returns the same result:

{
  "data": {
    "allTests": {
      "pageInfo": {
        "endCursor": "WyJwcmltYXJ5X2tleV9hc2MiLFs1NjczMDI4NzU1MDc5ODE3MDAwXV0=",
        "hasNextPage": true
      },
      "nodes": [
        {
          "id": "5673028755079817001"
        }
      ]
    }
  }
}

Expected results

The next page of results with the id 5673028755079817002.

Actual results

The same page of results with the id 5673028755079817001.

Additional context

N/A

Possible Solution

I suspect that the cursor encoding represents a Postgres bigint as a JavaScript number even when it falls outside the precisely representable range, rather than as a string the way field values are handled. If so, it seems a solution could use logic similar to https://cs.github.com/graphile/graphile-engine/blob/3daa9cdae93d642b4cc72c8e80a534850c13d00b/packages/graphile-build-pg/src/plugins/PgTablesPlugin.js#L141

benjie commented 2 years ago

The actual value of cursors is not super important (they can change at will), so I don't think we need anything as subtle as the workaround you linked. I think we might be able to do it by changing this line:

https://github.com/graphile/graphile-engine/blob/1619878d964b222332c8d4092bd6a654c1ec6595/packages/graphile-build-pg/src/queryFromResolveDataFactory.js#L323

to:

            .map(([expr]) => sql.fragment`(${expr})::text`);

And making the equivalent change here:

https://github.com/graphile/graphile-engine/blob/1619878d964b222332c8d4092bd6a654c1ec6595/packages/graphile-build-pg/src/queryFromResolveDataFactory.js#L377

to cast it back again.

However, this would change the cursors used throughout the test suite, each of which would then need to be updated. Doing this only for bigint columns would avoid most of that churn, but I'm not sure how much effort it would be.

If you're interested in taking this on, then please do the following, in order:

  1. Find a suitable target in, or add a suitable example to, https://github.com/graphile/graphile-engine/blob/v4/packages/postgraphile-core/__tests__/kitchen-sink-schema.sql and https://github.com/graphile/graphile-engine/blob/v4/packages/postgraphile-core/__tests__/kitchen-sink-data.sql where ordering in bigint would cause this issue
  2. Add a test for the above
    1. create a .test.graphql file containing an affected operation here: https://github.com/graphile/graphile-engine/tree/v4/packages/postgraphile-core/__tests__/queries/base
    2. Add an assertion at the top of your test file (e.g. ## expect(data.myConnection.nodes[0].id).toEqual(3)) to state what should be there
    3. Run the tests, all the related files should be generated automatically
    4. Your assertion should fail (don't worry that the snapshots are wrong, your assertion is what matters here)
  3. Make the change
  4. Update the cursors across the test suite to respect the new format
  5. Run the tests again, with UPDATE_SNAPSHOTS=1 yarn test in the packages/postgraphile-core folder
  6. Assert that the tests pass

Even if you only get as far as completing step 2, raising a PR with that would help me or someone else fix the issue faster since reproduction is often half the battle.

gitrojones commented 1 year ago

Ran into this myself. I can confirm your proposed change fixes the problem, but then everything is handled with a text cast, which might not be ideal (the cast back didn't seem necessary, but I only looked at this superficially). Either way, bigints should just be treated as strings; truncated integers will always cause weird bugs like this, and they're a huge pain to track down.

Ultimately I'm wondering why the bigint is being cast to an integer in the first place. node-postgres should be treating bigints as strings by default so I'd expect graphile to do the same. I'm assuming they're not because of optimizations like this https://cs.github.com/graphile/graphile-engine/blob/3daa9cdae93d642b4cc72c8e80a534850c13d00b/packages/graphile-build-pg/src/plugins/PgTablesPlugin.js#L141?

benjie commented 1 year ago

@gitrojones That's not an optimization, it's a bugfix. But before I can explain that, I need to explain the underlying issue:

This all comes down to the fact that Postgres doesn't encode bigint as a string in JSON, but instead as a number:

[test] # select row_to_json(t) from (values (9007199254740991::bigint, 9007199254740992::bigint, 9007199254740993::bigint)) as t (one, two, three);
┌──────────────────────────────────────────────────────────────────────────┐
│                               row_to_json                                │
├──────────────────────────────────────────────────────────────────────────┤
│ {"one":9007199254740991,"two":9007199254740992,"three":9007199254740993} │
└──────────────────────────────────────────────────────────────────────────┘
(1 row)

This is perfectly valid, as you can see from the JSON spec: https://www.json.org/json-en.html#:~:text=A%20number%20is%20very%20much%20like%20a%20C%20or%20Java%20number%2C%20except%20that%20the%20octal%20and%20hexadecimal%20formats%20are%20not%20used.

The JSON spec doesn't place any limits on the size of a number, doesn't state that it must be IEEE754 compatible, or anything like that. So Postgres encodes a bigint (which is a number) as a number.

However, JS (and thus Node) cannot represent numbers that big, so when you JSON.parse the JSON string (which is what node-postgres does, since PostGraphile builds JSON objects in Postgres rather than returning raw row types), you get a silently rounded number. And it doesn't even throw an error! Naughty Node - no cookies for you!

> JSON.parse('{"one":9007199254740991,"two":9007199254740992,"three":9007199254740993}')
{
  one: 9007199254740991,
  two: 9007199254740992,
  three: 9007199254740992  <<< WRONG!
}

So back to the bugfix: this issue with bigint went undiscovered for a long while (v1 through v4.4.2 were all affected), so we had to push out a fix that didn't break existing codebases. We made it so that NodeIDs for safe integers (those up to and including 9007199254740991) were unchanged, whereas values above that point - which were potentially invalid already - could switch to the string encoding that we would have liked to use everywhere. Since those larger values were already broken, this wasn't a breaking change. I like to think of these kinds of issues as "breaking fixes."


My proposed solution in the comment above still stands, but I'm busy in the depths of V5 development (which doesn't have this issue) so if it's something you want to see fixed, you're going to have to implement it following the guidance above.

benjie commented 1 year ago

(And to be clear, yes I know that Node has BigInt now so it can handle numbers that big, just not through a vanilla JSON.parse.)
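
For illustration, BigInt round-trips the value exactly - but only if the digits reach it as a string; by the time a JSON.parse reviver sees a value, it has already been through a double and the damage is done.

```javascript
// BigInt represents the value exactly when constructed from a string, but a
// plain JSON.parse has already rounded the number before any reviver runs.
console.log(BigInt("9007199254740993").toString()); // "9007199254740993"
console.log(JSON.parse('{"v":9007199254740993}').v); // 9007199254740992 - already rounded
```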

gitrojones commented 1 year ago

Ah that makes a lot more sense. Appreciate the insight. I'll circle back to this with a PR when I have a bit of extra time.

benjie commented 1 year ago

Can confirm this is definitely fixed in V5