graphile / crystal

🔮 Graphile's Crystal Monorepo; home to Grafast, PostGraphile, pg-introspection, pg-sql2 and much more!
https://graphile.org/
Other
12.62k stars 571 forks source link

Grafast does not reuse operation plan when using sorting enums #2226

Closed clayton-sayer closed 2 weeks ago

clayton-sayer commented 2 weeks ago

Summary

Reopening a previous ticket that was closed, as we don't have the ability to reopen.

We are finding an inordinate amount of time is spent in grafastPrepare and establishOperationPlan, 50-60% of all CPU and wall time is spent in there. It seems like the query caching is not working correctly, as we only have a couple distinct queries that are called.

image

We previously thought that the query was not being reused when the variables were different but when debugging the server locally we found that even with the same variables we are not reusing the cached plan. Arya and I did some debugging ourselves and found the issue. It seems that the value constraints don't match even for the same query because it's trying to compare two enum arrays for equality which doesn't work, ie: ['OCCURRED_AT_DESC'] === ['OCCURRED_AT_DESC'] is false.

Steps to reproduce

We have a query like the following, which includes an ordering enum generated by Postgraphile:

query FetchActivities($first: Int, $cursor: Cursor, $condition: ActivityCondition, $orderBy: [ActivitiesOrderBy!]) {
  activities(
    first: $first
    after: $cursor
    condition: $condition
    orderBy: $orderBy
  ) {
    edges {
      node {
       .... fields here
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}

with variables similar to:

{
  "orderBy":"OCCURRED_AT_DESC",
  "condition":{
    "accountIds":[
      "account-1",
      "account-2"
    ],
    "endDate":"2024-11-02T06:59:59.999Z"
  },
  "first":50
}

Expected results

When debugging on establishOperationPlan I would expect that the second time we make a call with the same variables we would re-use that operation plan and return it.

Actual results

An operation plan is found in the cache but is not compatible with our current request even though the variables are exactly the same. Specifically, the variable values constraints fail.

When digging further, we see that the orderBy constraint fails validation:

image

The constraint object ends up looking like this, which has the orderBy in an array:

{
    "first": 50,
    "condition": {
        "endDate": "2024-11-02T06:59:59.999Z",
        "accountIds": [
            "account-1",
            "account-2"
        ]
    },
    "orderBy": [
        "OCCURRED_AT_DESC"
    ]
}

and the constraint itself looks like:

  {
    "type": "value",
    "path": [
      "orderBy"
    ],
    "value": [
      "OCCURRED_AT_DESC"
    ]
  }

This occurs because it's trying to do an equality check on two arrays without doing a deep-equals, which will always return false.

Additional context

Arya has a unit test that shows off this problem, so it should be easy to fix.

Possible Solution

Perform a deep-equals on the two arrays, or you may want to skip the value equality check altogether and just do an array length check like you do with the accountIds constraints.

aryascripts commented 2 weeks ago

Thanks @clayton-sayer for bringing this up again! Here's the unit test that describes the problem in the function matchesConstraints:

PR: https://github.com/graphile/crystal/pull/2225

Screenshot 2024-11-01 at 3 04 48 PM

clayton-sayer commented 2 weeks ago

FYI we do notice that we're sending the orderBy as a single field in our GraphQL query, yet the variableValues being processed by Grafast has it as an array. That might be another part of the bug, I'm not sure.

Let me know if you want us to dig further into that part, as that seems quite suspicious.

benjie commented 2 weeks ago

The orderBy is an array, so sending a non-array gets coerced as described in the GraphQL spec.

Great find! I thought we did an array length assertion followed by an equality assertion on each value but clearly that’s not the case! Longer term this should be solved by removing eval of orderby (see #2060), but just fixing this seems like a great hotfix 👍

benjie commented 2 weeks ago

TBH a deep equality check where the value is JSON-like would also be fine. I don’t tend to like deep equity (preferring more explicit checks/handling), but in this case it seems reasonable.

benjie commented 2 weeks ago

Huh... but it does do what I remembered it does: it evaluates the length of the array, and then evals each entry therein, at no point during this bit of code does it evaluate the array directly.

https://github.com/graphile/crystal/blob/ba3e6d7087ea79d1dc3a46816e9ea5643443e831/graphile-build/graphile-build-pg/src/plugins/PgConnectionArgOrderByPlugin.ts#L263-L268

I don't know where your orderBy constraint is coming from.

benjie commented 2 weeks ago

Can confirm again that using the order by enums seems to work as expected. I made this change to matchesConstraints:

export function matchesConstraints(
  constraints: Constraint[],
  object: unknown,
): boolean {
  console.log("Matches constraints?");
  console.dir(constraints);
  // In my testing, verbose loops are still about 20% faster than for...of
  for (let i = 0, l = constraints.length; i < l; i++) {
    const constraint = constraints[i];
    if (!matchesConstraint(constraint, object)) {
      console.log("NO! Fails on", constraint, object);
      return false;
    }
  }
  console.log("YES, matches constraints.");
  return true;
}

Query:

query ($orderBy: [PostsOrderBy!]!){
  allPosts(orderBy:$orderBy) {
    nodes{id body}
  }
}

Variables:

{"orderBy": "BODY_ASC"}

Then I ran the above a few times and got this output on each run:

Matches constraints?
[
  {
    type: 'equal',
    path: [ 'orderBy' ],
    expectedValue: undefined,
    pass: false
  },
  {
    type: 'equal',
    path: [ 'orderBy' ],
    expectedValue: undefined,
    pass: false
  },
  { type: 'length', path: [ 'orderBy' ], expectedLength: 1 },
  { type: 'value', path: [ 'orderBy', 0 ], value: 'BODY_ASC' }
]
YES, matches constraints.
Matches constraints?
[]
YES, matches constraints.
Matches constraints?
[]
YES, matches constraints.

I appreciate you digging into this, but please can you create a reproducible example using https://github.com/benjie/ouch-my-finger/ because it seems to be something specific to your setup that is causing it, or a very specific type of query.

benjie commented 2 weeks ago

Also you haven't mentioned which version of PostGraphile V5 you're running - is it the latest version?

benjie commented 2 weeks ago

Even the condition which references an array column doesn't fail because we only evaluate the length of the array and that the entries in it are not undefined/null:

query(
  $first: Int
  $cursor: Cursor
  $condition: PostCondition
  $orderBy: [PostsOrderBy!]!
) {
  allPosts(
    first: $first
    after: $cursor
    condition: $condition
    orderBy: $orderBy
  ) {
    nodes {
      id
      body
      enums
    }
  }
}
{
  "orderBy": "BODY_ASC",
  "first": 50,
  "condition": { "enums": ["AWAITING", "REJECTED"] }
}

results in:

Matches constraints?
[
  {
    type: 'equal',
    path: [ 'orderBy' ],
    expectedValue: undefined,
    pass: false
  },
  {
    type: 'equal',
    path: [ 'first' ],
    expectedValue: undefined,
    pass: false
  },
  {
    type: 'equal',
    path: [ 'cursor' ],
    expectedValue: undefined,
    pass: true
  },
  {
    type: 'equal',
    path: [ 'orderBy' ],
    expectedValue: undefined,
    pass: false
  },
  { type: 'length', path: [ 'orderBy' ], expectedLength: 1 },
  { type: 'value', path: [ 'orderBy', 0 ], value: 'BODY_ASC' },
  {
    type: 'equal',
    path: [ 'condition' ],
    expectedValue: undefined,
    pass: false
  },
  {
    type: 'equal',
    path: [ 'condition', 'headline' ],
    expectedValue: undefined,
    pass: true
  },
  {
    type: 'equal',
    path: [ 'condition', 'body' ],
    expectedValue: undefined,
    pass: true
  },
  {
    type: 'equal',
    path: [ 'condition', 'enums' ],
    expectedValue: undefined,
    pass: false
  },
  {
    type: 'equal',
    path: [ 'condition', 'enums' ],
    expectedValue: null,
    pass: false
  },
  {
    type: 'equal',
    path: [ 'condition', 'comptypes' ],
    expectedValue: undefined,
    pass: true
  },
  {
    type: 'equal',
    path: [ 'condition', 'personByAuthorId' ],
    expectedValue: undefined,
    pass: true
  },
  {
    type: 'equal',
    path: [ 'condition', 'computedWithOptionalArg' ],
    expectedValue: undefined,
    pass: true
  },
  { type: 'length', path: [ 'condition', 'enums' ], expectedLength: 2 },
  { type: 'value', path: [ 'first' ], value: 50 }
]
YES, matches constraints.
Matches constraints?
[]
YES, matches constraints.
Matches constraints?
[]
YES, matches constraints.
aryascripts commented 2 weeks ago

@benjie yes I agree, that the orderBy is an array in the GraphQL Schema, so it is coerced correctly - I think that's not an issue here that is impacting the outcome. This is our type in the schema:

$orderBy: [ActivitiesOrderBy!]

Whether we pass in an array or not, is unrelated to this issue. GraphQL Spec mentions that non-list values will be coerced to an array anyways. We have the same outcome with both versions of the variables:

"orderBy":"OCCURRED_AT_DESC", or "orderBy": ["OCCURRED_AT_DESC"],

In your example, this is the difference compared to yours:

{
  type: 'value', 
  path: [ 'orderBy', 0 ], 
  value: 'BODY_ASC' // ARRAY in our version
}

Our value ends up being an array instead.

Here's my change in the function:

function matchesConstraints(constraints, object) {
    // In my testing, verbose loops are still about 20% faster than for...of
    for (let i = 0, l = constraints.length; i < l; i++) {
        const constraint = constraints[i];
        if (!matchesConstraint(constraint, object)) {
            console.log("constraint = ", constraint,  "object = ", object);
            return false;
        }
    }
    return true;
}

Which prints this before returning false:

constraint = { type: 'value', path: [ 'orderBy' ], value: [ 'OCCURRED_AT_DESC' ] } 
object = { orderBy: [ 'OCCURRED_AT_DESC' ] }

We can try spending some time to get a small example with our versions and get back to you!

Versions

Here's our versions (same as the last post):

# postgraphile / grafast
"postgraphile": "^5.0.0-beta.25",
"grafast": "^0.1.1-beta.10",
"grafserv": "^0.1.1-beta.12",
"graphile": "^5.0.0-beta.25",
"graphile-build": "^5.0.0-beta.20",
"graphile-build-pg": "^5.0.0-beta.24",
"graphile-config": "^0.0.1-beta.8",
"graphile-migrate": "^1.4.1",
"graphile-utils": "^5.0.0-beta.24",
"@graphile-contrib/pg-omit-archived": "^4.0.0-beta.3",
"@graphile-contrib/pg-simplify-inflector": "^6.1.0",
"@graphile/simplify-inflection": "^8.0.0-beta.5",

# pg
"pg-sql2": "^5.0.0-beta.6",
"pg": "^8.12.0",
aryascripts commented 2 weeks ago

Note: just came to the attention that we are a couple of beta versions behind, and one of the fixes is how orderBy is evaluated. We are upgrading and will post back!

benjie commented 2 weeks ago

In your example, this is the difference compared to yours:

Actually it's not just the value that has changed, but also the path - note that the index 0 is added to the path, so we're looking at the value of the array entry, not the array itself:

{
  type: 'value', 
  path: [ 'orderBy', 0 ], 
  value: 'BODY_ASC'
}

Note: just came to the attention that we are a couple of beta versions behind, and one of the fixes is how orderBy is evaluated. We are upgrading and will post back!

Hopefully updating fixes the issue! Indeed it looks like this may have been addressed in July:

This fix aligns with the description of the issue you're having and the output you're seeing, so I'm going to go ahead and close this as solved. With beta software it's generally best to update to the latest before filing any issues :+1:

clayton-sayer commented 2 weeks ago

Thanks Benjie. We can confirm that updating to the latest version significantly improved query caching re-use and reduced CPU usage on the service. I thought that our dependency updater bot would have kept those up to date but it seems like they were missed, that's my bad for not checking that assumption.

Thank you for the help!