hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0
31.03k stars 2.76k forks source link

Performance improvements for postgres query with a large array parameter #10463

Open SlavaChristin opened 1 month ago

SlavaChristin commented 1 month ago

We have a performance issue with a query that runs against large table

query GetSecurities($isin: [String!]) {
    debtsecurities(
        where: {
            ISIN: {_in: $isin}
        }
    )
    {
        CUSIP
        ISIN
        Ticker
    }
}

All database indexes are in place and data is in shared memory. The database we use is postgres 14.x and we use hasura/graphql-engine:v2.41.0.cli-migrations-v2 docker image

The performance issue appears when $isin parameter has 10k+ or more items

SELECT
  coalesce(json_agg("root"), '[]') AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_e"
          FROM
            (
              SELECT
                "_root.base"."CUSIP" AS "CUSIP",
                "_root.base"."ISIN" AS "ISIN",
                "_root.base"."Ticker" AS "Ticker"
            ) AS "_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."Securities"
        WHERE
             (
                ("public"."Securities"."ISIN") = ANY(
                  (
                    '{"isin1",...,"isin10K+"}'
                  ) :: varchar []
                )
              )
      ) AS "_root.base"
  ) AS "_root"

so the query runs for 2+ minutes

if the code inside ANY is replaced from array to a list of values like in the query below the results comes in subsecond

SELECT
  coalesce(json_agg("root"), '[]') AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_e"
          FROM
            (
              SELECT
                "_root.base"."CUSIP" AS "CUSIP",
                "_root.base"."ISIN" AS "ISIN",
                "_root.base"."Ticker" AS "Ticker"
            ) AS "_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."Securities"
        WHERE
             (
                ("public"."Securities"."ISIN") = ANY(
                  (
                    VALUES('isin1'), ('...'), ('isin10K')
                )
              )
      ) AS "_root.base"
  ) AS "_root"

The solution is described in the article here https://www.datadoghq.com/blog/100x-faster-postgres-performance-by-changing-1-line/

It would be cool to incorporate this into harusa connector

soupi commented 1 month ago

Thank you for the report. We intend on to benchmark these two approaches together with the new approach we use in ndc-postgres and get a better understanding on how to improve the situation for v3.

For v2, we don't plan to prioritize this at the moment, however it is likely that this problem can be mitigated using Native Queries if necessary.