Extremely slow subscriptions (seq scan instead of index)

switz commented 4 years ago

subscription liveMatches {
  matches(limit: 5, where: {status: {_eq: 3}}) {
    id
  }
}

I'm seeing responses on this query of anywhere from 1000ms to 3500ms, whereas the simpler version below returns in < 20ms. The table has an index on status. The table has a little over 1,000,000 rows, but this query has at most 5-100 nodes. I've noticed if the limit is above the total number of total nodes, it gets particularly slow. If the limit is beneath the number of nodes it's quite fast. I'm not sure if that relevance, but it may be helpful data.

Nested Loop Left Join  (cost=0.56..0.60 rows=1 width=48)
  ->  Function Scan on _subs  (cost=0.01..0.01 rows=1 width=48)
  ->  Subquery Scan on matches  (cost=0.55..0.58 rows=1 width=32)
        ->  Aggregate  (cost=0.55..0.56 rows=1 width=32)
              ->  Limit  (cost=0.00..0.49 rows=5 width=32)
                    ->  Seq Scan on matches matches_1  (cost=0.00..53376.58 rows=545002 width=32)
                          Filter: (status = ((_subs.result_vars #>> '{synthetic,0}'::text[]))::integer)
                          SubPlan 1
                            ->  Result  (cost=0.00..0.01 rows=1 width=32)

It appears as though it's using a seq scan for what should just be a simple index scan:

You can see once this query starts, it alone hammers my database:

I sent this out on Discord but didn't get a reply after 24h so I figured I'd share it here. Thanks.

switz commented 4 years ago

I should add that this only arises on subscriptions, not queries. The query hits the index.

sasog23 commented 4 years ago

i'm noticing similar performance issue with subscription table contacts has 176391 records, while the table events has 415440 records my subscription is to a table event with a couple of fields; contact is relation to contacts

subscription EventsSubscription{ test_events( limit: 15, offset: 0, where: { status_group: {_eq: 10} } order_by: { updated_at: desc, id: desc } ) { id short_desc updated_at contact { name } } }

Query Aggregate (cost=1339.25..1339.26 rows=1 width=32) -> Limit (cost=1339.03..1339.06 rows=15 width=48) -> Sort (cost=1339.03..1340.30 rows=508 width=48) Sort Key: events.updated_at DESC, events.id DESC -> Nested Loop Left Join (cost=0.84..1326.56 rows=508 width=48) -> Index Scan using _idx_1 on events (cost=0.42..101.61 rows=508 width=62) Index Cond: (status_group = 10) -> Index Scan using contacts_pkey on contacts (cost=0.42..2.40 rows=1 width=40) Index Cond: (id = events.contact_id) SubPlan 1 -> Result (cost=0.00..0.01 rows=1 width=32) SubPlan 2 -> Result (cost=0.00..0.01 rows=1 width=32)

Subscription Nested Loop Left Join (cost=66805.25..66805.29 rows=1 width=48) -> Function Scan on _subs (cost=0.01..0.01 rows=1 width=48) -> Subquery Scan on test_events (cost=66805.24..66805.26 rows=1 width=32) -> Aggregate (cost=66805.24..66805.25 rows=1 width=32) -> Limit (cost=66805.01..66805.05 rows=15 width=48) -> Sort (cost=66805.01..67322.90 rows=207154 width=48) Sort Key: events.updated_at DESC, events.id DESC -> Hash Left Join (cost=11851.12..61722.60 rows=207154 width=48) Hash Cond: (events.contact_id = contacts.id) -> Index Scan using _idx_1 on events (cost=0.43..40907.70 rows=207154 width=62) Index Cond: (status_group = ((_subs.result_vars #>> '{synthetic,0}'::text[]))::integer) -> Hash (cost=8266.80..8266.80 rows=176391 width=40) -> Seq Scan on contacts (cost=0.00..8266.80 rows=176391 width=40) SubPlan 1 -> Result (cost=0.00..0.01 rows=1 width=32) SubPlan 2 -> Result (cost=0.00..0.01 rows=1 width=32)

switz commented 3 years ago

Thanks for sharing! Ultimately the issue comes down to a combination of order_by and limit I believe, along with synthetic query params not hitting their indexes.

I'm making some progress. Subscriptions for some reason default to DESC_NULLS_LAST which skips the pk order by index. For some reason, after I ran the desc_nulls_first query manually, it switched over to that and now uses that even when running with just desc. Maybe there's some magic query planner that saw it being slightly faster.

That said, now it still is very slow much of the time, e.g.:

This results in:

Nested Loop Left Join  (cost=82738.55..82738.59 rows=1 width=48)
  ->  Function Scan on _subs  (cost=0.01..0.01 rows=1 width=48)
  ->  Subquery Scan on matches  (cost=82738.54..82738.56 rows=1 width=32)
        ->  Aggregate  (cost=82738.54..82738.55 rows=1 width=32)
              ->  Index Scan Backward using matches_pkey on matches matches_1  (cost=0.43..78176.66 rows=364950 width=36)
                    Filter: (status = ((_subs.result_vars #>> '{synthetic,0}'::text[]))::integer)
                    SubPlan 1
                      ->  Result  (cost=0.00..0.01 rows=1 width=32)

So it's no longer using a seq scan, but an index scan backward. That said, the status filter is still not hitting the index due to the synthetic query params. I think this is still the result of this PR: https://github.com/hasura/graphql-engine/pull/2942 cc: @lexi-lambda

To compare this with the raw query (instead of the subscription):

Aggregate  (cost=87.60..87.61 rows=1 width=32)
  ->  Sort  (cost=87.04..87.13 rows=37 width=36)
        Sort Key: matches.id DESC
        ->  Index Scan using matches_status_idx on matches  (cost=0.43..86.08 rows=37 width=36)
              Index Cond: (status = 3)
              SubPlan 1
                ->  Result  (cost=0.00..0.01 rows=1 width=32)

As you can see its much faster.

These PRs were relevant to me, but I think this is still facing another problem.

https://github.com/hasura/graphql-engine/issues/657 https://github.com/hasura/graphql-engine/issues/1008

sasog23 commented 3 years ago

subscription not hitting FTS index

query MyQuery { test_search_events(args: {search: "joža"}, order_by: { updated_at: asc, id: asc}, limit: 10) { contact_name } }

Aggregate (cost=1372.11..1372.12 rows=1 width=32) -> Limit (cost=1371.96..1371.98 rows=10 width=48) -> Sort (cost=1371.96..1374.36 rows=961 width=48) Sort Key: events.updated_at, events.id -> Bitmap Heap Scan on events (cost=18.45..1351.19 rows=961 width=48) Recheck Cond: (to_tsvector('mydict'::regconfig, "substring"(((((((((short_desc || ' '::text) || telephone) || ' '::text) || email) || ' '::text) || contact_name) || ' '::text) || ((long_desc -> 'plaint'::text))::text), 1, 500000)) @@ '''joza'':'::tsquery) -> Bitmap Index Scan on fts_events_idx (cost=0.00..18.21 rows=961 width=0) Index Cond: (to_tsvector('mydict'::regconfig, "substring"(((((((((short_desc || ' '::text) || telephone) || ' '::text) || email) || ' '::text) || contact_name) || ' '::text) || ((long_desc -> 'plaint'::text))::text), 1, 500000)) @@ '''joza'':'::tsquery) SubPlan 1 -> Result (cost=0.00..0.01 rows=1 width=32)

subscription MyQuery { test_search_events(args: {search: "joža"}, order_by: { updated_at: asc, id: asc}, limit: 10) { contact_name } }

Nested Loop Left Join (cost=406319.31..406319.35 rows=1 width=48) -> Function Scan on _subs (cost=0.01..0.01 rows=1 width=48) -> Subquery Scan on test_search_events (cost=406319.30..406319.33 rows=1 width=32) -> Aggregate (cost=406319.30..406319.31 rows=1 width=32) -> Limit (cost=406319.15..406319.18 rows=10 width=48) -> Sort (cost=406319.15..406329.51 rows=4144 width=48) Sort Key: events.updated_at, events.id -> Seq Scan on events (cost=0.00..406229.60 rows=4144 width=48) Filter: ((to_tsvector('mydict'::regconfig, "substring"(((((((((short_desc || ' '::text) || telephone) || ' '::text) || email) || ' '::text) || contact_name) || ' '::text) || ((long_desc -> 'plaint'::text))::text), 1, 500000)) @@ to_tsquery('mydict'::regconfig, prepare_fts_search((_subs.result_vars #>> '{synthetic,0}'::text[])))) OR ((_subs.result_vars #>> '{synthetic,0}'::text[]) = ''::text)) SubPlan 1 -> Result (cost=0.00..0.01 rows=1 width=32) JIT: Functions: 16 Options: Inlining false, Optimization false, Expressions true, Deforming true

sasog23 commented 3 years ago

anything new on this issue

lexi-lambda commented 3 years ago

This is a tricky issue. After quite some investigation, I’ve managed to create a much simpler example that highlights what’s going wrong in the Postgres planning process.

The root problem

To start, we’ll create a table with a large number of rows (in this case 1,000,000) where all of those rows are “boring” (in this case that means status = 1):

CREATE TABLE matches
  ( id serial PRIMARY KEY
  , status integer NOT NULL );
CREATE INDEX matches_status_idx ON matches(status);

-- insert 1,000,000 “boring” rows
INSERT INTO matches(status)
  SELECT 1 FROM generate_series(1, 1000000) s(i);

Now let’s add a much smaller number of “interesting” rows to the end of the table:

INSERT INTO matches(status)
  SELECT floor(random() * 5) FROM generate_series(1, 1000) s(i);

After running VACUUM ANALYZE to ensure statistics are up to date, we can generate a query plan for this rather simple query:

EXPLAIN SELECT matches.id
FROM UNNEST('{2,3}'::integer[]) AS params(status)
INNER JOIN matches ON matches.status = params.status;

This fetches all the rows where status is 2 or 3, which only appear in the small block of 1,000 “interesting” rows at the very end of the table. As expected, Postgres will consult the index for this query:

 Nested Loop  (cost=0.43..20419.74 rows=400400 width=4)
   ->  Function Scan on unnest params  (cost=0.00..0.02 rows=2 width=4)
   ->  Index Scan using matches_status_idx on matches  (cost=0.42..8207.86 rows=200200 width=8)
         Index Cond: (status = params.status)

But what happens if we add an innocuous LIMIT 5 to the end of this query? We wouldn’t expect that to affect the query plan significantly, but in fact, it does. Specifically, it switches from an index scan to a full table scan!

EXPLAIN SELECT matches.id
FROM UNNEST('{2,3}'::integer[]) AS params(status)
INNER JOIN matches ON matches.status = params.status
LIMIT 5;

 Limit  (cost=0.05..0.32 rows=5 width=4)
   ->  Hash Join  (cost=0.05..22197.80 rows=400400 width=4)
         Hash Cond: (matches.status = params.status)
         ->  Seq Scan on matches  (cost=0.00..14440.00 rows=1001000 width=8)
         ->  Hash  (cost=0.02..0.02 rows=2 width=4)
               ->  Function Scan on unnest params  (cost=0.00..0.02 rows=2 width=4)

This is a really bad idea, since this means the inner query will have to scan through the 1,000,000 “boring” rows just to find 5 of the “interesting” rows at the end of the table. This strange query planning behavior is what is making these subscription queries so slow.

Why does this happen?

To understand why the Postgres query planner makes this decision, we can inspect the query plan that gets generated if we temporarily disable full-table scans altogether using SET enable_seqscan TO off, forcing the planner to use an index scan:

 Limit  (cost=0.43..0.68 rows=5 width=4)
   ->  Nested Loop  (cost=0.43..20419.74 rows=400400 width=4)
         ->  Function Scan on unnest params  (cost=0.00..0.02 rows=2 width=4)
         ->  Index Scan using matches_status_idx on matches  (cost=0.42..8207.86 rows=200200 width=8)
               Index Cond: (status = params.status)

If you compare the estimated cost of this query to the estimated cost of the version using the full-table scan, you’ll notice that the full-table scan plan appears strictly better: its estimated cost bounds are 0.05..0.32 versus 0.43..0.68. Obviously, this estimate is bogus: in practice, the index scan is enormously faster. So then where does this low cost estimate come from?

The issue has to do with an assumption that a limit simply linearly scales the cost of the limited query. That is, the expectation is that if a LIMIT clause only consumes 1% of the result set, then only 1% of the work will have been done to produce the entire result set. However, this is only true if the filtered rows are uniformly distributed throughout the result set. If they are non-uniformly distributed, as is the case here, it could be dramatically more efficient to perform the filter by consulting the index, since that will have a consistent cost regardless of the distribution of the rows.

Since consulting the index technically has a nonzero constant overhead (since rows from the table itself need to be fetched to return the id column), after the cost is scaled by LIMIT, the full-table scan looks “cheaper.” Again, this is obviously disastrous in this situation, but it’s what Postgres decides to do. Some discussion about why Postgres continues to use this heuristic despite it potentially being so flawed is given in this mailing list thread.

What can be done about it?

Now for the tough question: what can be done to mitigate this issue? In general, there’s no obvious easy solution.

An extreme response would be to revert #2942 altogether and avoid parameterizing over values that don’t correspond to GraphQL query variables of scalar type. But this is pretty unsatisfying, for reasons given in the PR description of #2942: it means two semantically-identical GraphQL queries can have significant differences in performance, in a way that is pretty non-obvious to users.

A better solution would probably be to do something smarter: only parameterize over SQL values if those values came from a GraphQL variable, even if they’re nested inside a larger GraphQL value. That means we’d still parameterize over the album_id in the example given in #2942 (because it came from a GraphQL variable), but we wouldn’t parameterize over the status value in the original comment in this issue (because it’s a hardcoded GraphQL literal).

That approach would still give different operational meaning to semantically-equivalent GraphQL queries, but in a much more intuitive way that better reflects the intent. It makes sense that we’d parameterize over things the user parameterized over, and it also makes sense we’d hardcode values the user hardcoded. We’re unlikely to get value out of parameterizing over hardcoded values because they’re unlikely to ever change, and passing them into the SQL exposes more optimization opportunities to the Postgres query planner.

However, implementing this is a little tricky, because we’d have to somehow track where values came from after parsing the GraphQL query such that we can parameterize over them later. That’s doable, of course, but there are thorny edge cases, and it might be subtle to get right.

switz commented 3 years ago

First off, thanks so much for the detailed look into this. That's a remarkably fascinating breakdown, and I appreciate you going through the effort.

I don't anticipate you reverting 2942, nor do I expect it. The solution to only parametrize gql variables makes sense to me; but doesn't ultimately solve the problem, right? It solves my use-case, but someone else could come along with the same problem arising out of gql variables. So while that's helpful to me, I could see that not being helpful to everyone. I'm in favor of it, if the Hasura team is willing to pursue it, of course.

I wish I could pass a header or something to the query to bypass the parameterization, but since this is a subscription over websockets, I can't even do that. Maybe there's another flag I can pass via the query or POST body though.

All in all, I appreciate you looking into this and hope we can find a solution that works. If there's anything I can do on my end to help, please let me know.

0x777 commented 3 years ago

Maybe we can introduce an option (as part of metadata) to specify subscriptions that shouldn't be batched. @switz Would that work for you?

switz commented 3 years ago

Would the metadata be on a per query (subscription) basis? If so, that would work for me.

fedekau commented 2 years ago

Thanks for the detailed explanation @lexi-lambda! We have just faced the same issue and your explanation was really good. I see no activity on this for more than a year, does anyone know if some progress has been made on this? Or there are workarounds to the problem?

demoh2019 commented 2 years ago

+1

cleivson commented 2 years ago

One thing I realized is that my partial index was not being used due to this "synthetic" array.

My partial index is where unusual = false and index > -1

If I run the query if the columns from the partial index hardcoded, it's very fast:

SELECT
  *
from
  UNNEST(('{27c7bfaf-d4fc-45ff-b04d-e84b72dc438d}') :: uuid [], ('{"{\"cursor\":{\"timestamp\":\"1668615019\"},\"query\":{},\"session\":{},\"synthetic\":[\"false\",\"-1\"]}"}') :: json []) AS "_subs"("result_id", "result_vars")
  LEFT OUTER JOIN LATERAL (
  select * from table
WHERE
      (table."timestamp") > (("_subs"."result_vars" #>>ARRAY['cursor', 'timestamp']))::integer AND table."unusual" = false AND table."swapindex" > -1
ORDER BY "timestamp" ASC  LIMIT 50) x on true

But if I run the query below (simplified version of Hasura generated query) it's very slow

SELECT
  *
from
  UNNEST(('{27c7bfaf-d4fc-45ff-b04d-e84b72dc438d}') :: uuid [], ('{"{\"cursor\":{\"timestamp\":\"1668615019\"},\"query\":{},\"session\":{},\"synthetic\":[\"false\",\"-1\"]}"}') :: json []) AS "_subs"("result_id", "result_vars")
  LEFT OUTER JOIN LATERAL (
  select * from table
WHERE
      (table."timestamp") > (("_subs"."result_vars" #>>ARRAY['cursor', 'timestamp']))::integer AND table."unusual" = ((("_subs"."result_vars"#>>ARRAY['synthetic', '0']))::boolean) AND table."swapindex" > ((("_subs"."result_vars"#>>ARRAY['synthetic', '1']))::integer)
ORDER BY "timestamp" ASC  LIMIT 50) x on true

I'm trying to recreate the index to include the partial columns to see if the planner can use it, but it's not ideal since 90% of my table will have those same values and it'll unnecessarily make my index bigger

hasura / graphql-engine