Open sushantrmishra opened 1 year ago
Similar behavior is observed for HashAggregate as well:
taqo_basic=# explain (analyze, dist) SELECT t1.k1,
taqo_basic-# t1.k2,
taqo_basic-# t1.v1,
taqo_basic-# t1.v2
taqo_basic-# FROM t1
taqo_basic-# GROUP BY t1.k1, t1.k2, t1.v1, t1.v2 limit 100000
taqo_basic-# ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Limit (cost=105.00..107.00 rows=200 width=72) (actual time=995.416..1047.885 rows=100000 loops=1)
  -> HashAggregate (cost=105.00..107.00 rows=200 width=72) (actual time=995.414..1018.708 rows=100000 loops=1)
        Group Key: k1, k2
        -> Seq Scan on t1 (cost=0.00..100.00 rows=1000 width=72) (actual time=6.957..706.215 rows=500000 loops=1)
              Storage Table Read Requests: 489
              Storage Table Read Execution Time: 601.210 ms
Planning Time: 3.678 ms
Execution Time: 1071.829 ms
Storage Read Requests: 489
Storage Read Execution Time: 601.210 ms
Storage Write Requests: 0.000
Catalog Read Requests: 23
Catalog Read Execution Time: 7.653 ms
Catalog Write Requests: 0.000
Storage Flush Requests: 0
Storage Execution Time: 608.862 ms
Peak Memory Usage: 82546 kB
(17 rows)
taqo_basic=#
Executing the same query against RDS PostgreSQL (RDSPG):
taqo_basic=> explain analyze SELECT t1.k1,
t1.k2,
t1.v1,
t1.v2
FROM t1
GROUP BY t1.k1, t1.k2, t1.v1, t1.v2 limit 100000
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Limit (cost=11667.00..12667.00 rows=100000 width=34) (actual time=325.321..354.242 rows=100000 loops=1)
  -> HashAggregate (cost=11667.00..16667.00 rows=500000 width=34) (actual time=325.320..347.195 rows=100000 loops=1)
        Group Key: k1, k2
        Batches: 1 Memory Usage: 65553kB
        -> Seq Scan on t1 (cost=0.00..9167.00 rows=500000 width=34) (actual time=0.010..117.864 rows=500000 loops=1)
Planning Time: 0.095 ms
Execution Time: 374.058 ms
(7 rows)
Jira Link: DB-8147
Description
TLDR: The Sort operation takes ~90 ms in PostgreSQL vs ~200 ms in YugabyteDB.
TEST SETUP:
Client is in us-west-2a and work_mem is set to 50MB.
RDS PostgreSQL Instance: db.m5.2xlarge (us-west-2a), PostgreSQL version 15.3, GP3
Default PostgreSQL execution: it used an index scan and incremental sort, hence it is much faster. This is not relevant to the issue and is included only for completeness.
Turn off the incremental sorting and parallelism (to get the YB equivalent plan):
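The exact statements used for this step are not shown here. As a minimal sketch, assuming a stock PostgreSQL 15 session, incremental sort and parallelism can typically be disabled with the standard planner settings below; `work_mem` is repeated from the test setup. Treat these as an assumption about how the equivalent plan was obtained, not as the commands actually run.

```sql
-- Assumed session settings; the statements used in the original test are not shown.
SET work_mem = '50MB';                    -- matches the test setup described above
SET enable_incremental_sort = off;        -- planner stops choosing Incremental Sort (PostgreSQL 13+)
SET max_parallel_workers_per_gather = 0;  -- no parallel workers, so no Gather / Gather Merge nodes
```

These settings apply only to the current session, so the EXPLAIN ANALYZE run has to be issued from the same connection.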
YB test configuration: RF 3 cluster, leaders in us-west-2a, colocated database, c5.2xlarge, 2.19.3-b122
With DIST option: