Open kmuthukk opened 3 years ago
Setting enable_hashagg to false is one potential workaround for now to force an index scan in this situation.
postgres=# set enable_hashagg=true;
SET
postgres=# explain select v, count(*) from tab group by v limit 190;
QUERY PLAN
-------------------------------------------------------------------------
Limit (cost=105.00..106.90 rows=190 width=40)
-> HashAggregate (cost=105.00..107.00 rows=200 width=40)
Group Key: v
-> Foreign Scan on tab (cost=0.00..100.00 rows=1000 width=32)
(4 rows)
postgres=# set enable_hashagg=false;
SET
postgres=# explain select v, count(*) from tab group by v limit 190;
QUERY PLAN
--------------------------------------------------------------------------------------------
Limit (cost=0.00..114.95 rows=190 width=40)
-> GroupAggregate (cost=0.00..121.00 rows=200 width=40)
Group Key: v
-> Index Only Scan using tab_v_idx on tab (cost=0.00..114.00 rows=1000 width=32)
(4 rows)
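For anyone who wants to try the comparison above on their own cluster, here is a minimal sketch. The table definition is an assumption (the actual test case is in the linked gist and may differ); only the table name tab, the column v, and the index name tab_v_idx come from the plans shown above.

```sql
-- Hypothetical schema: column k and the column types are assumptions,
-- not taken from the original test case.
CREATE TABLE tab (k INT PRIMARY KEY, v TEXT);
CREATE INDEX tab_v_idx ON tab (v);

-- Plan with hash aggregation allowed (the default).
SET enable_hashagg = on;
EXPLAIN SELECT v, count(*) FROM tab GROUP BY v LIMIT 190;

-- Plan with hash aggregation disabled for this session only.
SET enable_hashagg = off;
EXPLAIN SELECT v, count(*) FROM tab GROUP BY v LIMIT 190;

-- Return the session to the server default.
RESET enable_hashagg;
```

The exact costs and plan shapes may differ by version and data, so treat the output above as the reference rather than this sketch.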
@kmuthukk set enable_hashagg=false; is working now. Can we set this parameter globally at the YugabyteDB cluster level?
@coder2k16 Yes, you can use the tserver gflag ysql_pg_conf (when starting the tserver/cluster) and set that to enable_hashagg=false. Then that will be the default for all sessions.
Note: Example usage of that flag (for configuring auth in that instance): https://docs.yugabyte.com/latest/secure/authentication/password-authentication/#enable-scram-sha-256-authentication
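One way to confirm that the flag took effect, assuming YSQL exposes the standard PostgreSQL SHOW command and pg_settings view, is to open a fresh session after the restart and check the value before running any SET:

```sql
-- Run in a new session, without issuing SET first.
-- Expected to report "off" if the cluster-wide default was applied.
SHOW enable_hashagg;

-- pg_settings also reports where the current value came from.
SELECT name, setting, source
FROM pg_settings
WHERE name = 'enable_hashagg';
```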
Thanks, it worked for GROUP BY and DISTINCT queries on a single table (#5866). But it is still not working for queries that have both a JOIN and a GROUP BY. In that case we still observe the same issue: the query uses the index only for a certain number of records.
performance_test=# set enable_hashagg=false;
SET
Time: 1.613 ms
performance_test=# explain select x.name , count(*) from test.user x, test.user y where x.usersid_pk=y.usersid_pk group by x.name;
QUERY PLAN
GroupAggregate (cost=697.19..736.69 rows=200 width=66)
Group Key: x.name
-> Sort (cost=697.19..709.69 rows=5000 width=58)
Sort Key: x.name
-> Hash Join (cost=112.50..390.00 rows=5000 width=58)
Hash Cond: (x.usersid_pk = y.usersid_pk)
-> Foreign Scan on "user" x (cost=0.00..100.00 rows=1000 width=62)
-> Hash (cost=100.00..100.00 rows=1000 width=4)
-> Foreign Scan on "user" y (cost=0.00..100.00 rows=1000 width=4)
(9 rows)
Time: 12.943 ms
performance_test=# explain select x.name , count(*) from test.user x, test.user y where x.usersid_pk=y.usersid_pk group by x.name limit 100;
QUERY PLAN
Limit (cost=0.00..677.50 rows=100 width=66)
-> GroupAggregate (cost=0.00..1355.00 rows=200 width=66)
Group Key: x.name
-> Nested Loop (cost=0.00..1328.00 rows=5000 width=58)
-> Index Scan using idx_name on "user" x (cost=0.00..124.00 rows=1000 width=62)
-> Index Only Scan using idx_usersid_pk on "user" y (cost=0.00..1.10 rows=10 width=4)
Index Cond: (usersid_pk = x.usersid_pk)
(7 rows)
Time: 20.519 ms
Please help us understand what this variable does and how it works. Why do we need to set it to off explicitly in YSQL to enable index usage? Do we need to set it to off every time we execute such queries? Will this issue be fixed in a future release of YugabyteDB?
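On the "every time" part of the question: in standard PostgreSQL, SET LOCAL limits a setting to the current transaction, so the workaround can be scoped to a single query without changing the session or cluster default. The sketch below assumes YSQL follows the same semantics; it reuses the query from this thread and is not a fix for the underlying planner behavior.

```sql
BEGIN;
-- Applies only until COMMIT or ROLLBACK; other sessions are unaffected.
SET LOCAL enable_hashagg = off;

SELECT x.name, count(*)
FROM test."user" x, test."user" y
WHERE x.usersid_pk = y.usersid_pk
GROUP BY x.name
LIMIT 100;
COMMIT;

-- After COMMIT the setting reverts to the session or cluster default.
SHOW enable_hashagg;
```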
Jira Link: DB-1328
Issue reported by user on www.yugabyte.com/slack.
When GROUP BY LIMIT is used:
Test Case:
https://gist.github.com/kmuthukk/3d6b325b80c5f5474d7d391f365d51bb
Output: