shantanugupta-yb opened this issue 1 year ago
@shantanugupta-yb Did you enable the GUC variable yb_enable_optimizer_statistics before trying this?
@tanujnay112, yes, I did enable the yb_enable_optimizer_statistics variable. Here are the observations for the same: https://yugabyte.slack.com/archives/C03RB7JHKM0/p1678519579065579?thread_ts=1678442689.915799&cid=C03RB7JHKM0
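For reference, a minimal way to enable and verify that setting in a YSQL session (this is the session-level form; the GUC name comes from the comment above):

```sql
-- Enable use of optimizer statistics collected by ANALYZE
-- for cost estimation in the current session.
SET yb_enable_optimizer_statistics = on;

-- Confirm the setting took effect.
SHOW yb_enable_optimizer_statistics;
```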
Hash Join (cost=1254.00..105104.00 rows=10000 width=16) (actual time=192.835..1611.669 rows=100000 loops=1)
Merge Join (cost=41.85..113783.00 rows=10000 width=16) (actual time=3.806..198.793 rows=100000 loops=1)
The access path to rangetbl_1 is exactly the same and placed on the inner side in both plans. The Materialize node in the Merge Join plan is more expensive than the Hash node, but only slightly (+25).
-> Index Only Scan using rangetbl_1_col_bigint_id_2_idx on public.rangetbl_1 (cost=0.00..1129.00 rows=10000 width=8) (actual time=1.991..172.984 rows=100000 loops=1)
rangetbl_2:
Hash:
-> Seq Scan on public.rangetbl_2 (cost=0.00..100000.00 rows=1000000 width=8) (actual time=1.570..1216.975 rows=1000000 loops=1)
Merge:
-> Index Only Scan using rangetbl_2_col_bigint_id_2_idx on public.rangetbl_2 (cost=0.00..110004.00 rows=1000000 width=8) (actual time=1.826..25.152 rows=100001 loops=1)
The Seq Scan cost (Hash plan) is estimated to be cheaper than the Index Only Scan (Merge plan) by 10004, yet the Seq Scan is much slower to execute: 1216.975 ms vs. 25.152 ms. There is no predicate pushed down to DocDB, either as a remote filter or as an index access key. This discrepancy between costing and execution time seems to be the primary source of the problem.
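To observe the discrepancy in isolation, the two scans of rangetbl_2 can be costed separately with the standard PostgreSQL planner toggles; a sketch, assuming the join column is col_bigint_id_2 (inferred from the index name above):

```sql
-- Force the sequential-scan path and note its estimated vs. actual cost.
SET enable_indexscan = off;
SET enable_indexonlyscan = off;
EXPLAIN ANALYZE SELECT col_bigint_id_2 FROM rangetbl_2;
RESET enable_indexscan;
RESET enable_indexonlyscan;

-- Force the index path for the same column and compare.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT col_bigint_id_2 FROM rangetbl_2;
RESET enable_seqscan;
```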
On the execution side, we don't have to fetch unused columns (unless it is a packed-column table) and send them over the network in the case of this Seq Scan. We should also investigate whether we are actually doing so, as the plan suggests. Update: tried a similar Hash Join plan with the Seq Scan on the outer side and checked the DocDB request. We are not sending all the columns over the network; we compute the actually needed columns in yb_scan.c instead of blindly looking at the target list.
Seq Scan output (Hash):
Output: rangetbl_2.col_bigint_id_1, rangetbl_2.col_bigint_id_2, rangetbl_2.col_bigint_id_3, rangetbl_2.col_bigint_id_4, rangetbl_2.col_bigint_1, rangetbl_2.col_bigint_2, rangetbl_2.col_float2_1, rangetbl_2.col_float2_2, rangetbl_2.col_float5_1, rangetbl_2.col_float5_2, rangetbl_2.col_boolean_1, rangetbl_2.col_varchar10_id_1, rangetbl_2.col_varchar100_id_1, rangetbl_2.col_varchar100_id_2, rangetbl_2.col_varchar500_id_1
Index Only Scan output (Merge):
Output: rangetbl_2.col_bigint_id_2
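The Output lines above come from the VERBOSE option of EXPLAIN; a sketch of the kind of command that produces them, assuming a join on col_bigint_id_2 (the actual query text is not shown in this thread):

```sql
EXPLAIN (ANALYZE, VERBOSE)
SELECT t1.col_bigint_id_2, t2.col_bigint_id_2
FROM rangetbl_1 t1
JOIN rangetbl_2 t2
  ON t1.col_bigint_id_2 = t2.col_bigint_id_2;
```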
The difference in the execution time of the join node by itself seems reasonable (subtracting the actual times of the two child nodes from the join node's total):
Hash: 1611.669 - (1216.975 + 191.142) = 203.552
Merge: 198.793 - (25.152 + 146.444) = 27.197
Jira Link: DB-5797
Description
In the case of a join condition on a range index, with one filter condition on the range-partitioned table, Nested Loop was again selected and the query executed in 17030 ms. After running ANALYZE on both tables, Hash Join was selected and the latency dropped to 1627 ms, but with Merge Join the query executed in 205 ms.
So even after running ANALYZE on both tables, which updates the table statistics, the cost estimation still seems to be incorrect.
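For reference, the statistics refresh referred to above is the standard command, run on both tables in the join:

```sql
-- Collect table statistics for the cost-based optimizer.
ANALYZE rangetbl_1;
ANALYZE rangetbl_2;
```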
Before running ANALYZE, Nested Loop is selected.
After running ANALYZE on both tables involved in the join, Hash Join is selected by default.
But the most optimized query plan w.r.t. latency is the one using Merge Join; it can be forced for comparison, as in the sketch below.
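Until the costing is fixed, Merge Join can be forced for a session by disabling the other join methods with the standard planner toggles; a sketch, reusing the same hypothetical join query as above (pg_hint_plan hints would be an alternative):

```sql
-- Disable the join methods the planner currently prefers,
-- so the plan below falls back to a Merge Join.
SET enable_nestloop = off;
SET enable_hashjoin = off;

EXPLAIN ANALYZE
SELECT t1.col_bigint_id_2, t2.col_bigint_id_2
FROM rangetbl_1 t1
JOIN rangetbl_2 t2
  ON t1.col_bigint_id_2 = t2.col_bigint_id_2;

-- Restore the session defaults.
RESET enable_nestloop;
RESET enable_hashjoin;
```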
Table schema
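The schema itself was not captured here. A hypothetical reconstruction of rangetbl_2, inferred only from the column names in the Seq Scan Output above and the index rangetbl_2_col_bigint_id_2_idx (types, sharding, and constraints are guesses):

```sql
CREATE TABLE rangetbl_2 (
    col_bigint_id_1     bigint,
    col_bigint_id_2     bigint,
    col_bigint_id_3     bigint,
    col_bigint_id_4     bigint,
    col_bigint_1        bigint,
    col_bigint_2        bigint,
    col_float2_1        real,
    col_float2_2        real,
    col_float5_1        double precision,
    col_float5_2        double precision,
    col_boolean_1       boolean,
    col_varchar10_id_1  varchar(10),
    col_varchar100_id_1 varchar(100),
    col_varchar100_id_2 varchar(100),
    col_varchar500_id_1 varchar(500)
);

-- Range index referenced in the plans above (ASC gives range sharding in YSQL).
CREATE INDEX rangetbl_2_col_bigint_id_2_idx
    ON rangetbl_2 (col_bigint_id_2 ASC);
```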