Closed FranckPachot closed 1 year ago
Because DocDB lacks strict inequality matching (#10738; @tanujnay112 has a fix, but it has not landed yet), some rows (actually many, with this test case) get rejected on the Postgres side.
This extra filtering, which happens within the scan node (IndexScan, etc.), is sometimes reported as "Rows Removed by Index Recheck" in the EXPLAIN ANALYZE output, but that is not always the case.
indextuple_matches_key (the same thing in heaptuple_matches_key, too):
    if (is_null)
      return false;

    bool matches = DatumGetBool(FunctionCall2Coll(&key[i].sk_func,
                                                  key[i].sk_collation,
                                                  res_datum,
                                                  key[i].sk_argument));
    if (!matches)
      return false;
  }

  return true;
}
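To illustrate why the strict inequalities lead to so many fetches, here is a toy model in Python. It is only a sketch, not YugabyteDB code: the storage_scan and fetch_with_limit helpers, the page size, and the data are all hypothetical. It assumes the storage layer can only apply inclusive (>=) bounds and that the query layer rechecks each returned row, as the function above does.

```python
from itertools import islice


def storage_scan(rows, a_min, b_min):
    """Toy storage layer: applies only inclusive (>=) lower bounds."""
    return (r for r in rows if r[0] >= a_min and r[1] >= b_min)


def fetch_with_limit(rows, a_min, b_min, strict, limit=10, page_size=10):
    """Fetch pages until `limit` rows survive the query-layer recheck."""
    scan = storage_scan(rows, a_min, b_min)
    kept, pages = [], 0
    while len(kept) < limit:
        page = list(islice(scan, page_size))
        if not page:
            break
        pages += 1
        for a, b in page:
            # Recheck: with strict predicates, boundary rows are rejected here.
            if not strict or (a > a_min and b > b_min):
                kept.append((a, b))
    return kept[:limit], pages


# Hypothetical data, sorted on (A, B) like a range-sharded primary key.
rows = [(a, b) for a in (50, 51) for b in range(0, 10000, 10)]

strict_rows, strict_pages = fetch_with_limit(rows, 50, 5000, strict=True)
loose_rows, loose_pages = fetch_with_limit(rows, 50, 5000, strict=False)
print(strict_pages, loose_pages)
```

In this model the strict query has to page through every A=50 row before the first row passes the recheck, while the ">=" query is satisfied by the first page.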
cc: @m-iancu @sushantrmishra
A modified version of the query using ">=" instead of ">" stops after receiving the first batch.
e.g.:
explain (costs off, analyze) select * from demo where A>=50 and B>=5000 limit 10;
I've also tried the original query with @tanujnay112's fix #10738, and it no longer issues extra fetch calls.
Tested with 2.17.2
yugabyte=# explain (analyze, dist) select * from demo where A>50 and B>5000 limit 10;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Limit (cost=0.00..0.00 rows=1 width=8) (actual time=5166.045..5166.052 rows=10 loops=1)
   -> Seq Scan on demo (cost=0.00..0.00 rows=1 width=8) (actual time=5166.042..5166.047 rows=10 loops=1)
        Remote Filter: ((a > 50) AND (b > 5000))
        Storage Table Read Requests: 1
        Storage Table Execution Time: 5166.040 ms
 Planning Time: 0.083 ms
 Execution Time: 5166.119 ms
 Storage Read Requests: 1
 Storage Write Requests: 0
 Storage Execution Time: 5166.040 ms
 Peak Memory Usage: 14 kB
(11 rows)
Time: 5166.911 ms (00:05.167)
Yes, fixed
Jira Link: DB-672
Description
The full case is from this blog post on Index Skip Scan: https://dev.to/franckpachot/index-skip-scan-in-yugabytedb-2ao2
I encountered a case where too many rows are read, in small pages, by a query using LIMIT.
Test case:
The plan looks good, except that the execution time is high (3 seconds) to get 10 rows by Index Scan on the primary key:
The DocDB statistics show many reads
Setting yb_debug_log_docdb_requests=true shows the pagination, reading 9500 pages of 10 rows from (A=50, B=5010) to (A=50, B=100000):
Note: "H\200\000\0002H\200\000\023\222!#\200\" is (A=50, B=5010) and "H\200\000\0002H\200\001\206\240!#\200" is (A=50, B=100000).
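For anyone who wants to check the key values in the note, here is a small Python sketch that decodes the first two components of each logged key. It assumes each integer column is written as the marker byte 'H' followed by a 4-byte big-endian value with the sign bit flipped. That layout is inferred from the two values above rather than taken from the DocDB source, and decode_int32_components is a hypothetical helper. Only the first ten bytes of each key are used, and the trailing bytes are ignored.

```python
def decode_int32_components(key, count):
    """Decode `count` leading int32 components from an escaped DocDB key.

    Assumed layout per component: b'H' + 4 bytes, big-endian, sign bit flipped.
    """
    values = []
    pos = 0
    for _ in range(count):
        if key[pos:pos + 1] != b"H":
            raise ValueError("unexpected marker byte at offset %d" % pos)
        raw = int.from_bytes(key[pos + 1:pos + 5], "big")
        # Flipping the sign bit is the same as subtracting 2**31.
        values.append(raw - 2**31)
        pos += 5
    return values


# Octal escapes copied from the log lines (first two components only).
lower = b"H\200\000\0002H\200\000\023\222"
upper = b"H\200\000\0002H\200\001\206\240"

print(decode_int32_components(lower, 2))
print(decode_int32_components(upper, 2))
```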
The first page should have been sufficient for the query. But it seems that all the 95000 rows for A=50 have been read, in pages of 10.
Workaround:
This is faster because it reads in larger pages and filters afterwards.