Open qw4990 opened 3 years ago
`0.33` of the row count is from the histogram, while the remaining `1.00` is from the TopN of the index. The TopN includes `(1,1)` and `(2,2)`, but the bounds of the histogram are `(1,1),(4,4),(5,5),(8,8)`, so it looks like the histogram does not exclude the items that are in the TopN. Also, the NDV of the histogram is 8, which is not as expected.
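The overlap described above can be checked mechanically. The following is only a minimal sketch in Python, not TiDB code: the function name `topn_values_in_bounds` is hypothetical, and the values are copied from the description above.

```python
def topn_values_in_bounds(topn, bounds):
    """Return the histogram bounds that are also TopN values.

    Hypothetical helper for illustration; a non-empty result suggests
    the TopN items were not excluded when the histogram was built.
    """
    topn_set = set(topn)
    return [b for b in bounds if b in topn_set]


# Values taken from the issue description.
topn = [(1, 1), (2, 2)]
bounds = [(1, 1), (4, 4), (5, 5), (8, 8)]

overlap = topn_values_in_bounds(topn, bounds)
print(overlap)
```

Here `(1,1)` shows up both in the TopN and as a histogram bound, which matches the suspicion that the TopN items are still counted in the histogram.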
`analyze table t`, `analyze table t with 2 topn`, and `analyze table t with 3 buckets` lead to correct results, while `analyze table t with 2 topn, 3 buckets` leads to wrong results.
The histograms are collected individually in TiKV and merged in TiDB. After we extract the TopN items from the merged histogram, the bucket intervals can only be narrowed down by carefully examining the discrete values around each bucket boundary. In this case, the first bucket `[1,4)` can be narrowed down to `[3,4)` because `a` is of integer type and `1` and `2` are extracted to the TopN. We could do this optimization, but it is pretty ad hoc and cannot be applied to other data types.
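To make the integer-only narrowing concrete, here is a minimal sketch in Python. It is not the TiDB implementation: the function name `narrow_int_bucket` is hypothetical, and it assumes a half-open bucket `[lower, upper)` over integers, where each value has a well-defined successor.

```python
def narrow_int_bucket(lower, upper, topn_values):
    """Shrink the half-open integer bucket [lower, upper).

    Boundary values that were extracted into the TopN are skipped.
    Hypothetical helper; relies on integers being discrete.
    """
    excluded = set(topn_values)
    # Move the lower bound up past values that now live in the TopN.
    while lower < upper and lower in excluded:
        lower += 1
    # Move the exclusive upper bound down in the same way.
    while upper > lower and (upper - 1) in excluded:
        upper -= 1
    return lower, upper


# The case from this issue: 1 and 2 are in the TopN.
print(narrow_int_bucket(1, 4, [1, 2]))  # (3, 4)
```

The loops depend on `lower += 1` producing the next possible value. There is no equivalent step for floats, decimals, or strings, which is why this approach does not generalize to other data types.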
Development Task
The `estRows` should be 1.00 instead of 1.33.