Open winoros opened 3 years ago
It's under testing, and will give report soon. Once it satisfies our needs, a proposal will be given.
BTW, nearly all commercial RDBMS choose statistics based on sampling as the default ones.
We've got a rather good result on the sampling tech to estimate the row count of index.
The one whose error rate is closer to 0 is the better one. Compared with 4.0.8 you can see that it's a huge improvement.
Development Task
Currently, we are constructing the index(i.e. multi-column for most cases)'s statistics based on full data. This will make the analyze run a very long time after the command is triggered. And this will cause issue like https://github.com/pingcap/tidb/issues/23453.
So we're testing to construct the index's statistics based on sampling to see whether it will lose precision on row count estimation.
If the ones based on sampling are okay, we're able to construct the index's statistics by sampling on tidb data. Thus we don't need to scan the indexes' KV any more.