pingcap / parser

A MySQL Compatible SQL Parser
Apache License 2.0
1.41k stars 489 forks source link

parser: support ANALYZE TABLE t PREDICATE COLUMNS / COLUMN c1 [, c2] ... and SHOW COLUMN_STATS_USAGE #1329

Closed xuyifangreeneyes closed 3 years ago

xuyifangreeneyes commented 3 years ago

What problem does this PR solve?

Related Issue: https://github.com/pingcap/tidb/issues/27828

Support syntax like ANALYZE TABLE t PREDICATE COLUMNS and ANALYZE TABLE t COLUMNS c1 [, c2] .... Related tidb pr: https://github.com/pingcap/tidb/pull/27802

What is changed and how it works?

Support syntax like ANALYZE TABLE t PREDICATE COLUMNS and ANALYZE TABLE t COLUMNS c1 [, c2] ... and SHOW COLUMN_STATS_USAGE (WHERE/LIKE ...).

Check List

Tests

Code changes

Side effects

Related changes

ti-chi-bot commented 3 years ago

[REVIEW NOTIFICATION]

This pull request has been approved by:

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment. After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review. Reviewer can cancel approval by submitting a request changes review.
CLAassistant commented 3 years ago

CLA assistant check
All committers have signed the CLA.

xuyifangreeneyes commented 3 years ago

/cc @winoros

kennytm commented 3 years ago

What are predicate columns? https://docs.aws.amazon.com/redshift/latest/dg/r_ANALYZE.html?

xuyifangreeneyes commented 3 years ago

https://docs.aws.amazon.com/redshift/latest/dg/r_ANALYZE.html

Yes. Here is the related design doc: https://docs.google.com/document/d/12_SiDLInsQyHj915U4Np4uphpH3sFhQk0KzMKRAmyi4/edit?usp=sharing

kennytm commented 3 years ago
  1. what is the difference between ANALYZE TABLE t COLUMNS a, b, c and ANALYZE TABLE t UPDATE HISTOGRAM ON a, b, c?
  2. how about the MariaDB syntax ANALYZE TABLE t PERSISTENT FOR COLUMNS (a, b, c);?
xuyifangreeneyes commented 3 years ago
  1. what is the difference between ANALYZE TABLE t COLUMNS a, b, c and ANALYZE TABLE t UPDATE HISTOGRAM ON a, b, c?
  2. how about the MariaDB syntax ANALYZE TABLE t PERSISTENT FOR COLUMNS (a, b, c);?
  1. ANALYZE TABLE t UPDATE HISTOGRAM is compatible with MySQL. Currently we only implement it in parser but don't implement it in tidb (see https://github.com/pingcap/tidb/issues/27666). The behavior of ANALYZE TABLE t UPDATE HISTOGRAM may need discussion (do we only collect Histogram? or we also collect other statistics such as CMSketch, TopN, FMSketch?), which is due to different implementations of ANALYZE and statistics between TiDB and MySQL. As for ANALYZE TABLE t COLUMN a, b, c, we collect all the statistics including Histogram, TopN and so on.
  2. The syntax of ANALYZE TABLE t COLUMN a,b,c refers to TiDB's ANALYZE TABLE t INDEX idx1, idx2 and ANALYZE TABLE t PARTITION p1, p2. The MariaDB syntax also looks fine to me while maybe PERSISTENT seems redundant since all the statistics is persistent in TiDB.
xuyifangreeneyes commented 3 years ago

/cc @chrysan @qw4990

ti-chi-bot commented 3 years ago

@xuyifangreeneyes: PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
xuyifangreeneyes commented 3 years ago

/cc @xiongjiwei @morgo

xuyifangreeneyes commented 3 years ago

Since parser has been moved into tidb repo, I close it.