vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.47k stars 2.09k forks source link

Add VEXPLAIN KEYS for improved sharding key selection #16830

Closed systay closed 3 days ago

systay commented 4 days ago

Description:

This PR introduces a new VEXPLAIN type called KEYS, designed to assist developers and database administrators in making informed decisions about sharding keys. The VEXPLAIN KEYS command provides a concise summary of the query structure, highlighting columns used in joins, filters, and grouping operations.

Key features:

  1. Identifies potential sharding key candidates
  2. Highlights SARGable columns in the WHERE clause
  3. Shows grouping columns for aggregation queries
  4. Reveals join conditions between tables

The output focuses on the structural elements of the query that are most relevant to sharding decisions, without executing the query or providing execution statistics.

Example usage:

mysql> vexplain keys select u.foo, ue.bar, count(*) from user u join user_extra ue on u.id = ue.user_id where u.name = 'John Doe' group by 1, 2;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ColumnUsage                                                                                                                                                                                |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
        "groupingColumns": [
                "user.foo",
                "user_extra.bar"
        ],
        "joinColumns": [
                "user.id",
                "user_extra.user_id"
        ],
        "filterColumns": [
                "user.name"
        ],
        "statementType": "SELECT"
} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

This output provides valuable insights for sharding decisions:

By analyzing this information across multiple queries, developers can make more informed decisions about sharding strategies, potentially improving query performance and data distribution.

Related Issue(s)

https://github.com/vitessio/vitess/pull/16768 Docs: https://github.com/vitessio/website/pull/1853

Checklist

vitess-bot[bot] commented 4 days ago

Review Checklist

Hello reviewers! :wave: Please follow this checklist when reviewing this Pull Request.

General

Tests

Documentation

New flags

If a workflow is added or modified:

Backward compatibility

codecov[bot] commented 4 days ago

Codecov Report

Attention: Patch coverage is 94.02985% with 4 lines in your changes missing coverage. Please review.

Project coverage is 69.45%. Comparing base (969e018) to head (ac6138e).

Files with missing lines Patch % Lines
go/vt/vtgate/planbuilder/operators/keys.go 93.75% 3 Missing :warning:
go/vt/vtgate/planbuilder/vexplain.go 94.11% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #16830 +/- ## ========================================== + Coverage 69.41% 69.45% +0.03% ========================================== Files 1570 1571 +1 Lines 202945 203004 +59 ========================================== + Hits 140880 140993 +113 + Misses 62065 62011 -54 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

deepthi commented 3 days ago

TIL SARGable https://dba.stackexchange.com/questions/162263/what-does-the-word-sargable-really-mean