vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Feature Request: improved Online DDL row estimation #13437

Closed: shlomi-noach closed this issue 11 months ago

shlomi-noach commented 1 year ago

Feature Description

In Online DDL ALTER TABLE migrations, Vitess estimates the number of table rows via SHOW TABLE STATUS. This estimate can be heavily skewed. As the migration runs, Vitess reports its progress in a progress column of _vt.schema_migrations, as a value in the range [0..100]. This column is also exposed in SHOW VITESS_MIGRATIONS output.

The problem is that table_rows can be so skewed that the reported progress becomes misleading.

This happens because the estimated number of rows can be much higher or much lower than the actual number of rows in the table. The user experience is poor, as the user is never sure what the actual status of the migration is.
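To illustrate why a skewed estimate hurts, here is a minimal sketch that assumes progress is computed as rows copied divided by the estimated row count, clamped to [0..100]. The function name progress_pct and the exact formula are hypothetical and are not taken from the Vitess source.

```python
def progress_pct(rows_copied: int, estimated_rows: int) -> float:
    """Hypothetical progress formula: rows copied so far over the
    estimated table size, clamped to the [0..100] range."""
    if estimated_rows <= 0:
        return 0.0
    pct = 100.0 * rows_copied / estimated_rows
    return max(0.0, min(100.0, pct))


actual_rows = 1_000_000
rows_copied = 500_000  # the copy is really halfway done

# Estimate too low: progress is pinned at 100 while half the table remains.
print(progress_pct(rows_copied, 400_000))      # 100.0
# Estimate too high: progress reads 25 although the copy is half done.
print(progress_pct(rows_copied, 2_000_000))    # 25.0
# Accurate estimate: progress matches reality.
print(progress_pct(rows_copied, actual_rows))  # 50.0
```

With an underestimate the migration appears finished long before it is; with an overestimate it appears to jump abruptly to completion.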

Potential approaches:

  1. Run ANALYZE TABLE before or during the migration to get a better row estimate. This is an easy solution. The estimate improves, but remains imprecise (typically reducing the error margin to about 5%).
  2. Run SELECT COUNT(*) before the migration, and keep the count up to date using the events processed from the binary log as the migration runs. The problem is that this query may run for hours, sometimes even longer than the migration itself. It should also preferably run on a replica.
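Approach 2 can be sketched as a counter that starts from an exact COUNT(*) baseline and is adjusted by row events read from the binary log. This is a simplified illustration under stated assumptions: RowEvent and RowCounter are hypothetical names, and the sketch assumes the baseline count corresponds to a known binlog position so that only events after that position are applied.

```python
from dataclasses import dataclass


@dataclass
class RowEvent:
    """Hypothetical, simplified binlog row event."""
    kind: str  # "insert", "delete" or "update"
    rows: int  # number of rows affected by the event


class RowCounter:
    """Tracks a table's row count, starting from an exact COUNT(*)
    baseline and adjusting it with binlog row events applied afterwards."""

    def __init__(self, baseline: int) -> None:
        self.count = baseline

    def apply(self, event: RowEvent) -> None:
        if event.kind == "insert":
            self.count += event.rows
        elif event.kind == "delete":
            self.count -= event.rows
        elif event.kind == "update":
            pass  # updates change row contents, not the number of rows
        else:
            raise ValueError(f"unknown event kind: {event.kind}")


counter = RowCounter(baseline=1_000_000)  # result of SELECT COUNT(*)
for ev in [RowEvent("insert", 250), RowEvent("delete", 40), RowEvent("update", 900)]:
    counter.apply(ev)
print(counter.count)  # 1000210
```

Keeping the count current is cheap; the expensive part remains the initial COUNT(*), which is the drawback noted above.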

Use Case(s)

Any long running Online DDL (schema change/migration)

shlomi-noach commented 1 year ago

Addressed by https://github.com/vitessio/vitess/pull/13352

shlomi-noach commented 11 months ago

I'm closing this issue as https://github.com/vitessio/vitess/pull/13352 offers a reasonable solution.