shlomi-noach commented 4 years ago

This issue will be the tracking space for all things vitess Online DDL. Note that this issue is created after some substantial work is done:

code: https://github.com/vitessio/vitess/pull/6547, available in release-8.0
documentation: https://vitess.io/docs/user-guides/schema-changes/, https://github.com/vitessio/website/pull/510

6547 served as a long running tracking point; pasting some of #6547 content here for background, purpose and intentions.

TL;DR

Automate away all the complexity of schema migrations. Users issue:

alter with 'gh-ost' table example modify id bigint not null;

alter with 'pt-osc' table example modify id bigint not null

or

$ vtctl -topo_implementation etcd2 -topo_global_server_address localhost:2379 -topo_global_root /vitess/global \
    ApplySchema -sql "alter with 'gh-ost' table example modify id bigint unsigned not null" commerce

$ vtctl -topo_implementation etcd2 -topo_global_server_address localhost:2379 -topo_global_root /vitess/global \
    ApplySchema -sql "alter with 'pt-osc' table example modify id bigint unsigned not null" commerce

(syntax subject to change, see #6782 )

and vitess will schedule an online schema change operation to run on all relevant shards, then proceed to apply the change via gh-ost on all shards.

The ALTER TABLE problem

First, to iterate the problem: schema changes have always been a problem with MySQL; a straight ALTER is a blocking operation; a ONLINE ALTER is only "online" on the master/primary, but is effectively blocking on replicas. Online schema change tools like pt-online-schema-change and gh-ost overcome these limitations by emulating an ALTER on a "ghost" table, which is populated from the original table, then swapped in its space.

Traditionally, online schema changes are considered to be "risky". Trigger based migrations add significant load onto the master server, and their cut-over phase is known to be a dangerous point. gh-ost was created at GitHub to address these concerns, and successfully eliminated concerns for operational risks: with gh-ost the load on the master is low, and well controlled, and the cut-over phase is known to cause no locking issues. gh-ost comes with different risks: it applies data changes programmatically, thus the issue of data integrity is of utmost importance. Another note of concern is data traffic: going out from MySQL into gh-ost and back into MySQL (as opposed to all-in MySQL in pt-online-schema-change).

This way or the other, running an online schema change is typically a manual operation. A human being will schedule the migration, kick it running, monitor it, possibly cut-over. In a sharded environment, a developer's request to ALTER TABLE explodes to n different migrations, each needs to be scheduled, kicked, monitored & tracked.

Sharded environments are obviously common for vitess users and so these users feel the pain more than others.

Schema migration cycle & steps

Schema management is a process that begins with the user designing a schema change, and ends with the schema being applied in production. This is a breakdown of schema management steps as I know them:

Design code
Publish changes (pull request)
Review
Formalize migration command (the specific ALTER TABLE or pt-online-schema-change or gh-ost command)
Locate: where in production should this migration run?
Schedule
Execute
Audit/monitor
Cut-over/complete
Cleanup
Notify user
Deploy & merge

What we propose to address

Vitess's architecture uniquely positions it to be able to automate away much of the process. Specifically:

Formalize migration command: turning an ALTER TABLE statement into a gh-ost/pt-osc invocation is super useful if done by vitess, since vitess can not only validate schema/params, but also can provide credentials, apply throttling logic, can instruct gh-ost on how to communicate progress via hooks, etc.
Locate: given schema/table, vitess just knows where the table is located. It knows if the schema is sharded. It knows who the shards are, who the shards masters are. It knows where to run gh-ost. Last, vitess can tell us which replicas we can use for throttling.
Schedule: vitess is again in a unique position to schedule migrations. The fact someone asks for a migration to run does not mean the migration should start right away. For example, a shard may already be running an earlier migration. Running two migrations at a time is less than ideal, and it's best to wait out the first migration before beginning the second. A scheduling mechanism is both useful to running the migrations in optimal order/sequence, as well as providing feedback to the user ("your migration is on hold because this and that", or "your migration is 2nd in queue to run")
Execute: vttablet is the ideal entity to run a migration; can read instructions from topo server and can write progress to topo server. vitess is aware of possible master failovers and can request a re-execute is a migration is so interrupted mid process.
Audit/monitor: vtctld API can offer endpoints to track status of a migration (e.g. "in progress on -80, in queue on 80-"). It may offer progress pct and ETA.
cut-over/complete: in my experience with gh-ost, the cut-over phase is safe to automate away. If running a migration during a resharding operation, then we may need to coordinate cut-over between upstream and downstream migrations.
cleanup: the old table needs to be dropped; vttablet runs a table lifecycle service (aka garbage collector) to clean up those tables.

As mentioned, substantial initial work done in #6547.
Discussion about syntax in #6782
Bugfixes in #6910
Initial followups to #6547 are in #6901, notably auto-retry migration after master failover

shlomi-noach commented 4 years ago

Logic for auto-retrying a migration: https://github.com/vitessio/vitess/pull/6901

If migration is identified to have started in a different tablet, and
has failed - found to be stale
and hasn't been retried yet (temporary restriction)
then, it is automatically retried. The main use case is a failover scenario: if primary fails, tablet and gh-ost alike, then the newly promoted tablet will kick the migration back to life (starting a new migration).

Docs in https://github.com/vitessio/website/pull/571

shlomi-noach commented 3 years ago

https://github.com/vitessio/vitess/pull/7083 supports CREATE and DROP statements in ApplySchema to run as online DDL.

[x] need to add a column ddl_type (CREATE, ALTER, DROP) and which should show up on vtctl OnlineDDL show
[x] need context (see #7082)

shlomi-noach commented 3 years ago

Once https://github.com/vitessio/vitess/pull/7083 and https://github.com/vitessio/vitess/pull/7097 are merged:

when ddl_strategy is an online strategy, analyze a DROP TABLE statement to:

[x] Explode into single-table statements (a single DROP TABLE statement can indicate multiple tables)
[x] Turn each one into a RENAME TABLE statement, renaming into a HOLD gc state with 48 hour ETA
[x] vttablet should transition away from HOLD even if ETA is unmet, if table_gc_lifecycle does not include HOLD

shlomi-noach commented 3 years ago

[x] support vtctl OnlineDDL <keyspace> show <context>, to fetch status of migrations wit ha specific context, as added in #7082 Edit: see https://github.com/vitessio/vitess/pull/7145 for suggested solution

shlomi-noach commented 3 years ago

[x] support CREATE INDEX statements as online DDL, and convert them to ALTER TABLE syntax -- required by gh-ost.

shlomi-noach commented 3 years ago

Online DDL via VReplication

At this time I have clarity as for how this will look like.

On one hand, we will go full native and reuse Vitess existing mechanisms. On the other hand, we break apart from the existing flow in multiple ways.

POC is in #7419 . It is just the beginning of what online DDL via vreplication will look like -- but already has the initial implementation running.

Some design bulletpoints:

We will use vreplication to replicate from a primary tablet onto itself (same keyspace, same shard, same tablet, both as source and target)
Our existing vreplication flows are owned by the user. User e.g. runs a MoveTables, user then runs SwitchReads, then runs SwitchWrites -- based on how the user perceives the process is going on. However, we want to have schema migrations fully automated. Not only do we want to own the starting of the migration (remember our tablet schedule migrations sequentially), but we also want to reliably automatically cut-over the migration.
To that effect, tabletserver will be the owner of schema migrations/vreplication. As with gh-ost and pt-osc, a shard's primary table has the independece to schedule the next migration, run that migration, potentially cancel or retry it, and follow it to completion. This means no vtctl, no wrangler
Another important difference from normal VReplication flows is that in all other flows, there's source table(s) and target table(s), and they are distinct. In MoveTables we can go as far as set routing rules for new queries to route to the target tables. In a schema migration, we want to replace the original table (via RENAME TABLE). We will do this underneath the feet of VReplication.
As opposed to gh-ost or pt-osc, vreplication migrations can natively survive a failover (in fact, this will be one of the more important advantages of migrations/vreplication); this calls for some redesign in onlineddl.Executor
we import some initial-setup analysis and preparation code from gh-ost. Some of it is redundant, and will be cleaned up. Some of it overlaps with Vitess functionality (parsing) and will be replaces. But for now we know it's stable and working.

Starting a vreplication schema migration

The flow for starting a vreplication based schema migration from tabletserver is:

ensure there's no vreplication on same workflow
create an empty table LIKE the original table
apply the ALTER TABLE statement onto the empty table
analyze both tables, validate some basic constraints ar met (this is largely imported from gh-ost)
evaluate the vreplication source filter query that only selects relevant columns, and takes care of column renames and of generated columns.
reload schema (because a prev migration might have introduced e.g. a new column)
create a new vreplicatoin entry
start vreplication

tracking a running migration

onlineddl.Executor to keep track of vreplication stream liveness by looking at _vt.vreplication entry.
identify stale migrations and remove them
determine that a migration is ready: copy phase is complete, pos is non empty, time updated and transaction time are both up-to-date (small or no lag)

cutting over a vreplication migration

The flow is:

We don't use wrangler. Create our own tablet manager client.
Get hold of Tablet
Get hold of ShardInfo
Get hold of topology server (already exists in onlineddl.Executor)
read vreplication information
lock keyspace
stop writes on source
read up-to-date stream (specifically, pos)
wait for pos
stop vreplication. it is now up to date with original table
swap source and target tables from beneath vreplication
all good
(in the future, this flow will actually continue; more writeup in the future)

More to come.

shlomi-noach commented 3 years ago

It should be possible to ReloadTable, like ReloadSchema but for a single table. This is desired for online DDL. Reloading an entire schema takes time and we cannot expect to accomplish it within the timeframe of a cut-over. But reloading a single table should be just fine.

[ ] support ReloadTable

shlomi-noach commented 3 years ago

Revert for Online DDL is now available (per PR review) via https://github.com/vitessio/vitess/pull/7478

shlomi-noach commented 3 years ago

Declarative schema changes via #7725

shlomi-noach commented 3 years ago

https://github.com/vitessio/vitess/pull/7785 begins the work to move away from topo in online DDL. Meaning DDLs, coming from either VTGate or vtctl, are sent directly to the relevant shards (primary tablets on those shards).

This has to be backwards compatible. The timeline is as follows:

[x] v10 RC is already out and #7785 is unmerged yet
[x] v11 will have the new code, and default to use of topo
- deprecation notice on vtctl OnlineDDL revert (replaced by ApplySchema -sql 'revert vitess_migration ...)
- deprecation notice on vtctl VExec
[x] v12 will default to bypass topo, and in fact not allow sending requests to topo
[ ] v13 removes all topo related and VExec related code

shlomi-noach commented 2 years ago

Roadmap to replacing online strategy with vitess (better and unambiguous name):

[ ] https://github.com/vitessio/vitess/pull/9642 supports vitess strategy. This PR is created in v13 RC
[ ] In v14 we will implicitly transform a online submission into vitess
[ ] in v15 we will remove support for online in the backend (the text online will just not exist there anymore)
[ ] in v16 we can, if we choose to, error on online submission (though we can also keep it supported forever)

shlomi-noach commented 2 years ago

https://github.com/vitessio/vitess/pull/9755 offers query buffering for vitess migration cut-over phase. Still a known race condition, to be handled by a different PR.

shlomi-noach commented 2 years ago

We consider Online DDL to be production ready and generally available, and remove the "Experimental" notice. See https://github.com/vitessio/vitess/pull/10310 and https://github.com/vitessio/website/pull/1020

shlomi-noach commented 2 years ago

Design for fast REVERT of fast RANGE PARTITION operations: https://github.com/vitessio/vitess/issues/10317

vitessio / vitess

Online DDL: tracking issue #6926

6547 served as a long running tracking point; pasting some of #6547 content here for background, purpose and intentions.

TL;DR

The ALTER TABLE problem

Schema migration cycle & steps

What we propose to address

Online DDL via VReplication

Starting a vreplication schema migration

tracking a running migration

cutting over a vreplication migration