vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

VStream server-side error during gh-ost online schema migration #7078

Closed keweishang closed 3 years ago

keweishang commented 3 years ago

Overview of the Issue

Our Debezium Vitess Connector (CDC) uses VStream gRPC to stream change events from a sharded keyspace called test_sharded_keyspace (2 shards: -80 and 80-).
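For context, a minimal Go sketch of a VStream consumer doing roughly what the connector does (this is not the Debezium code itself; the vtgate address is a placeholder, and the raw generated gRPC client is used here instead of the connector's Java client):

    package main

    import (
        "context"
        "io"
        "log"

        "google.golang.org/grpc"

        binlogdatapb "vitess.io/vitess/go/vt/proto/binlogdata"
        topodatapb "vitess.io/vitess/go/vt/proto/topodata"
        vtgatepb "vitess.io/vitess/go/vt/proto/vtgate"
        vtgateservicepb "vitess.io/vitess/go/vt/proto/vtgateservice"
    )

    func main() {
        ctx := context.Background()
        // Placeholder vtgate gRPC address.
        cc, err := grpc.Dial("vtgate-host:15991", grpc.WithInsecure())
        if err != nil {
            log.Fatal(err)
        }
        defer cc.Close()
        client := vtgateservicepb.NewVitessClient(cc)

        req := &vtgatepb.VStreamRequest{
            TabletType: topodatapb.TabletType_REPLICA,
            // Start from the current position on both shards of the keyspace.
            Vgtid: &binlogdatapb.VGtid{
                ShardGtids: []*binlogdatapb.ShardGtid{
                    {Keyspace: "test_sharded_keyspace", Shard: "-80", Gtid: "current"},
                    {Keyspace: "test_sharded_keyspace", Shard: "80-", Gtid: "current"},
                },
            },
            Filter: &binlogdatapb.Filter{
                Rules: []*binlogdatapb.Rule{{Match: "bar_entry", Filter: "select * from bar_entry"}},
            },
        }
        stream, err := client.VStream(ctx, req)
        if err != nil {
            log.Fatal(err)
        }
        for {
            resp, err := stream.Recv()
            if err == io.EOF {
                break
            }
            if err != nil {
                // The server-side errors quoted in this issue surface here.
                log.Fatal(err)
            }
            for _, ev := range resp.Events {
                log.Printf("event: %v", ev)
            }
        }
    }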

When running the following gh-ost online schema migration:

vtctlclient -server vtctld-host:15999 ApplySchema -sql "ALTER WITH 'gh-ost' TABLE bar_entry add column status int" test_sharded_keyspace

VStream gRPC throws a server-side error:

io.grpc.StatusRuntimeException: UNKNOWN: target: test_sharded_keyspace.80-.replica, 
used tablet: zoneA-301 (prelive-ib-tablet-301.vt): vttablet: 
rpc error: code = Unknown desc = stream (at source tablet) 
error @ fa7c9236-2c16-11eb-8077-024d038b20ae:1,fac1535e-2c16-11eb-88b0-063538010254:1-1131111: 
cannot determine table columns for bar_entry: 
event has [8 254 17 17 8 8 8 15 246 254 246 1 2 246 3 3], 
schema as [
    name:"id" type:UINT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"id" column_length:20 charset:63 flags:49699
... (14 other columns; 15 columns in total, which is one column fewer than in the event above)
]

Reproduction Steps

Steps to reproduce this issue:

  1. Deploy the following vschema:

    {
        "sharded": true,
        "vindexes": {
            "hash": {
                "type": "hash"
            }
        },
        "tables": {
            "bar_entry": {
                "columnVindexes": [
                    {
                        "column": "c",
                        "name": "hash"
                    }
                ]
            }
        }
    }
  2. Deploy the following schema:

    CREATE TABLE `bar_entry` (
        `id` bigint unsigned NOT NULL AUTO_INCREMENT,
        `a` enum('fooEntry') NOT NULL DEFAULT 'fooEntry',
        `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        `b` bigint NOT NULL,
        `c` bigint unsigned NOT NULL,
        `d` bigint NOT NULL,
        `e` varchar(255) NOT NULL,
        `f` decimal(14, 4) NOT NULL,
        `g` char(3) NOT NULL,
        `h` decimal(14, 4) NOT NULL DEFAULT '1.0000',
        `i` tinyint DEFAULT '1',
        `j` smallint NOT NULL DEFAULT '0',
        `k` decimal(14, 4) DEFAULT NULL,
        `l` int unsigned DEFAULT NULL,
        PRIMARY KEY (`id`),
        UNIQUE KEY `idx_foo_unique` (
            `b`,
            `d`,
            `e`,
            `j`,
            `i`
        ),
        KEY `idx_foo` (`d`, `e`, `l`)
    ) ENGINE = InnoDB DEFAULT CHARSET = utf8;
  3. Run a VStream gRPC client to continuously stream from the sharded keyspace test_sharded_keyspace, where the table resides.

  4. The table has 30 million rows.

  5. Run vtctlclient -server vtctld-host:15999 ApplySchema -sql "ALTER WITH 'gh-ost' TABLE bar_entry add column status int" test_sharded_keyspace to start the gh-ost online schema migration.

  6. Run vtctlclient -server vtctld-host:15999 OnlineDDL test_sharded_keyspace show recent to check the gh-ost job status, which changes from queued to running to complete on each shard (-80 and 80-).

  7. Run show create table bar_entry\G and verify that the new column status is present.

  8. The VStream gRPC client receives the following server-side error:

    io.grpc.StatusRuntimeException: UNKNOWN: target: test_sharded_keyspace.80-.replica, 
    used tablet: zoneA-301 (prelive-ib-tablet-301.vt): vttablet: 
    rpc error: code = Unknown desc = stream (at source tablet) 
    error @ fa7c9236-2c16-11eb-8077-024d038b20ae:1,fac1535e-2c16-11eb-88b0-063538010254:1-1131111: 
    cannot determine table columns for bar_entry: 
    event has [8 254 17 17 8 8 8 15 246 254 246 1 2 246 3 3], 
    schema as [
    name:"id" type:UINT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"id" column_length:20 charset:63 flags:49699
    name:"a" type:ENUM table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"a" column_length:36 charset:33 flags:257
    name:"created" type:TIMESTAMP table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"created" column_length:19 charset:63 flags:1153
    name:"last_update" type:TIMESTAMP table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"last_update" column_length:19 charset:63 flags:9345
    name:"b" type:INT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"b" column_length:20 charset:63 flags:53257
    name:"c" type:UINT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"c" column_length:20 charset:63 flags:36897
    name:"d" type:INT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"d" column_length:20 charset:63 flags:53257
    name:"e" type:VARCHAR table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"e" column_length:765 charset:33 flags:20481
    name:"f" type:DECIMAL table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"f" column_length:16 charset:63 decimals:4 flags:36865
    name:"g" type:CHAR table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"g" column_length:9 charset:33 flags:4097
    name:"h" type:DECIMAL table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"h" column_length:16 charset:63 decimals:4 flags:32769
    name:"i" type:INT8 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"i" column_length:4 charset:63 flags:49152
    name:"j" type:INT16 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"j" column_length:6 charset:63 flags:49153
    name:"k" type:DECIMAL table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"k" column_length:16 charset:63 decimals:4 flags:32768
    name:"l" type:UINT32 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"l" column_length:10 charset:63 flags:49184
    ]

Binary version

v8.0.0
7e09d0c
keweishang commented 3 years ago

Maybe @rohit-nayak-ps has some idea about this error? It seems related to schema versioning.

keweishang commented 3 years ago

Also, during a long-running gh-ost job, if I create a new VStream gRPC client that subscribes to the current vgtid (gtid set to "current"), the client receives an error:

UNKNOWN: target: test_sharded_keyspace.-80.replica, used tablet: zoneA-201 (prelive-ib-tablet-201.vt): vttablet: rpc error: code = Unknown desc = stream (at source tablet) error @ faa85f08-2c16-11eb-ac78-060146dd04aa:1-2930879,facea7d1-2c16-11eb-8519-0243e15aa530:1: unknown table _19344c31_30ce_11eb_98d2_029e8414c92c_20201127163231_gho in schema
rohit-nayak-ps commented 3 years ago

I tried reproducing this with both sharded and unsharded keyspaces, with the table being continuously populated with data so that the gh-ost ALTER TABLE is long-running. No luck.

I also noticed that there have been some changes to the gh-ost-based functionality. I'm not sure whether they affect the bug you are encountering, but could you check whether the same problem happens on master?

The latest code changes the way you invoke gh-ost, for example:

ApplySchema -ddl_strategy "gh-ost" -sql "ALTER table customer add column x1 int default 0" customer

I know you mentioned this only happens with huge tables; I was testing with ~100k rows. I will start populating a larger table locally to test this ...

keweishang commented 3 years ago

@rohit-nayak-ps thanks for replying.

I managed to reproduce the following "unknown table" error every time locally (table size: 500k rows) with GA v8.0.0.

Also, during a long-running gh-ost job, if I create a new VStream gRPC client that subscribes to the current vgtid (gtid set to "current"), the client receives an error:

UNKNOWN: target: test_sharded_keyspace.-80.replica, used tablet: zoneA-201 (prelive-ib-tablet-201.vt): vttablet: rpc error: code = Unknown desc = stream (at source tablet) error @ faa85f08-2c16-11eb-ac78-060146dd04aa:1-2930879,facea7d1-2c16-11eb-8519-0243e15aa530:1: unknown table _19344c31_30ce_11eb_98d2_029e8414c92c_20201127163231_gho in schema

~Interestingly, this error only happens if my vstream client subscribes to the REPLICA tablet type; if I switch to subscribing to the MASTER tablet type, there is no problem so far.~ I actually managed to reproduce this error with vstream subscribing to both the MASTER and REPLICA tablet types.

keweishang commented 3 years ago

Adding @rgibaiev to follow this issue as well.

keweishang commented 3 years ago

I also managed to reproduce the following "columns and values mismatch" error repeatedly with GA v8.0.0:

target: test_sharded_keyspace.80-.master, used tablet: zone1-300 (0297c7837f92): vttablet: rpc error: code = Unknown desc = stream (at source tablet) error @ 1ba0bed0-3332-11eb-b9d0-0242ac110002:1-1045: cannot determine table columns for bar_entry: event has [8 254 17 17 8 8 8 15 246 254 246 1 2 246 3 3], schema as [...basically one less column...]

The steps are:

  1. Have the same vschema and schema definition as in the issue description, in a sharded keyspace. Have a VStream client subscribe to the current position of the REPLICA tablet type of the sharded keyspace.
  2. Insert 500k rows in 500 transactions, each transaction having 1k rows (a rough sketch of this load generation follows this list). The VStream client receives all 500k rows.
  3. Start gh-ost by running vtctlclient -server localhost:15999 ApplySchema -sql "ALTER WITH 'gh-ost' TABLE bar_entry add column status int" test_sharded_keyspace
  4. While the gh-ost job is in running status for all shards, start a 2nd VStream client subscribing to the current position of the REPLICA tablet type of the keyspace, and a 3rd VStream client subscribing to the current position of the MASTER tablet type. There is a chance that one of the 2 VStream clients fails with the "unknown table" error.
  5. As soon as the gh-ost job is in complete status for all shards (within maybe 20 seconds), start a 4th VStream client subscribing to the current position of the REPLICA tablet type of the keyspace, and a 5th VStream client subscribing to the current position of the MASTER tablet type. Insert a few rows into the table; there is a chance that the VStream client subscribing to the MASTER tablet type fails with the "columns and values mismatch" error.
  6. After 1-2 minutes, if you repeat step 5, there will be no errors. It seems that schema versioning has stabilised.
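For reference, a rough Go sketch of the load generation in step 2, going through vtgate's MySQL protocol (the DSN, credentials, and chosen column values are placeholders; only the non-defaulted NOT NULL columns of bar_entry are populated):

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/go-sql-driver/mysql"
    )

    func main() {
        // Placeholder DSN for vtgate's MySQL listener.
        db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:15306)/test_sharded_keyspace")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        const txCount, rowsPerTx = 500, 1000 // 500 transactions x 1k rows = 500k rows
        for t := 0; t < txCount; t++ {
            tx, err := db.Begin()
            if err != nil {
                log.Fatal(err)
            }
            stmt, err := tx.Prepare("INSERT INTO bar_entry (b, c, d, e, f, g) VALUES (?, ?, ?, ?, ?, ?)")
            if err != nil {
                log.Fatal(err)
            }
            for r := 0; r < rowsPerTx; r++ {
                n := t*rowsPerTx + r
                // Values are arbitrary but unique enough to satisfy idx_foo_unique.
                if _, err := stmt.Exec(n, n, n, fmt.Sprintf("row-%d", n), 1.2345, "abc"); err != nil {
                    log.Fatal(err)
                }
            }
            stmt.Close()
            if err := tx.Commit(); err != nil {
                log.Fatal(err)
            }
        }
    }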

I'll test the master branch tomorrow, as you suggested @rohit-nayak-ps

rohit-nayak-ps commented 3 years ago

@keweishang, I was able to repro on the master branch as well, so no need to test on it! For me too, I was only able to get it while pointing to the replica and not to the master, but it might be a race. As you suspected, the schema is not getting reloaded correctly by the vstreamer after the gh-ost operation completes. Sugu suggested gh-ost might be explicitly reloading the schema on the master (where gh-ost runs), so we don't see the error there. Will let you know once we have progress.

keweishang commented 3 years ago

@rohit-nayak-ps, it's great that you can reproduce the errors on your side now. For me, both the "unknown table" and the "column mismatch" errors also happened when pointing vstream to master.

Sure. Keep me updated here and let me know if you need any further information.

shlomi-noach commented 3 years ago

@rohit-nayak-ps happy to look into reloading the schema after gh-ost (or pt-osc) completes.

shlomi-noach commented 3 years ago

Quick update from discussing with @rohit-nayak-ps: we will seek a way to trigger ReloadSchemaShard (or an equivalent) from the TabletServer that runs the migration. The issue is that we need to invoke the reload not only on the master (almost trivial) but on all shard tablets, and the TabletServer doesn't have a gRPC mechanism to communicate directly with other tablets.

keweishang commented 3 years ago

Thanks for the update, @shlomi-noach. So you meant ReloadSchemaShard needs to be triggered in all 3 TabletServers in the shard (in the case of a shard having 1 MASTER Tablet, 1 REPLICA Tablet, and 1 RDONLY Tablet)? Do you have some potential solutions in mind? For example, if there's no gRPC between TabletServers, could they communicate via etcd?

shlomi-noach commented 3 years ago

So you meant ReloadSchemaShard needs to be triggered in all 3 TabletServers

Yes, assuming I understand correctly; specifically, we need to reload on the replica that vstream runs on.

@rohit-nayak-ps has a workaround meanwhile, I'll update soon.

rohit-nayak-ps commented 3 years ago

The workarounds I had discussed (while we wait for an automatic schema load post-migration) are:

  1. Run vtctl ReloadSchemaKeyspace <keyspace> manually on the command line, which forces all tablets in that keyspace to do a schema reload.

  2. Run the tracker, which runs a vstream for schema tracking (and which, as a side effect, reloads the keyspace's schema when it encounters a DDL). Since you are already running vstreams, this does not apply. In any case, as I mentioned in a previous comment, there seems to be a bug where vstreams are NOT reloading the schema when a gh-ost rename occurs. Hope to make progress on this tomorrow.

rohit-nayak-ps commented 3 years ago

Sorry, but I am not able to repro this anymore. I have been testing for a while now using this setup:

Not sure how this differs from your setup. Since it is happening consistently for you, @keweishang, it would be nice if you could repro using the same setup, with whatever mods are needed to recreate the bug, since that makes it easier for any of us here to debug. I am running this on the current master (though I don't think we have new code that could have fixed this error).

keweishang commented 3 years ago

Hi @rohit-nayak-ps, sure, I'll try and use your setup to reproduce the error. Will keep you updated this week.

keweishang commented 3 years ago

Hi @rohit-nayak-ps, sorry for the delay. Based on the GA 8.0.0 docker image, I can repeatedly reproduce the errors. I've created a public repo whose README has the steps to reproduce them: https://github.com/keweishang/schema_reload_error_test

Let me know if you manage to reproduce the error with the above repo setup. Thanks.

rohit-nayak-ps commented 3 years ago

@keweishang, thanks for the great test repo. I was able to reproduce the "cannot determine table columns" issue even with the latest code. The issue with the internal tables created by gh-ost has been resolved in #7159, so it no longer appears.

The cause is:

The default is not to run the tracker, so #1 doesn't apply. When #2 is also not applicable, i.e. when we call the VStream API only after the migration is complete, we are then dependent on #3, vttablet's automatic reload. #4 is impractical for production use.

In our case the VStream API is called, with gtid set to "current", before the periodic reload, so the schema is not yet in sync. This results in the schema-mismatch error that is thrown.

We discussed reloading the schema once Online DDL completes a migration. However, we need to resolve a couple of things before we can do that:

  1. we need a mechanism for vttablets to request a schema reload from all tablets in a keyspace
  2. we also need to address other ways that a schema can be changed without vttablet being aware; I'm not sure of all these cases, but possibly: network partitions, manually running a DDL on the underlying MySQL server, backup restores, etc.

So this requires more thought and will not happen in the short-term.

The recommended way, at this time, is to enable the tracker in vttablet using -track_schema_versions=true.

rohit-nayak-ps commented 3 years ago

Also, the reason I was unable to consistently repro earlier was that the tablets always had vstreams running on them, which were reloading the schema. So a fresh VStream API client always found the updated schema.

keweishang commented 3 years ago

@rohit-nayak-ps thanks for the update.

First of all, I really appreciate your explanation. Also, good work finding and fixing the issue with the internal tables (#7159).

Thanks for letting me know that having vstreams running on the tablets is essential for reloading the tablet's schema. In my case, all VStream API clients had failed due to #7159. The tracker wasn't enabled via -track_schema_versions=true either, so there was no schema reload.

Will enabling the tracker with -track_schema_versions=true on all vttablets add any perceivable overhead? Why isn't it a default vttablet configuration?

rohit-nayak-ps commented 3 years ago

Will enabling the tracker with -track_schema_versions=true on all vttablets add any perceivable overhead? Why isn't it a default vttablet configuration?

There is the overhead of an additional vstreamer, which downloads the binlogs and does the minimal parsing required. Since it only deals with DDLs, the overhead is less than that of a regular vstream.

Whether it is perceptible depends on the server configuration and write QPS. This is precisely why we disable it by default. Originally it was enabled by default, but we had a few customers in production who were affected by it (iirc): those with lots of small servers and high QPS saw spikes in CPU usage when they migrated to that version.

The solution is for the tracker to be lightweight. I have done a quick POC by paring down the vstreamer functionality to a minimum and got over a 60% reduction in CPU usage. Productionising it would, however, need a lot of testing, since vstreamer would then follow different code paths depending on whether it is a "lite" or regular version, and vstreamers are at the core of vreplication. So it is not too high on our priority list at this moment. I will create an issue for this soon, and if we find more support for it we can take it up earlier!

rohit-nayak-ps commented 3 years ago

Closing this. As discussed above, the recommended way to get around this issue is to enable the tracker in vttablet using -track_schema_versions=true.