vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.73k stars 2.1k forks source link

vttablet schema engine does not detect partitioned tables #9180

Closed aquarapid closed 3 years ago

aquarapid commented 3 years ago

Current main branch:

$ vttablet -version
Version: 13.0.0-SNAPSHOT (Git revision 131148d3af branch 'main') built on Tue Nov  9 16:33:43 PST 2021 by jacques@dhoomdeskw.localdomain using go1.17.1 linux/amd64

Unsharded keyspace:

$ vtctlclient -server localhost:15999 ListAllTablets 
zone1-0000000100 unsharded 0 primary 192.168.0.134:15100 192.168.0.134:17100 [xxx] 2021-11-10T00:35:35Z
zone1-0000000101 unsharded 0 replica 192.168.0.134:15101 192.168.0.134:17101 [xxx] <null>
zone1-0000000102 unsharded 0 replica 192.168.0.134:15102 192.168.0.134:17102 [xxx] <null>
zone1-0000000103 unsharded 0 rdonly 192.168.0.134:15103 192.168.0.134:17103 [xxx] <null>

VSchema with single table (although problem occurs with empty VSchema too):

$ vtctlclient -server localhost:15999 GetVSchema unsharded
{
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "t1": {}
  }
}

Create a partitioned table, called t1 via vtgate:

mysql> CREATE TABLE t1 (
    ->     c1 int NOT NULL,
    ->     logdata BLOB NOT NULL,
    ->     created DATETIME NOT NULL,
    ->     PRIMARY KEY(c1, created)
    -> )
    -> PARTITION BY HASH( TO_DAYS(created) )
    -> PARTITIONS 10;
Query OK, 0 rows affected (0.06 sec)

This works fine, e.g.:

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `c1` int NOT NULL,
  `logdata` blob NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`c1`,`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY HASH (to_days(`created`))
PARTITIONS 10 */

However, if you try to do anything with vreplication against that table, it doesn't work, e.g. online DDL with the online (vreplication) strategy:

mysql> SET @@ddl_strategy='online';
Query OK, 0 rows affected (0.00 sec)

mysql> alter table t1 truncate partition p0;                            
+--------------------------------------+
| uuid                                 |
+--------------------------------------+
| 932b4e0d_41be_11ec_ab60_00259085bb84 |
+--------------------------------------+
1 row in set (0.00 sec)

Now, check the migration status:

mysql> show vitess_migrations\G
*************************** 1. row ***************************
                      id: 1
          migration_uuid: 932b4e0d_41be_11ec_ab60_00259085bb84
                keyspace: unsharded
                   shard: 0
            mysql_schema: vt_unsharded
             mysql_table: t1
     migration_statement: alter table t1 truncate partition p0
                strategy: online
                 options: 
         added_timestamp: 2021-11-09 16:38:51
     requested_timestamp: 0000-00-00 00:00:00
         ready_timestamp: 2021-11-09 16:38:52
       started_timestamp: 2021-11-09 16:38:52
      liveness_timestamp: 2021-11-09 16:38:52
     completed_timestamp: NULL
       cleanup_timestamp: NULL
        migration_status: running
                log_path: 
               artifacts: _932b4e0d_41be_11ec_ab60_00259085bb84_20211109163852_vrepl,
                 retries: 0
                  tablet: zone1-0000000100
          tablet_failure: 0
                progress: 0
       migration_context: vtgate:8d83ed00-41be-11ec-ab60-00259085bb84
              ddl_action: alter
                 message: 
             eta_seconds: -1
             rows_copied: 0
              table_rows: 0
       added_unique_keys: 0
     removed_unique_keys: 0
                log_file: 
retain_artifacts_seconds: 86400
1 row in set (0.00 sec)

It will stay in running state, and then eventually, after the 10+ minutes, time out as failed:

mysql> show vitess_migrations\G                                                                                                                                                                                                               
*************************** 1. row ***************************
                      id: 1
          migration_uuid: 932b4e0d_41be_11ec_ab60_00259085bb84
                keyspace: unsharded
                   shard: 0
            mysql_schema: vt_unsharded
             mysql_table: t1
     migration_statement: alter table t1 truncate partition p0
                strategy: online
                 options: 
         added_timestamp: 2021-11-09 16:38:51
     requested_timestamp: 0000-00-00 00:00:00
         ready_timestamp: 2021-11-09 16:38:52
       started_timestamp: 2021-11-09 16:38:52
      liveness_timestamp: 2021-11-09 16:38:52
     completed_timestamp: 2021-11-09 16:49:12
       cleanup_timestamp: NULL
        migration_status: failed
                log_path: 
               artifacts: _932b4e0d_41be_11ec_ab60_00259085bb84_20211109163852_vrepl,
                 retries: 0
                  tablet: zone1-0000000100
          tablet_failure: 0
                progress: 0
       migration_context: vtgate:8d83ed00-41be-11ec-ab60-00259085bb84
              ddl_action: alter
                 message: stale migration 932b4e0d_41be_11ec_ab60_00259085bb84: found running but indicates no liveness
             eta_seconds: -1
             rows_copied: 0
              table_rows: 0
       added_unique_keys: 0
     removed_unique_keys: 0
                log_file: 
retain_artifacts_seconds: 86400
1 row in set (0.00 sec)

Looking at the tablet logs, we see this error over and over:

E1109 16:45:59.728824  117668 dbclient.go:107] error in stream 1, retrying after 5s: vttablet: rpc error: code = Unknown desc = table t1 not found in vttablet schema

Checking /schemaz confirms that this table is not seen by the schema engine:

$ curl -s http://localhost:15100/schemaz | tail -20

<table class="gridtable">

                <tr>
                        <th>Table</th>
                        <th>Fields</th>
                        <th>Primary Key</th>
                        <th>Type</th>
                        <th>Metadata</th>
                </tr>

        <tr class="low">
                        <td>dual</td>
                        <td></td>
                        <td></td>
                        <td>none</td>
                        <td>&lt;nil&gt;&lt;nil&gt;</td>
                </tr>

</table>

i.e. t1 nowhere in sight.

It seems that the root cause of the problem is the table + size query in go/mysql/flavor_mysql.go, e.g. for MySQL 8.0 it has:

SELECT t.table_name, t.table_type, unix_timestamp(t.create_time), t.table_comment, i.file_size, i.allocated_size 
                FROM information_schema.tables t, information_schema.innodb_tablespaces i 
                WHERE t.table_schema = database() and i.name = concat(t.table_schema,'/',t.table_name)

This does not work for partitioned tables, since the innodb_tablespaces.name field for partitioned tables adds a #p#partition_name to the end of the schema + table_name string, e.g. something like:

mysql> select name from information_schema.innodb_tablespaces where name like '%t1%';
+----------------------+
| name                 |
+----------------------+
| vt_unsharded/t1#p#p0 |
| vt_unsharded/t1#p#p1 |
| vt_unsharded/t1#p#p2 |
| vt_unsharded/t1#p#p3 |
| vt_unsharded/t1#p#p4 |
| vt_unsharded/t1#p#p5 |
| vt_unsharded/t1#p#p6 |
| vt_unsharded/t1#p#p7 |
| vt_unsharded/t1#p#p8 |
| vt_unsharded/t1#p#p9 |
+----------------------+

instead of just vt_unsharded/t1 which the query expects.

aquarapid commented 3 years ago

To be clear, this will also affect doing a vreplication (MoveTables) migration from a source DB with partitioned tables.