pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
37.11k stars 5.83k forks source link

Alter table exchange partition does not work if table has tiflash replica #46077

Open tonyhu214 opened 1 year ago

tonyhu214 commented 1 year ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

MySQL [test]> show create table a; +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | a | CREATE TABLE a ( id int(11) NOT NULL, fname varchar(30) DEFAULT NULL, lname varchar(30) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin PARTITION BY RANGE (id) (PARTITION p0 VALUES LESS THAN (50), PARTITION p1 VALUES LESS THAN (100), PARTITION p2 VALUES LESS THAN (150)) | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)

MySQL [test]> show create table a_tmp; +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | a_tmp | CREATE TABLE a_tmp ( id int(11) NOT NULL, fname varchar(30) DEFAULT NULL, lname varchar(30) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin | +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)

MySQL [test]> select * from a; +-----+-------+-------+ | id | fname | lname | +-----+-------+-------+ | 76 | Mary | Jones | | 16 | Frank | White | | 120 | Linda | Black | +-----+-------+-------+ 3 rows in set (0.00 sec)

MySQL [test]> select * from a_tmp; Empty set (0.00 sec)

MySQL [test]> alter table a EXCHANGE PARTITION p0 WITH TABLE a_tmp; ERROR 1736 (HY000): Tables have different definitions MySQL [test]>

2. What did you expect to see? (Required)

Exchange partition success.

3. What did you see instead (Required)

Exchange partition failed.

4. What is your TiDB version? (Required)

MySQL [test]> select tidb_version() -> ; +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | tidb_version() | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Release Version: v6.5.0 Edition: Community Git Commit Hash: 706c3fa3c526cdba5b3e9f066b1a568fb96c56e3 Git Branch: heads/refs/tags/v6.5.0 UTC Build Time: 2022-12-27 03:50:44 GoVersion: go1.19.3 Race Enabled: false TiKV Min Version: 6.2.0-alpha Check Table Before Drop: false Store: tikv | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)

seiya-annie commented 1 year ago

in v7.4.0 master

mysql> CREATE TABLE a (
    -> id int(11) NOT NULL,
    -> fname varchar(30) DEFAULT NULL,
    -> lname varchar(30) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
    -> PARTITION BY RANGE (id)
    -> (PARTITION p0 VALUES LESS THAN (50),
    -> PARTITION p1 VALUES LESS THAN (100),
    -> PARTITION p2 VALUES LESS THAN (150));
Query OK, 0 rows affected (0.11 sec)

mysql> CREATE TABLE a_tmp (
    -> id int(11) NOT NULL,
    -> fname varchar(30) DEFAULT NULL,
    -> lname varchar(30) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
Query OK, 0 rows affected (0.10 sec)

mysql> alter table a set tiflash replica 1;
Query OK, 0 rows affected (0.12 sec)

mysql> alter table a_tmp set tiflash replica 1;
Query OK, 0 rows affected (0.11 sec)

mysql> alter table a EXCHANGE PARTITION p0 WITH TABLE a_tmp;
ERROR 1105 (HY000): rpc error: code = Unavailable desc = error reading from server: EOF
mysql> alter table a EXCHANGE PARTITION p0 WITH TABLE a_tmp;
Query OK, 0 rows affected, 1 warning (6.93 sec)
seiya-annie commented 1 year ago

@tonyhu214 Please provide your TiFlash replica definition.

seiya-annie commented 1 year ago

in 6.5.0 1tikv/1tidb/1pd cluster

[tidb@localhost bin]$ ./tidb-server -V
Release Version: v6.5.0
Edition: Community
Git Commit Hash: 706c3fa3c526cdba5b3e9f066b1a568fb96c56e3
Git Branch: heads/refs/tags/v6.5.0
UTC Build Time: 2022-12-27 03:50:44
GoVersion: go1.19.3
Race Enabled: false
TiKV Min Version: 6.2.0-alpha
Check Table Before Drop: false
Store: unistore
mysql> CREATE TABLE a (
    -> id int(11) NOT NULL,
    -> fname varchar(30) DEFAULT NULL,
    -> lname varchar(30) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
    -> PARTITION BY RANGE (id)
    -> (PARTITION p0 VALUES LESS THAN (50),
    -> PARTITION p1 VALUES LESS THAN (100),
    -> PARTITION p2 VALUES LESS THAN (150)) ;
Query OK, 0 rows affected (0.10 sec)

mysql> CREATE TABLE a_tmp (
    -> id int(11) NOT NULL,
    -> fname varchar(30) DEFAULT NULL,
    -> lname varchar(30) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
Query OK, 0 rows affected (0.10 sec)

mysql> alter table a set tiflash replica 1;
Query OK, 0 rows affected (0.10 sec)

mysql> alter table a_tmp set tiflash replica 1;
Query OK, 0 rows affected (0.11 sec)

mysql> alter table a EXCHANGE PARTITION p0 WITH TABLE a_tmp;
Query OK, 0 rows affected, 1 warning (3.99 sec)

mysql> 
tonyhu214 commented 1 year ago

@seiya-annie Thx for your help! But there might be a situation where only a portion of the data in the partitioned table requires the assistance of TiFlash. Hope that TiDB will support the creation of TiFlash replication based on partitions in the future

mjonss commented 1 year ago

@seiya-annie Thx for your help! But there might be a situation where only a portion of the data in the partitioned table requires the assistance of TiFlash. Hope that TiDB will support the creation of TiFlash replication based on partitions in the future

Hi @tonyhu214 , Currently all or none partitions needs to be in TiFlash, since the optimizer works on table level. It cannot currently do different plans for different partitions. I assume you want TiDB to support having only some partitions replicated into TiFlash, and allow EXCHANGE Partition to handle exchanging tables and partitions which does not have the same amount of TiFlash Replicas? Can you please tell us a bit more about the use case for this?

@seiya-annie the current master has an issue/regression from https://github.com/pingcap/tidb/pull/45877 which I'm working on in https://github.com/pingcap/tidb/pull/46126.

tonyhu214 commented 1 year ago

Hi @mjonss , Real-time statistics are required on purchases made by members in a specific city within the past three months. The member user table contains millions of records and is partitioned based on the "city" field using hash partitioning. Due to business requirements, the order table retains data for the past 2 years . So it is partitioned based on the "time" field using range partitioning and using Placement Rules to separate hot and cold data. Cold data is rarely accessed. I hope TiFlash can help reduce the response time of analytical query based on the hot data.

mjonss commented 1 year ago

Hi @mjonss , Real-time statistics are required on purchases made by members in a specific city within the past three months. The member user table contains millions of records and is partitioned based on the "city" field using hash partitioning. Due to business requirements, the order table retains data for the past 2 years . So it is partitioned based on the "time" field using range partitioning and using Placement Rules to separate hot and cold data. Cold data is rarely accessed. I hope TiFlash can help reduce the response time of analytical query based on the hot data.

@tonyhu214 Could you try to have all partitions in TiFlash, and compare the latency/query time? Normally TiFlash is compressing the data even more than TiKV/RocksDB, so it may be possible for at least a test? Then we could calculate the saved space and resources on TiFlash, and compare with the saved space in TiFlash (assuming there is not much other resources to save for cold data).

As you probably know, you can force the optimizer to chose between TiKV and TiFlash by using the manual hint, /*+ read_from_storage(tiflash[table_name]) */.

In case we allow only some partitions in TiFlash, would it be an acceptable limitation to only create execution plans using TiFlash if all partitions used in the query exists in TiFlash, with equal number of replicas (or only scheduled for the concurrency of the partition with the lowest number of tiflash replicas)?