pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.41k stars 5.85k forks source link

Improved support for MySQL Shell dump and load #24515

Open dveeden opened 3 years ago

dveeden commented 3 years ago

Development Task

MySQL Shell is a relatively new tool from Oracle which is used as the new CLI for various DBA and developer tasks. One of the features of this tool is dumping and loading data. This is somewhat similar to mysqldump and mysqlpump, but more focus is placed on parallel dumping and restoring to get better performance. This also adds Oracle Cloud features for storing and loading umps.

Dumping data from TiDB 5.0

[dvaneeden@dve-carbon ~]$ mysqlsh mysql://root@localhost:4000 -- util dump-schemas employees --output-url=/tmp/mysqlsh_dump_employees
Acquiring global read lock
NOTE: Error acquiring global read lock: MySQL Error 1105 (HY000): FLUSH TABLES WITH READ LOCK is not supported.  Please use @@tidb_snapshot
Global read lock has been released
ERROR: Unable to acquire global read lock: MySQL Error 1105 (HY000): FLUSH TABLES WITH READ LOCK is not supported.  Please use @@tidb_snapshot
[dvaneeden@dve-carbon ~]$ mysqlsh mysql://root@localhost:4000 -- util dump-schemas employees --output-url=/tmp/mysqlsh_dump_employees --consistent=false
WARNING: Failed to fetch value of @@GLOBAL.GTID_EXECUTED.
Gathering information - done
WARNING: The dumped value of gtid_executed is not guaranteed to be consistent
ERROR: User 'root'@'%' is missing the following privilege(s) for table `employees`.`salaries`: TRIGGER.
[dvaneeden@dve-carbon ~]$ mysqlsh mysql://root@localhost:4000 -- util dump-schemas employees --output-url=/tmp/mysqlsh_dump_employees --consistent=false --triggers=false
WARNING: Failed to fetch value of @@GLOBAL.GTID_EXECUTED.
Gathering information - done
WARNING: The dumped value of gtid_executed is not guaranteed to be consistent
Writing global DDL files
Writing DDL for schema `employees`
Writing DDL for view `employees`.`current_dept_emp`
Writing DDL for view `employees`.`dept_emp_latest_date`
Preparing data dump for table `employees`.`salaries`
Data dump for table `employees`.`salaries` will be chunked using column `emp_no`
Writing DDL for table `employees`.`salaries`
Preparing data dump for table `employees`.`titles`
Data dump for table `employees`.`titles` will be chunked using column `emp_no`
Preparing data dump for table `employees`.`dept_emp`
Data dump for table `employees`.`dept_emp` will be chunked using column `emp_no`
Preparing data dump for table `employees`.`dept_manager`
Data dump for table `employees`.`dept_manager` will be chunked using column `emp_no`
Preparing data dump for table `employees`.`departments`
Data dump for table `employees`.`departments` will be chunked using column `dept_no`
Preparing data dump for table `employees`.`employees`
Data dump for table `employees`.`employees` will be chunked using column `emp_no`
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing DDL for table `employees`.`titles`
Writing DDL for table `employees`.`dept_emp`
Writing DDL for table `employees`.`dept_manager`
Writing DDL for table `employees`.`departments`
Writing DDL for table `employees`.`employees`
Data dump for table `employees`.`dept_manager` will be written to 1 file
Data dump for table `employees`.`dept_emp` will be written to 1 file
Data dump for table `employees`.`salaries` will be written to 1 file
Data dump for table `employees`.`titles` will be written to 1 file
Data dump for table `employees`.`departments` will be written to 1 file                                         
Data dump for table `employees`.`employees` will be written to 1 file                                           
1 thds dumping - 100% (3.92M rows / ~3.92M rows), 960.18K rows/s, 34.68 MB/s uncompressed, 9.12 MB/s compressed 
Duration: 00:00:04s                                                                                            
Schemas dumped: 1                                                                                              
Tables dumped: 6                                                                                               
Uncompressed data size: 141.50 MB                                                                              
Compressed data size: 37.19 MB                                                                                 
Compression ratio: 3.8                                                                                         
Rows written: 3919015                                                                                          
Bytes written: 37.19 MB                                                                                        
Average uncompressed throughput: 34.79 MB/s                                                                    
Average compressed throughput: 9.14 MB/s                                                                       

The issues here are:

Loading data

When loading with mysqlsh mysql://root@localhost:4000 -- util loadDump /tmp/mysqlsh_dump_employees the tool fails without a proper error message. Inspecting the TiDB tools shows that the issue is related to the isolation level support.

[2021/05/10 08:01:01.316 +02:00] [WARN] [session.go:1429] ["run statement failed"] [conn=155] [schemaVersion=138] [error="[variable:8048]The isolation level 'READ-UNCOMMITTED' is not supported. Set tidb_skip_isolation_level_check=1 to skip this error"] [session="{\n  \"currDBName\": \"employees\",\n  \"id\": 155,\n  \"status\": 2,\n  \"strictMode\": false,\n  \"user\": {\n    \"Username\": \"root\",\n    \"Hostname\": \"127.0.0.1\",\n    \"CurrentUser\": false,\n    \"AuthUsername\": \"root\",\n    \"AuthHostname\": \"%\"\n  }\n}"]

Running set global tidb_skip_isolation_level_check=1 makes the tool skip over this issue.

The next issue is with LOAD DATA LOCAL INFILE...REPLACE INTO which isn't supported by TiDB. Unfortunately this incompatibility is not listed on https://docs.pingcap.com/tidb/stable/mysql-compatibility and I don't know of a good workaround.

[dvaneeden@dve-carbon ~]$ mysqlsh mysql://root@localhost:4000 -- util loadDump /tmp/mysqlsh_dump_employees --resetProgress
Loading DDL and Data from '/tmp/mysqlsh_dump_employees' using 4 threads.
Opening dump...
Target is MySQL 5.7.25-TiDB-v5.0.1. Dump was produced from MySQL 5.7.25-TiDB-v5.0.1
NOTE: Load progress file detected for the instance but 'resetProgress' option was enabled. Load progress will be discarded and the whole dump will be reloaded.
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL script for schema `employees`
[Worker003] Executing DDL script for `employees`.`employees`
[Worker002] Executing DDL script for `employees`.`dept_manager`
[Worker001] Executing DDL script for `employees`.`dept_emp`
[Worker000] Executing DDL script for `employees`.`departments`
[Worker003] Executing DDL script for `employees`.`titles`
[Worker003] Executing DDL script for `employees`.`salaries`
[Worker000] Executing DDL script for `employees`.`current_dept_emp` (placeholder for view)
[Worker001] Executing DDL script for `employees`.`dept_emp_latest_date` (placeholder for view)
Executing DDL script for view `employees`.`dept_emp_latest_date`
Executing DDL script for view `employees`.`current_dept_emp`
ERROR: [Worker003] employees@titles@@0.tsv.zst: MySQL Error 1105 (HY000): Load Data: don't support load data with replace field: LOAD DATA LOCAL INFILE '/tmp/mysqlsh_dump_employees/employees@titles@@0.tsv.zst' REPLACE INTO TABLE `employees`.`titles` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY '    ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`emp_no`, `title`, `from_date`, `to_date`)
ERROR: [Worker001] employees@employees@@0.tsv.zst: MySQL Error 1105 (HY000): Load Data: don't support load data with replace field: LOAD DATA LOCAL INFILE '/tmp/mysqlsh_dump_employees/employees@employees@@0.tsv.zst' REPLACE INTO TABLE `employees`.`employees` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY '   ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`emp_no`, `birth_date`, `first_name`, `last_name`, `gender`, `hire_date`)
ERROR: [Worker000] employees@salaries@@0.tsv.zst: MySQL Error 1105 (HY000): Load Data: don't support load data with replace field: LOAD DATA LOCAL INFILE '/tmp/mysqlsh_dump_employees/employees@salaries@@0.tsv.zst' REPLACE INTO TABLE `employees`.`salaries` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY '  ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`emp_no`, `salary`, `from_date`, `to_date`)
ERROR: Aborting load...                                               
ERROR: [Worker002] employees@dept_emp@@0.tsv.zst: MySQL Error 1105 (HY000): Load Data: don't support load data with replace field: LOAD DATA LOCAL INFILE '/tmp/mysqlsh_dump_employees/employees@dept_emp@@0.tsv.zst' REPLACE INTO TABLE `employees`.`dept_emp` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY '  ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`emp_no`, `dept_no`, `from_date`, `to_date`)

No data loaded.
4 errors and 0 warnings messages were reported during the load.
ERROR: Error loading dump

Other compatibility issues

The TiDB logs show that the tool tries to run SELECT @@SESSION.sql_generate_invisible_primary_key which is not supported by TiDB. However as this variable was recently introduced in MySQL 8.0 the tool ignores this error.

Why compatibility is needed

Tasks

Important:

Minor and/or nice to have:

dveeden commented 3 years ago

For the TRIGGER permission:

mysql 8.0.22 > CREATE USER foo;
Query OK, 0 rows affected (0.01 sec)

mysql 8.0.22 > GRANT TRIGGER ON *.* TO foo;
Query OK, 0 rows affected (0.01 sec)

mysql 8.0.22 > SHOW GRANTS FOR foo\G
*************************** 1. row ***************************
Grants for foo@%: GRANT TRIGGER ON *.* TO `foo`@`%`
1 row in set (0.00 sec)
tidb 5.7.25-TiDB-v5.0.1 > CREATE USER foo;
Query OK, 0 rows affected (0.03 sec)

tidb 5.7.25-TiDB-v5.0.1 > GRANT TRIGGER ON *.* TO foo;
Query OK, 0 rows affected (0.03 sec)

tidb 5.7.25-TiDB-v5.0.1 > SHOW GRANTS FOR foo\G
*************************** 1. row ***************************
Grants for foo@%: GRANT Trigger ON *.* TO 'foo'@'%'
1 row in set (0.00 sec)

It may be comparing TRIGGER against Trigger, but I'm not sure yet

aioas commented 2 years ago

When will LOAD DATA LOCAL INFILE ... REPLACE INTO be supported? This is very important to us!!!

dveeden commented 2 years ago

When will LOAD DATA LOCAL INFILE ... REPLACE INTO be supported? This is very important to us!!!

@aioas How are you using or planning to use LOAD DATA LOCAL INFILE ... REPLACE INTO? any specific usecase? With MySQL Shell dump and load or something else? Information like this can help with prioritization and also to make sure we fully cover that usecase.

zn9f commented 2 years ago

When will LOAD DATA LOCAL INFILE ... REPLACE INTO be supported? This is very important to us!!!

@aioas How are you using or planning to use LOAD DATA LOCAL INFILE ... REPLACE INTO? any specific usecase? With MySQL Shell dump and load or something else? Information like this can help with prioritization and also to make sure we fully cover that usecase.

For example, in a scenario, I use load data to import 10,000 pieces of data into TiDB, but it fails when the 1000th item is imported. In order to ensure the consistency of the data, I need to import these 10,000 pieces of data from the beginning. I don’t want to delete the imported data through programs or manually. At this time, I need to use load data...replace... to ensure the correctness of the repeated imported data.

aioas commented 2 years ago

When will LOAD DATA LOCAL INFILE ... REPLACE INTO be supported? This is very important to us!!!

@aioas How are you using or planning to use LOAD DATA LOCAL INFILE ... REPLACE INTO? any specific usecase? With MySQL Shell dump and load or something else? Information like this can help with prioritization and also to make sure we fully cover that usecase.

We have a large number of offline computing scenarios of big data, and regularly update user data to the latest, new users are inserted, and old users are updated. The most suitable use is LOAD DATA LOCAL INFILE ... REPLACE INTO. Currently, there is no better alternative, so we really hope to support it.

lance6716 commented 1 year ago

LOAD DATA ... REPLACE INTO should be supported in https://github.com/pingcap/tidb/pull/41947 . Welcome to test it in the coming release