Open sjmudd opened 7 years ago
@sjmudd Thank you for your suggestion. This is a very useful feature. We will let you know when starts to support this feature.
But we can't use the same algorithm to calculate the checksum value because we store data in a different format than MySQL, so the checksum value for the table imported from MySQL to TiDB will be different.
I stumbled across ADMIN CHECKSUM TABLE
by accident today. It looks like this is used by lightning. The output format is not CHECKSUM TABLE
compatible though:
mysql> ADMIN CHECKSUM TABLE trips;
+-----------+------------+----------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+-----------+------------+----------------------+-----------+-------------+
| bikeshare | trips | 11248831942383972982 | 19117643 | 2597048988 |
+-----------+------------+----------------------+-----------+-------------+
1 row in set (5.24 sec)
The current workaround to compare data between TiDB and MySQL data sources is to use md5sum
on a result ordered by primary key. For example:
morgo@ryzen:/mnt/evo970/data-sets/bikeshare-data$ ~/sandboxes/msb_8_0_15/use -BNe "SELECT * FROM bikeshare.trips ORDER BY trip_id" | md5sum
9d893b153e6476f148f3564d801e6ed6 -
morgo@ryzen:/mnt/evo970/data-sets/bikeshare-data$ mysql -BNe "SELECT * FROM bikeshare.trips ORDER BY trip_id" | md5sum
9d893b153e6476f148f3564d801e6ed6 -
I will leave this request open as a request for a MySQL-compatible CHECKSUM TABLE
command.
Close this issue since the feature is supported.
I will leave this request open as a request for a MySQL-compatible
CHECKSUM TABLE
command.
There are two challenges to implement a MySQL-Compatible CHECKSUM TABLE
command:
tidb-lighting
must be adapted to use the same checksum algorithm if it still uses the ADMIN CHECKSUM TABLE <table_name>
statement provided in TiDB./cc @kennytm, @IANTHEREAL
From MySQL document about innodb_checksum_algorithm
, seems the checksum default algorithm is crc32
:
Specifies how to generate and verify the checksum stored in the disk blocks of InnoDB tablespaces. crc32 is the default value as of MySQL 5.7.7.
The value innodb is backward-compatible with earlier versions of MySQL. The value crc32 uses an algorithm that is faster to compute the checksum for every modified block, and to check the checksums for each disk read. It scans blocks 32 bits at a time, which is faster than the innodb checksum algorithm, which scans blocks 8 bits at a time. The value none writes a constant value in the checksum field rather than computing a value based on the block data. The blocks in a tablespace can use a mix of old, new, and no checksum values, being updated gradually as the data is modified; once blocks in a tablespace are modified to use the crc32 algorithm, the associated tables cannot be read by earlier versions of MySQL.
I don't see how Lightning or ADMIN CHECKSUM TABLE
interferes with this feature.
ADMIN CHECKSUM TABLE
is calculated based on the encoded KV pairs, and includes contributions from the indices. CHECKSUM TABLE
is based on the SQL values and does not count indices. Therefore, Lightning will not switch to use CHECKSUM TABLE
even if it is available.
We need to investigate the user scenario in depth, and then make a decision after the research is completed.
As @kennytm notes, the use-case is not for lightning, but could be used for DM.
Because of SQL modes, timezones and character sets logical restores could corrupt data. This is true even if TiDB is ostensibly behaving correctly (i.e. it could be misconfiguration). CHECKSUM TABLE
provides a way of logically comparing two systems as having the same data.
MySQL users often use it during upgrades (MySQL <-> MySQL), for example logically dumping a 5.7 system and then restoring on 8.0 and comparing the data. I have found problematic behavior changes in MySQL this way. I can't find the bug now, but I think it was in 5.0 it started storing negative infinity values differently.
The feature request here is to implement the same algorithm as MySQL so it can be used for a MySQL <-> TiDB upgrade. That is, it could either be used by DM or a manual inspection by the DBA, who is concerned about such logical corruptions.
From MySQL document about
innodb_checksum_algorithm
, seems the checksum default algorithm iscrc32
:Specifies how to generate and verify the checksum stored in the disk blocks of InnoDB tablespaces. crc32 is the default value as of MySQL 5.7.7.
The value innodb is backward-compatible with earlier versions of MySQL. The value crc32 uses an algorithm that is faster to compute the checksum for every modified block, and to check the checksums for each disk read. It scans blocks 32 bits at a time, which is faster than the innodb checksum algorithm, which scans blocks 8 bits at a time. The value none writes a constant value in the checksum field rather than computing a value based on the block data. The blocks in a tablespace can use a mix of old, new, and no checksum values, being updated gradually as the data is modified; once blocks in a tablespace are modified to use the crc32 algorithm, the associated tables cannot be read by earlier versions of MySQL.
This is a different type of checksum (page checksums on the physical format). It is not related to CHECKSUM TABLE
. The manual does not explicitly state the format.
In Sync-diff-inspector, we have a custom checksum command based on:
SELECT bit_xor(
CAST(crc32(
concat_ws(',',
col1, col2, col3, …, colN,
concat(isnull(col1), isnull(col2), …, isnull(colN))
)
) AS UNSIGNED)
)
FROM t;
(for sure this is not a general solution because the concat_ws
result cannot exceed @@max_allowed_packet
, and also it relies on the stability of data → string conversion.)
When importing data into TiDB we want to be sure the data is correct. A convenient command in MySQL is CHECKSUM TABLE and this command is not currently supported by TiDB.
Adding it would make it easy to confirm if the data loaded in from an external source matches or not.
Example I did something like:
$ ssh mysqlhost sudo mysqldump test SomeTable | mysql -h tidbhost -P 4000 -u root -D test
MySQL 5.6:
TiDB:
So providing some sort of CHECKSUM table command would be great.