pingcap / tidb-tools

tidb-tools are some useful tool collections for TiDB.
Apache License 2.0
286 stars 191 forks source link

sync-diff-inspector: use UNHEX for BINARY and VARBINARY types #746

Open taowata opened 11 months ago

taowata commented 11 months ago

Feature Request

Is your feature request related to a problem? Please describe: Using sync-diff-inspector with tables containing BINARY type columns outputs unexecutable SQLs due to direct binary values.

Here is an example of such a query:

REPLACE INTO `table_name`(...,`binary_column`, ...) VALUES (...,'ßä���<95>ko^P��«', ...);

Describe the feature you'd like: My proposal is that for BINARY and VARBINARY typed columns, the UNHEX function should be used by default. This change could allow the generated fix SQLs to be executable as is.

The improved query would look something like this:

REPLACE INTO `table_name`(...,`binary_column`, ...) VALUES (...,UNHEX('hexadecimal representation'), ...);

Alternatively, like the --hex-blob option in mysqldump, a new item could be added to the config file to enable this behavior.

Describe alternatives you've considered: Currently, users have to manually adjust the generated queries which is very cumbersome in bulk operations. Consideration of alternatives leads me back to the two proposals I made above: employing the UNHEX function in the sync-diff-inspector, or adding a new config file item.

Teachability, Documentation, Adoption, Migration Strategy:

Implementing this feature will allow users to execute generated queries directly, even on tables with binary type columns. I have already successfully implemented the use of the UNHEX function by default in my own project, and it worked effectively for resolving differences. If consensus is reached on this approach, I am ready to promptly submit a PR with the necessary changes.

It could make the user experience smoother for those who use BINARY and VARBINARY types.

dveeden commented 3 months ago

Might be good to use a hex literal (0x616263) instead of using a function (UNHEX('616263')).