Closed wjhuang2016 closed 5 months ago
A smaller reproduce:
CREATE TABLE
`t1` (
`col_5` json NOT NULL,
`col_7` date DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COLLATE = utf8_bin;
INSERT INTO
`t1`
VALUES
(
'[2]',
'2008-11-29'
);
CREATE TABLE
`t2` (
`col_5` json NOT NULL,
`col_7` date DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COLLATE = utf8_bin;
INSERT INTO
`t2`
VALUES
(
'[2]',
'2006-11-16'
);
WITH
`cte_3600` (`col_10746`) AS (
SELECT
EXPORT_SET (
3,
`t2`.`col_5`,
`t2`.`col_7`,
'-',
8
) AS `r0`
FROM
`t1`
JOIN `t2` ON `t1`.`col_5` = `t2`.`col_5`
) (
SELECT
1
FROM
`cte_3600`
WHERE
`cte_3600`.`col_10746` BETWEEN 'Wz%vodzG3' AND 'm*RzhE#Kif(H'
);
I think the issue is that the internal expression
SELECT
EXPORT_SET (
3,
`t2`.`col_5`,
`t2`.`col_7`,
'-',
8
) AS `r0`
FROM
`t1`
JOIN `t2` ON `t1`.`col_5` = `t2`.`col_5`
The r0
should be compared with others through binary
collation;
select '[2]-[2]-2006-11-16-2006-11-16-2006-11-16-2006-11-16-2006-11-16-2006-11-16' > 'W';
select _binary '[2]-[2]-2006-11-16-2006-11-16-2006-11-16-2006-11-16-2006-11-16-2006-11-16' > 'W';
A smaller reproduce:
SELECT EXPORT_SET (3,cast('[]' as json),'2','-',8) BETWEEN 'W' AND 'm';
It gives 1
in MySQL, and it give 0
in TiDB. EXPORT_SET
(in MySQL) returns LONG_BLOB
in this case. It seems to be some magic in mysql type infer.
MySQL has a strange function to set the string type:
/**
Set the Item to be variable length string. Actual type is determined from
maximum string size. Collation must have been set before calling function.
@param max_l Maximum number of characters in string
*/
inline void set_data_type_string(uint32 max_l) {
max_length = max_l * collation.collation->mbmaxlen;
decimals = DECIMAL_NOT_SPECIFIED;
if (max_length <= Field::MAX_VARCHAR_WIDTH)
set_data_type(MYSQL_TYPE_VARCHAR);
else if (max_length <= Field::MAX_MEDIUM_BLOB_WIDTH)
set_data_type(MYSQL_TYPE_MEDIUM_BLOB);
else
set_data_type(MYSQL_TYPE_LONG_BLOB);
}
The type of EXPORT_SET
is determined by
bool Item_func_export_set::resolve_type(THD *thd) {
if (param_type_is_default(thd, 0, 1, MYSQL_TYPE_LONGLONG)) return true;
if (param_type_is_default(thd, 1, 4)) return true;
if (param_type_is_default(thd, 4, 5, MYSQL_TYPE_LONGLONG)) return true;
if (agg_arg_charsets_for_string_result(collation, args + 1,
min(4U, arg_count) - 1))
return true;
const ulonglong length = max(args[1]->max_char_length(collation.collation),
args[2]->max_char_length(collation.collation));
const ulonglong sep_length =
(arg_count > 3 ? args[3]->max_char_length(collation.collation) : 1);
set_data_type_string(length * 64U + sep_length * 63U);
set_nullable(is_nullable() || max_length > thd->variables.max_allowed_packet);
return false;
}
The problem is that, the args[1]->max_char_length(collation.collation)
will always return Field::MAX_LONG_BLOB_WIDTH
for json. See:
/**
Set the data type of the Item to be JSON.
*/
void set_data_type_json() {
set_data_type(MYSQL_TYPE_JSON);
collation.set(&my_charset_utf8mb4_bin, DERIVATION_IMPLICIT);
decimals = DECIMAL_NOT_SPECIFIED;
max_length = Field::MAX_LONG_BLOB_WIDTH;
}
As TiDB didn't have a unified MaxLength
for each field type (and the cast
function), it's not simple to have similar logic here.
Other incompatible cases:
select concat(cast('[]' as json), '[]') between 'W' AND 'm';
A lot of functions also have similar problem: concat
, concat_ws
, replace
, insert
, substr
, soundex
, format
, elt
, make_set
, repeat
, rpad
, lpad
, weight_string
, hex
, unhex
, convert(...)
, export_set
, quote
, compress
, date_format
, lower
, upper
, reverse
, coalesce
....
https://github.com/pingcap/tidb/pull/53126 Fixed most of the situations. Not all functions mentioned above is included in this PR, because:
rpad
and lpad
determines the max_length
according to the second argument if it's constant. This PR didn't fix the issue for this case (and I'll try to fix it in the future).compress
), which don't need fix.hex/unhex
didn't calculate the max_length
according to its first argument when its first argument is not string.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
MySQL:
3. What did you see instead (Required)
TiDB
If disabled inlining CTE by referring cte_3600 one more time.
4. What is your TiDB version? (Required)
cfbabfa705dbbc97c8e1e00af7f2d2620406304c