Closed linhantao closed 2 years ago
@linhantao Thanks for your feedback, can you add some information about the table structure and the steps to reproduce?
@jingshanglu repeat the steps as follows: 1、create one table :
create table test_json (id varchar(100), 家庭住址 json,other varchar(100));
2、insert one data:
insert into test_json values('1','{\"detail\": \"\", \"list\": [{\"id\": \"41bb58cab2954a768b7a2e85e82ef369\", \"value\": \"湖北省\"}, {\"id\": \"0dbbcdb6b6524f5fb47585e49fda44be\", \"value\": \"武汉市\"}, {\"id\": \"0e6f75f3bb1e46b8a329fc9aa62187f4\", \"value\": \"东西湖区\"}]}','其他数据');
3、error sql:
select from test_json A left join (select from test_json where false) B on A.家庭住址 =B.家庭住址 where B.家庭住址 ->'$.list[0].id'='0asdsss'
4、 success sql:
select from test_json A left join (select from test_json where false) B on A.家庭住址 =B.家庭住址 where A.家庭住址 ->'$.list[0].id'='0asdsss'
select from test_json A left join (select from test_json where false) B on A.家庭住址 =B.家庭住址 where B.家庭住址 ->'$.list[0].id'='122'
There is something wrong when parsing binary from json str
[2022/05/26 02:39:06.733 +00:00] [INFO] [conn.go:1149] ["command dispatched failed"] [conn=1396212641508098479] [connInfo="id:1396212641508098479, addr:172.26.0.1:60152 status:10, collation:utf8_general_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="select * from test_json A left join (select * from test_json where false) B on A.addr =B.addr where B.addr ->'$.list[0].id'='0asdsss'"] [txn_mode=PESSIMISTIC] [timestamp=433463688357478401] [err="[json:3140]Invalid JSON text: The document root must not be followed by other values.
github.com/pingcap/errors.AddStack
/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174
github.com/pingcap/errors.(*Error).GenWithStackByArgs
/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/normalize.go:164
github.com/pingcap/tidb/types/json.ParseBinaryFromString
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/types/json/binary.go:403
github.com/pingcap/tidb/expression.(*builtinCastStringAsJSONSig).vecEvalJSON
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/builtin_cast_vec.go:795
github.com/pingcap/tidb/expression.(*ScalarFunction).VecEvalJSON
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/scalar_function.go:78
github.com/pingcap/tidb/expression.(*builtinEQJSONSig).vecEvalInt
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/builtin_compare_vec_generated.go:1124
github.com/pingcap/tidb/expression.(*ScalarFunction).VecEvalInt
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/scalar_function.go:48
github.com/pingcap/tidb/expression.EvalExpr
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/expression.go:566
github.com/pingcap/tidb/expression.VecEvalBool
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/expression.go:361
github.com/pingcap/tidb/expression.vectorizedFilter
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/chunk_executor.go:489
github.com/pingcap/tidb/expression.VectorizedFilterConsiderNull
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/chunk_executor.go:401
github.com/pingcap/tidb/expression.VectorizedFilter
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/chunk_executor.go:379
github.com/pingcap/tidb/executor.(*SelectionExec).Next
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/executor.go:1420
github.com/pingcap/tidb/executor.Next
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/executor.go:319
github.com/pingcap/tidb/executor.(*recordSet).Next
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:151
github.com/pingcap/tidb/server.(*tidbResultSet).Next
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/driver_tidb.go:323
github.com/pingcap/tidb/server.(*clientConn).writeChunks
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:2212
github.com/pingcap/tidb/server.(*clientConn).writeResultset
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:2163
github.com/pingcap/tidb/server.(*clientConn).handleStmt
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:2044
github.com/pingcap/tidb/server.(*clientConn).handleQuery
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1876
github.com/pingcap/tidb/server.(*clientConn).dispatch
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1371
github.com/pingcap/tidb/server.(*clientConn).Run
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1121
github.com/pingcap/tidb/server.(*Server).onConn
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:559
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1571"]
MySQL [test]> explain select * from test_json A left join (select * from test_json where false) B on A.addr =B.addr where B.addr ->'$.list[0].id'='0asdsss';
+-------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+
| Selection_12 | 0.80 | root | | eq(json_extract(test.test_json.addr, "$.list[0].id"), cast("0asdsss", json BINARY)) |
| └─HashJoin_13 | 1.00 | root | | left outer join, equal:[eq(test.test_json.addr, test.test_json.addr)] |
| ├─Selection_17(Build) | 0.00 | root | | not(isnull(cast(test.test_json.addr, var_string(4294967295)))) |
| │ └─TableDual_18 | 0.00 | root | | rows:0 |
| └─TableReader_16(Probe) | 1.00 | root | | data:TableFullScan_15 |
| └─TableFullScan_15 | 1.00 | cop[tikv] | table:A | keep order:false, stats:pseudo |
+-------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
But for the normal one, eq(json_extract(test.test_json.addr, "$.list[0].id"), cast("0asdsss", json BINARY))
is pushed down to tikv.
MySQL [test]> explain select * from test_json A left join (select * from test_json where false) B on A.addr =B.addr where A.addr ->'$.list[0].id'='0asdsss';
+-----------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| HashJoin_12 | 0.80 | root | | left outer join, equal:[eq(test.test_json.addr, test.test_json.addr)] |
| ├─Selection_17(Build) | 0.00 | root | | eq(json_extract(test.test_json.addr, "$.list[0].id"), cast("0asdsss", json BINARY)), not(isnull(cast(test.test_json.addr, var_string(4294967295)))) |
| │ └─TableDual_18 | 0.00 | root | | rows:0 |
| └─TableReader_16(Probe) | 0.80 | root | | data:Selection_15 |
| └─Selection_15 | 0.80 | cop[tikv] | | eq(json_extract(test.test_json.addr, "$.list[0].id"), cast("0asdsss", json BINARY)) |
| └─TableFullScan_14 | 1.00 | cop[tikv] | table:A | keep order:false, stats:pseudo |
+-----------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
minimal reproduce:
create table t(a json);
insert into t values('{"id": "ish"}');
select * from t t1 left join t t2 on t1.a=t2.a where t2.a='ish';
+--------------------------------+---------+-----------+---------------+--------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------+---------+-----------+---------------+--------------------------------------------------------------+
| Selection_8 | 0.80 | root | | eq(test.t.a, cast("ish", json BINARY)) |
| └─HashJoin_9 | 1.00 | root | | left outer join, equal:[eq(test.t.a, test.t.a)] |
| ├─TableReader_15(Build) | 0.80 | root | | data:Selection_14 |
| │ └─Selection_14 | 0.80 | cop[tikv] | | not(isnull(cast(test.t.a, var_string(4294967295)))) |
| │ └─TableFullScan_13 | 1.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
| └─TableReader_12(Probe) | 1.00 | root | | data:TableFullScan_11 |
| └─TableFullScan_11 | 1.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+--------------------------------+---------+-----------+---------------+--------------------------------------------------------------+
7 rows in set (0.00 sec)
this issue will reproduce when :
eq(some_json, cast("some_string", json))
PredicatePushDown
is triggered (usually adding a join will enable it)actually 0asdsss
or ish
is not a valid JSON, but "ish"
is a valid JSON, we can check it by executing SQL SELECT CAST('ish' AS json);
and SELECT CAST('"ish"' AS json);
.
mysql> SELECT CAST('"ish"' AS json);
+-----------------------+
| CAST('"ish"' AS json) |
+-----------------------+
| "ish" |
+-----------------------+
1 row in set (0.00 sec)
mysql> SELECT CAST('ish' AS json);
ERROR 3140 (22032): Invalid JSON text: The document root must not be followed by other values.
phase planBuilder:
eq(some_json, some_string)
, tidb will cast this string to JSON first.ParseToJSONFlag
so executor won't really parse this string.
https://github.com/pingcap/tidb/blob/master/expression/builtin_compare.go#L1741-L1751
func (c *compareFunctionClass) generateCmpSigs(ctx sessionctx.Context, args []Expression, tp types.EvalType) (sig builtinFunc, err error) {
if tp == types.ETJson {
// In compare, if we cast string to JSON, we shouldn't parse it.
for i := range args {
DisableParseJSONFlag4Expr(args[i])
}
}
phase logicalOptimize
PredicatePushDown
has a potential bug, which will unexpectedly reset the flag of expressions. https://github.com/pingcap/tidb/pull/35759 has more information about it.phase executor
builtinCastStringAsJSONSig
is working, it checks whether the ParseToJSONFlag
is set. if not set, it just do a simple binary convert. if set, it will parse ish
, so tidb report.
https://github.com/pingcap/tidb/blob/master/expression/builtin_cast_vec.go#L787-L810https://github.com/pingcap/tidb/pull/35759 will fix this issue.
suggest closing the cherry-picks of this pr, because it has a little effect and difficult trigger condition.
TIDB version : V5.1 / V5.0 error message:[Err] 3140 - Invalid JSON text: The document root must not be followed by other values.
When left join is used, JSON value filtering is performed on the right table, and an error occur, When using a right join, JSON value filtering is performed on the left table and an error occurs. for example:
When a left join is used, JSON value filtering is performed on the left table without error.Such as:
However, the values in the JSON filter after the concatenation are all numeric, so the query can be executed without letters for example: