tangenta closed this pull request 3 years ago.
[REVIEW NOTIFICATION]
This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling `/cc @reviewer` in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling `/assign @committer` in the comment to help you merge this pull request.
The full list of commands accepted by this bot can be found here.
https://github.com/pingcap/parser/blob/562fed23b4fb6fffe012f8c3d46d8adf5c3ac744/lexer.go#L900-L903
`peek` is widely used in the lexer, for example in `incAsLongAs`. If the SQL is GBK-encoded, could decoding it as UTF-8 produce an error?
@xiongjiwei If UTF-8 decoding fails, the bytes are interpreted as a four-byte integer. We can still decode them as GBK later because no information is lost.
OK.
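A minimal stdlib-only sketch of the reply above. It assumes a hypothetical `peekRaw` helper, not the parser's actual `peek`: when `utf8.DecodeRuneInString` reports an invalid sequence (`utf8.RuneError` with width 1), the helper returns the raw byte value as a `rune` (Go's `rune` is an `int32`, a four-byte integer), so the GBK bytes stay recoverable for a later decoding step.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// peekRaw is a hypothetical helper: it decodes the character at offset i as UTF-8,
// and if the bytes are not valid UTF-8 it returns the raw byte value with width 1,
// so no input information is discarded.
func peekRaw(s string, i int) (r rune, width int) {
	if i >= len(s) {
		return -1, 0 // end of input
	}
	r, width = utf8.DecodeRuneInString(s[i:])
	if r == utf8.RuneError && width == 1 {
		return rune(s[i]), 1 // invalid UTF-8: keep the raw byte value
	}
	return r, width
}

func main() {
	// "芢" in GBK is the two bytes 0xC6 0x5C, which are not valid UTF-8.
	gbk := "\xc6\x5c"
	for i := 0; i < len(gbk); {
		r, w := peekRaw(gbk, i)
		fmt.Printf("offset %d: 0x%X (width %d)\n", i, r, w)
		i += w
	}
	// The string itself is untouched, so it can still be decoded as GBK later.
	fmt.Println(utf8.ValidString(gbk))
}
```

Note that the second GBK byte, 0x5C, comes back as an ASCII backslash, which is exactly what makes the test cases below interesting.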
Please include unit tests for:
This should pass:

```sql
-- in GBK encoding
select '芢' from `玚`;
```

Equivalently, as a Go string:

```go
// charset=gbk
"select '\xc6\x5c' from `\xab\x60`;"
```

```shell
$ echo $'select "\xc6\x5c" from `\xab\x60`;' | mysql -u root test --default-character-set=gbk
芢
芢
```
This should fail:

```go
// charset=utf8mb4
"select _gbk'\xc6\x5c' from dual;"
```

```shell
$ echo $'select _gbk"\xc6\x5c" from dual;' | mysql -u root test --default-character-set=utf8mb4
ERROR 1064 (42000) at line 1: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '"?\" from dual' at line 1
```
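The two cases differ because the GBK trailing byte 0x5C is also the ASCII backslash. The sketch below roughly models this with a hypothetical `scanQuoted` helper (not the parser's lexer), assuming that any GBK lead byte in the range 0x81 to 0xFE consumes the following byte. With GBK-aware scanning the closing quote is found; a byte-wise scan treats 0x5C as an escape, swallows the quote, and never terminates the literal.

```go
package main

import "fmt"

// isGBKLead reports whether b can start a two-byte GBK character.
func isGBKLead(b byte) bool { return b >= 0x81 && b <= 0xFE }

// scanQuoted is a hypothetical helper that scans a literal starting at s[0] == '\''.
// It returns the index of the closing quote, or -1 if the literal is unterminated.
// When gbk is true, a GBK lead byte swallows the next byte, so a trailing 0x5C
// is not mistaken for a backslash escape.
func scanQuoted(s string, gbk bool) int {
	for i := 1; i < len(s); i++ {
		switch c := s[i]; {
		case gbk && isGBKLead(c) && i+1 < len(s):
			i++ // skip the trailing byte of a two-byte character
		case c == '\\':
			i++ // backslash escapes the next byte
		case c == '\'':
			return i
		}
	}
	return -1
}

func main() {
	lit := "'\xc6\x5c' from dual"
	fmt.Println(scanQuoted(lit, true))  // 3: closing quote found
	fmt.Println(scanQuoted(lit, false)) // -1: quote escaped, literal unterminated
}
```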
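```go
package parser

import (
	"testing"
	"github.com/pingcap/parser/charset"
)

var utf8SQL = "create table 测试表 (测试列 varchar(255) default 'GBK测试用例测试用例测试用例测试用例测试用例测试用例测试用例测试用例');"
var gbkSQL string // the same statement re-encoded as GBK bytes

func init() {
	encoding, _ := charset.Lookup("gbk")
	var err error
	if gbkSQL, err = encoding.NewEncoder().String(utf8SQL); err != nil {
		panic(err) // fail loudly instead of benchmarking an empty string
	}
}

// BenchmarkParserParseGBK parses the GBK-encoded statement with charset "gbk".
func BenchmarkParserParseGBK(b *testing.B) {
	p := New()
	for i := 0; i < b.N; i++ {
		_, _, _ = p.Parse(gbkSQL, "gbk", "")
	}
}

// BenchmarkParserParseUTF8 parses the original UTF-8 statement with the default charset.
func BenchmarkParserParseUTF8(b *testing.B) {
	p := New()
	for i := 0; i < b.N; i++ {
		_, _, _ = p.Parse(utf8SQL, "", "")
	}
}
```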
```
goos: linux
goarch: amd64
pkg: github.com/pingcap/parser
cpu: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
BenchmarkParserParseGBK-12     50840    23212 ns/op
BenchmarkParserParseUTF8-12    81193    14864 ns/op
PASS
ok      github.com/pingcap/parser    2.799s
```
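Taking the two ns/op figures from this single run, the GBK benchmark spent roughly 1.56 times as long per parse (about 8.3 µs more) as the UTF-8 one:

$$
\frac{23212\ \text{ns/op}}{14864\ \text{ns/op}} \approx 1.56, \qquad 23212 - 14864 = 8348\ \text{ns}
$$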
/hold
/unhold
@xiongjiwei Do you need another look?
/merge
This pull request has been accepted and is ready to merge.
Why is the CircleCI requirement still there?
/merge
This pull request has been accepted and is ready to merge.
What problem does this PR solve?
Related to https://github.com/pingcap/tidb/issues/26812
What is changed and how it works?
Add field `encoding.Decoder` to the parser config.

Check List
Tests
Code changes
Side effects
NA
Related changes
NA