trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.34k stars 2.98k forks source link

[BUG] when oracle's charset is ZHS16GBK, “SELECT * FROM TABLE where a IN XXX” sql get error results #20234

Open lico2013 opened 9 months ago

lico2013 commented 9 months ago

Steps to reproduce:

oracle db's charset is : ZHS16GBK

create TABLE

CREATE TABLE CITY (
    ID VARCHAR2(100) NULL,
    CITY_NAME VARCHAR2(100) NULL
);

table data: city.csv

when i run sql:

  select * from oracle.CCC.CITY where city_name = '崇州市'

it get the right answer as below:

img_v3_026i_a5fa76eb-f841-4c03-aa15-fc74dd788f7g

but when i run sql:

  select * from oracle.CCC.CITY c where city_name in ('崇州市','西昌市','都江堰市','彭州市','江油市','什邡市','广汉市','绵竹市','华蓥市','峨眉山市','万源市','简阳市','康定市','马尔康市','隆昌市','射洪市','会理市','连州市','乐昌市','南雄市','普宁市','陆丰市','恩平市','台山市','开平市','鹤山市','四会市','罗定市','化州市','高州市','吴川市','廉江市','津市市','韶山市','武冈市')

it get no result to return, this is obvious not the right answer,at least '崇州市' must in result。

img_v3_026i_f12cc308-78df-4a48-b595-2da070d2353g

analysis:

the preparing query is :

    2023-12-27T18:13:21.443+0800    DEBUG   20231227_101321_00020_7ut74.0.0.0-0-117 io.trino.plugin.jdbc.DefaultQueryBuilder     
  Preparing query: SELECT "ID", "CITY_NAME" FROM "CCC"."CITY" WHERE ("CITY_NAME" >= ? AND "CITY_NAME" <= ?)

this actual sql run in oracle is :

      SELECT ID, CITY_NAME FROM "CCC"."CITY" WHERE ("CITY_NAME" >= '万源市' AND "CITY_NAME" <= '鹤山市' )

it seem's oracle predicate pushdown error ,when charset is in gbk,

img_v3_026i_7df8ad13-8365-4571-af77-ab0d6d2e520g

in this code, slice sorted in utf8, this range differs from slice sorted in gbk。so when the range is error。select get the wrong answer。

  in utf8's sort:   '万源市' < '鹤山市' 
  but in gbk's sort:   '鹤山市' <  '万源市'
ebyhr commented 9 months ago

cc: @wendigo @hashhar