trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.3k stars 2.97k forks source link

Hive partition parsing error #19420

Open wangmiao1002 opened 12 months ago

wangmiao1002 commented 12 months ago

Hi, community

I have a hive external table with a four layer partition logtype、mod and date are all partition fields Running the SQL below was unable to parse the partition fields. A full partition scan was performed, which delayed the query time. The table has a large number of partitions, so the query is slow, and the slow logs of the hive metadata were also captured, resulting in obtaining all the SQL statements for the problems. How can I avoid this issue environmental information trino :426 hive:2.1.1-cdh6.3.2

image

image

image

wangmiao1002 commented 12 months ago

Previously used presto queries, the cluster was using ec to compress data, so it was replaced with trino; Presto can parse partition information during queries

electrum commented 11 months ago

Can you share the output of SHOW CREATE TABLE applog_online, at least for the partition columns?

wangmiao1002 commented 11 months ago

Can you share the output of SHOW CREATE TABLE applog_online, at least for the partition columns?

Okay, here is the table information. The partition field date is also set with a date type, and I have also added the cache parameters in trino. I found that it is still slow. Could you please give me some advice on how to avoid it? Thank you

CREATE EXTERNAL TABLE `applog_online`(
      `message` string)
    PARTITIONED BY ( 
      `date` date, 
      `mod` string, 
      `logtype` string, 
      `host` string)
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    WITH SERDEPROPERTIES ( 
      'field.delim'='\t', 
      'serialization.format'='\t') 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      'hdfs://nameservice1/user/hive/warehouse/applog_online'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1678851016')