Open Xuchaojing opened 4 years ago
Could you try hive.orc.use-column-names=true
?
Also, I would suggest you join the community Slack https://prestosql.io/slack.html
hive.orc.use-column-names it doesn't solve this issue when i add this property to catalog/hive.properties, restart the presto cluster. then run the sql, the error still occurs
(For hive.orc.use-column-names
, please track https://github.com/prestosql/presto/issues/1558)
@Xuchaojing can you provide the full error stacktrace with and without hive.orc.use-column-names=true
, as well as table schema (SHOW CREATE TABLE xxx
)
in order to protect the security of the information, I changed the name of the table. change the real name to "abc". the following info is table schema :
CREATE TABLE `ods.abc`(
`id` bigint,
`identifier` string,
`identifier_md5` string,
`channel` bigint,
`os_version` bigint,
`request_url` string)
COMMENT 'abc'
PARTITIONED BY (
`pt` string COMMENT 'full_partition')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'field.delim'='\u0001',
'line.delim'='\n',
'serialization.format'='\u0001')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://testHadoop/dataware/ods/abc'
TBLPROPERTIES (
'transient_lastDdlTime'='1574322027')
the full error stacktrace when run sql using presto: Query 20200509_161449_04577_9pmv8 failed: Error opening Hive split hdfs://testHadoop/dataware/ods/abc/pt=20200425/part-0f22af46_648b_45dd_896b_6cd088339140 (offset=134217728, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200425/part-0f22af46_648b_45dd_896b_6cd088339140]
@Xuchaojing You can get the full stactrace by --deubg
CLI option or Web UI. (The pasted message isn't full stacktrace)
@Xuchaojing You can get the full stactrace by
--deubg
CLI option or Web UI. (The pasted message isn't full stacktrace)
this is the full stacktrace, and the version of the prestosql i used is 317.
Query 20200510_052523_00189_9pmv8 failed: Error opening Hive split hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89 (offset=134217728, length=33554432: Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89]
io.prestosql.spi.PrestoException: Error opening Hive split hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89 (offset=134217728, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89]
at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:235)
at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createPageSource(OrcPageSourceFactory.java:121)
at io.prestosql.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:164)
at io.prestosql.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:98)
at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:45)
at io.prestosql.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:283)
at io.prestosql.operator.Driver.processInternal(Driver.java:379)
at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
at io.prestosql.operator.Driver.processFor(Driver.java:276)
at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at io.prestosql.$gen.Presto_317_dirty____20200428_034006_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.prestosql.orc.OrcCorruptionException: Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89]
at io.prestosql.orc.reader.ReaderUtils.verifyStreamType(ReaderUtils.java:40)
at io.prestosql.orc.reader.SliceStreamReader.<init>(SliceStreamReader.java:63)
at io.prestosql.orc.reader.StreamReaders.createStreamReader(StreamReaders.java:48)
at io.prestosql.orc.OrcRecordReader.createStreamReaders(OrcRecordReader.java:559)
at io.prestosql.orc.OrcRecordReader.<init>(OrcRecordReader.java:240)
at io.prestosql.orc.OrcReader.createRecordReader(OrcReader.java:249)
at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:205)
@Xuchaojing Could you share two kinds of full stacktrace with:
hive.orc.use-column-names=true
hive.orc.use-column-names=false
hive.orc.use-column-names=true
the error stack is the same, whether true or false.
Let's continue in this Slack thread https://prestosql.slack.com/archives/CGB0QHWSW/p1589011535253100
I think this is because the ORC reader that Hive uses converts the data type from File Type
to Reader Type
when possible, but Presto does not support the conversion currently.
Query 20200509_032316_00682_9pmv8 failed: Error opening Hive split hdfs://xxxxxxx (offset=33554432, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING
so is it any suggestion about this issue?