trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Error when reading JSON table with TIMESTAMP: java.lang.ClassCastException: class java.sql.Timestamp cannot be cast to class java.lang.Number #5602

Closed: jaspreet997 closed this issue 4 years ago

jaspreet997 commented 4 years ago

Hi, we are getting the below error when timestamps are null in Hive JSON tables:

java.lang.ClassCastException: class java.sql.Timestamp cannot be cast to class java.lang.Number (java.sql.Timestamp is in module java.sql of loader 'platform'; java.lang.Number is in module java.base of loader 'bootstrap')
    at io.prestosql.plugin.hive.GenericHiveRecordCursor.getLongExpressedValue(GenericHiveRecordCursor.java:309)
    at io.prestosql.plugin.hive.GenericHiveRecordCursor.parseLongColumn(GenericHiveRecordCursor.java:293)
    at io.prestosql.plugin.hive.GenericHiveRecordCursor.parseColumn(GenericHiveRecordCursor.java:551)
    at io.prestosql.plugin.hive.GenericHiveRecordCursor.isNull(GenericHiveRecordCursor.java:508)
    at io.prestosql.plugin.hive.HiveRecordCursor.isNull(HiveRecordCursor.java:237)
    at io.prestosql.$gen.CursorProcessor_20201020_052612_13.filter(Unknown Source)
    at io.prestosql.$gen.CursorProcessor_20201020_052612_13.process(Unknown Source)
    at io.prestosql.operator.ScanFilterAndProjectOperator$RecordCursorToPages.process(ScanFilterAndProjectOperator.java:323)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
    at io.prestosql.operator.WorkProcessorUtils$YieldingProcess.process(WorkProcessorUtils.java:181)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
    at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:277)
    at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
    at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
    at io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:149)
    at io.prestosql.operator.Driver.processInternal(Driver.java:379)
    at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
    at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
    at io.prestosql.operator.Driver.processFor(Driver.java:276)
    at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
    at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
    at io.prestosql.$gen.Presto_344____20201020_051608_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

Steps to reproduce

1.

CREATE EXTERNAL TABLE database.eod_json_test (
  eod_time timestamp COMMENT 'from deserializer'
)
PARTITIONED BY (ad string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://bucket/location'

2. Unzip the attached archive and upload/store the eod.json file to the table location: eod.zip

Then add the partition: alter table eod_json_test add partition (ad='2020-10-20') location 's3://bucket/location'

3. Run the below query on Presto 341-344 and the above error will be reproduced:

select eod_time from database.eod_json_test where ad = '2020-10-20' and eod_time is not null

findepi commented 4 years ago

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'

Just noticed that.

Presto does not support this serde yet (https://github.com/prestosql/presto/issues/5638), so I assume you modified Presto / added more jars to the classpath. @jaspreet997 Is the bug reproducible without any customizations in Presto?

jaspreet997 commented 4 years ago

@findepi Yes, I have added the org.openx.data.jsonserde.JsonSerDe jar to the plugin/hive-hadoop2 path.

I tried removing the jar and removed the serde name from Glue for that table. Still getting a "deserializer does not exist: org.openx.data.jsonserde.JsonSerDe" error when trying to execute the same query. Any alternate serde I can use here to try and reproduce the issue?

Also the issue does not occur in version 340 with the same jar in the classpath, if that helps with debugging

findepi commented 4 years ago

Also the issue does not occur in version 340 with the same jar in the classpath

Thanks for the info. We currently do not bundle the openx serde, we do not test it, and so we cannot prevent breakage from occurring. Apologies for that. Proper support for this serde is tracked in https://github.com/prestosql/presto/issues/5638

Any alternate serde I can use here to try and reproduce the issue?

Is the problem reproducible, e.g., for a table defined in Presto with ... WITH (format='JSON') (the org.apache.hive.hcatalog.data.JsonSerDe serde)?
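
For reference, a minimal sketch of such a reproduction attempt, assuming the Hive catalog is named hive and using a hypothetical table name (eod_json_test_presto) that is not from the original report:

    CREATE TABLE hive.database.eod_json_test_presto (
        eod_time timestamp,
        ad varchar
    )
    WITH (
        format = 'JSON',
        partitioned_by = ARRAY['ad']
    );

    -- write one row with a null timestamp, then query it the same way as above
    INSERT INTO hive.database.eod_json_test_presto VALUES (NULL, '2020-10-20');

    SELECT eod_time
    FROM hive.database.eod_json_test_presto
    WHERE ad = '2020-10-20' AND eod_time IS NOT NULL;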

jaspreet997 commented 4 years ago

Is the problem reproducible, e.g., for a table defined in Presto with ... WITH (format='JSON') (the org.apache.hive.hcatalog.data.JsonSerDe serde)?

Working fine with this serde. We do use some features from the openx serde though so we will have to work out something until the support comes officially. Appreciate the help so far.
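
One possible stopgap, if it fits the data, is to switch the existing table to the bundled hcatalog serde on the Hive/Glue side instead of recreating it in Presto. A sketch (untested here, to be run in Hive; any openx-specific behavior would be lost):

    ALTER TABLE database.eod_json_test
    SET SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

    -- existing partitions keep their old serde and may need the same change, e.g.:
    ALTER TABLE database.eod_json_test PARTITION (ad='2020-10-20')
    SET SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';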

findepi commented 4 years ago

@jaspreet997 Thanks for the feedback. I will close the issue as not-a-bug-in-Presto for now.

Can you please mention in https://github.com/prestosql/presto/issues/5638 the additional features of the openx serde that you're benefiting from?