varunbpatil opened this issue 4 years ago (status: Open)
One easy solution I can think of is to write a custom function that skips invalid files and aggregates the results from all valid files, though it would definitely be expensive.
Best, Ravi
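Here is a minimal sketch of Ravi's suggestion in Python. It is not from the thread and assumes the files can be reached on a local or mounted filesystem rather than through an HDFS client. It relies on the Parquet format placing the 4-byte magic "PAR1" at both the start and the end of every file, so it only catches structurally broken files like the one in the error, not corrupt column data. The names `looks_like_parquet`, `collect_from_valid`, and the `read_file` hook are hypothetical and exist only for illustration.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Smallest structurally possible file: header magic + 4-byte footer length + footer magic.
MIN_PARQUET_SIZE = 4 + 4 + 4


def looks_like_parquet(path):
    """Cheap structural check: the file must start and end with b'PAR1'.

    This does not parse the footer metadata or validate column chunks.
    Unreadable or missing paths are treated as invalid.
    """
    try:
        if os.path.getsize(path) < MIN_PARQUET_SIZE:
            return False
        with open(path, "rb") as f:
            head = f.read(4)
            f.seek(-4, os.SEEK_END)  # jump to the trailing magic bytes
            tail = f.read(4)
        return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
    except OSError:
        return False


def collect_from_valid(paths, read_file):
    """Apply a caller-supplied read_file(path) -> list to each valid file.

    Returns (combined_results, skipped_paths). read_file is a hypothetical
    hook, e.g. something that reads or queries a single file.
    """
    results, skipped = [], []
    for path in paths:
        if not looks_like_parquet(path):
            skipped.append(path)
            continue
        results.extend(read_file(path))
    return results, skipped
```

The pre-check reads only 8 bytes per file, so it is cheap. The expensive part Ravi mentions is doing the per-file work and merging the results yourself instead of letting the engine do it in one query.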
On Mon, Jun 15, 2020, 11:25 AM Varun B Patil <notifications@github.com> wrote:
I'm using Presto 0.236 with the Hive connector.
presto:default> SELECT Time FROM mytable WHERE date_ >= 20200613 ORDER BY Time;
Query 20200615_151944_00029_cyskg, FAILED, 1 node
Splits: 118 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20200615_151944_00029_cyskg failed: hdfs://x.x.x.x:8020/abc/test.parquet is not a valid Parquet File
The error is correct: the file above, test.parquet, is indeed invalid.
But I want Presto to skip such files during the query. Is this possible?
I tried an older version, 0.181, and it does skip invalid Parquet files, but I need some features from the newer Presto version and was wondering whether there is a flag for this.