prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Does Presto support a vectorized reader for Parquet files? #18151

Open Huo662329 opened 2 years ago

Huo662329 commented 2 years ago

Does Presto support a vectorized reader for Parquet files?

ClarenceThreepwood commented 2 years ago

Does "hive.parquet-batch-read-optimization-enabled" implement what you want?
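For reference, a minimal sketch of how that property might be set. This assumes a Hive catalog configured in `etc/catalog/hive.properties`; the file name depends on what your catalog is called, and the value `true` is an assumption about the setting you would want to try.

```properties
# etc/catalog/hive.properties (path assumed; use your own Hive catalog file)
# Turn on the batch Parquet reader named in the comment above.
hive.parquet-batch-read-optimization-enabled=true
```

Catalog properties are read at server startup, so a change here would normally need a restart before it takes effect.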

pratyakshsharma commented 1 year ago

@Huo662329 Can you check the above comment and respond?

yingsu00 commented 1 year ago

The Presto Java batch reader should be fairly well vectorized when decoding the data. There may be parts of it that cannot be autovectorized well, e.g. decoding repetition and definition levels. Also, filter pushdown was not implemented. We're developing a new C++ Parquet reader in Velox, which will be fully vectorized.
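
To illustrate the distinction drawn above, here is a minimal Java sketch. It is not taken from Presto's reader; the class and method names are hypothetical, and it only handles a flat optional INT32 column. The first method is a straight-line loop over primitive arrays with no data-dependent branches, which is the shape a JIT compiler may be able to autovectorize. The second places values according to definition levels, and its output index depends on how many earlier values were non-null, which is the kind of loop-carried dependency that tends to resist autovectorization.

```java
import java.util.Arrays;

final class BatchDecodeSketch {
    // Decode `count` little-endian 32-bit integers (Parquet PLAIN layout for INT32).
    // Every iteration is independent of the others and contains no branches.
    public static int[] decodePlainInt32(byte[] input, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            int o = i * 4;
            out[i] = (input[o] & 0xFF)
                    | (input[o + 1] & 0xFF) << 8
                    | (input[o + 2] & 0xFF) << 16
                    | (input[o + 3] & 0xFF) << 24;
        }
        return out;
    }

    // Spread densely packed non-null values over a batch using definition levels.
    // For a flat optional column, a level equal to maxDefLevel means "value present".
    // `next` advances only on non-null rows, so each iteration depends on the previous one.
    // Returns the number of packed values consumed.
    public static int expandNulls(int[] defLevels, int maxDefLevel, int[] packed,
                                  int[] outValues, boolean[] outNulls) {
        int next = 0;
        for (int i = 0; i < defLevels.length; i++) {
            if (defLevels[i] == maxDefLevel) {
                outValues[i] = packed[next++];
                outNulls[i] = false;
            } else {
                outNulls[i] = true; // value slot keeps its default of 0
            }
        }
        return next;
    }

    public static void main(String[] args) {
        byte[] page = {7, 0, 0, 0, 9, 0, 0, 0};   // two INT32 values: 7 and 9
        int[] packed = decodePlainInt32(page, 2);

        int[] defLevels = {1, 0, 1};               // middle row is null
        int[] values = new int[defLevels.length];
        boolean[] nulls = new boolean[defLevels.length];
        expandNulls(defLevels, 1, packed, values, nulls);

        System.out.println(Arrays.toString(values)); // [7, 0, 9]
        System.out.println(Arrays.toString(nulls));  // [false, true, false]
    }
}
```

Whether a given JVM actually vectorizes the first loop depends on the JIT and the hardware, so treat this as a picture of the loop shapes involved rather than a performance claim.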