Open arhimondr opened 2 years ago
I chatted with @findepi about it, and based on a quick investigation it looks like a no-go.
For Delta
We cannot change the default compression to ZSTD, as it is not supported by Databricks right now (TODO: find the exact failures).
For Hive
There have been a bunch of experiments already. I think the most recent attempt was by @electrum in this PR. I skimmed through the error logs, and it looks like ORC with ZSTD is not well supported even on HDP3 (Hive 3.1). There are errors like these:
2022-05-12T23:00:18.6841558Z tests | Caused by: java.lang.IllegalArgumentException: Unknown compression
2022-05-12T23:00:18.6842101Z tests | at org.apache.orc.impl.ReaderImpl.extractPostScript(ReaderImpl.java:448)
2022-05-12T23:00:18.6842587Z tests | at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:564)
2022-05-12T23:00:18.6843009Z tests | at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:370)
2022-05-12T23:00:18.6843446Z tests | at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:61)
2022-05-12T23:00:18.6843936Z tests | at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:105)
2022-05-12T23:00:18.6844533Z tests | at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1647)
2022-05-12T23:00:18.6845178Z tests | at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1533)
2022-05-12T23:00:18.6845753Z tests | at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1329)
2022-05-12T23:00:18.6846314Z tests | at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1513)
2022-05-12T23:00:18.6846850Z tests | at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1510)
2022-05-12T23:00:18.6847336Z tests | at java.security.AccessController.doPrivileged(Native Method)
2022-05-12T23:00:18.6847748Z tests | at javax.security.auth.Subject.doAs(Subject.java:422)
2022-05-12T23:00:18.6848228Z tests | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
2022-05-12T23:00:18.6848777Z tests | at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1510)
2022-05-12T23:00:18.6849335Z tests | at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1329)
2022-05-12T23:00:18.6849816Z tests | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2022-05-12T23:00:18.6850125Z tests | ... 3 more
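My reading of the trace above is that the Hive-side ORC reader finds a compression kind in the file tail that it does not recognize and gives up before reading any data. Here is a minimal sketch of that failure mode, to make the compatibility problem concrete. Everything in it is hypothetical: `OrcCodecSketch`, `resolve`, and the two reader sets are illustrative stand-ins, not the real `org.apache.orc` classes, and the exact set of codecs an older reader knows is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for a reader's codec handling; not the real ORC classes.
class OrcCodecSketch {
    // Codecs a writer may record in the file tail (list assumed for illustration).
    enum Codec { NONE, ZLIB, SNAPPY, LZO, LZ4, ZSTD }

    // Assumption: a reader built before ZSTD support only knows the older codecs.
    static final Set<Codec> OLD_READER =
            EnumSet.of(Codec.NONE, Codec.ZLIB, Codec.SNAPPY, Codec.LZO, Codec.LZ4);

    // A newer reader that knows every codec in the enum.
    static final Set<Codec> NEW_READER = EnumSet.allOf(Codec.class);

    // Returns the codec if the reader supports it, otherwise fails the same way
    // the trace does: an IllegalArgumentException with "Unknown compression".
    static Codec resolve(Codec fromFileTail, Set<Codec> supported) {
        if (!supported.contains(fromFileTail)) {
            throw new IllegalArgumentException("Unknown compression");
        }
        return fromFileTail;
    }

    public static void main(String[] args) {
        // A ZSTD file is readable by the newer reader...
        System.out.println("new reader: " + resolve(Codec.ZSTD, NEW_READER));
        // ...but the older reader rejects it before touching any stripe data.
        try {
            resolve(Codec.ZSTD, OLD_READER);
        } catch (IllegalArgumentException e) {
            System.out.println("old reader: " + e.getMessage());
        }
    }
}
```

If that is what is happening, files written with ZSTD by default would be unreadable for any consumer whose reader predates the codec, which is why the default matters more than the opt-in setting.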
For Iceberg
We are already there.
cc: @arhimondr
Requires https://github.com/trinodb/trino/issues/9775