Open hoggatt opened 10 months ago
Hi @hoggatt. Thank you for your feedback, and for your kind words.
Sorry for the inconvenience; I had never looked at that aspect of the Hadoop-based writer. Thanks to you, I have now, but it is a bit different from what is described in the link you provided (we are actually using a different writer class).
I have found the right solution now: I'm thinking of not generating these .crc files by default and providing an option to generate them if needed. I'll include this in the next release. In the meantime, you can disable this Hadoop feature on your side by adding this bit of code before writing parquet files (in a try/catch block):
org.apache.hadoop.fs.FileSystem fs = org.apache.hadoop.fs.FileSystem.getLocal(new org.apache.hadoop.conf.Configuration());
fs.setWriteChecksum(false);
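Wrapped in the try/catch block mentioned above, the workaround could look like the sketch below. It assumes Hadoop is already on your classpath (it is a dependency of the underlying parquet writer); the class and method names are illustrative, not part of this library.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public final class DisableCrcFiles {

    // Call once before writing any parquet files. FileSystem.getLocal
    // declares IOException, hence the try/catch.
    static void disableWriteChecksums() {
        try {
            FileSystem fs = FileSystem.getLocal(new Configuration());
            // Turns off the client-side checksum that produces hidden .crc files.
            fs.setWriteChecksum(false);
        } catch (IOException e) {
            // Could not obtain the local filesystem; .crc files will still be written.
        }
    }
}
```

Note that `FileSystem.getLocal` returns a cached instance, so the setting applies to subsequent writes through the same local filesystem object.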
Thanks again for giving me the opportunity to dive into this topic.
Thank you! This is extremely helpful! I appreciate you digging into it.
First off, I love this library, thank you for making it so easy to read and write parquet files in Java!
One minor annoyance is that saving parquet files generates hidden .crc files. It appears that there is a way to turn this off in the underlying library (example here). Would it be possible to expose that option (withValidation) in the TablesawParquetWriteOptions builder?