NoHomey closed this issue 2 years ago.
Yes, please. I've been focused on my new job for a long time, so sorry for the late response.
Hi @xitongsys,
Sorry for the late response as well; I missed the notification email for some reason. I've just opened a PR with the contribution.
Hi
My team is working on a project in which we use parquet-go to write Parquet files that are then consumed by Trino, a popular open-source SQL query engine. After upgrading Trino, we found that it can no longer read the Parquet files we write unless we disable the use of statistics, which degrades query performance. The exception we were getting is caused by newer versions of Trino assuming that the `null_counts` field in the `ColumnIndex` is populated, because Trino reads the statistics from the `ColumnIndex` and not from the `ColumnMetaData`.

We have a small, working fix that writes `null_counts` to the `ColumnIndex` whenever the `null_count` values in `DataPageHeader(V2).Statistics` have been set, and we would like to contribute that code so other people can benefit as well. Please let me know if you would like this contribution as part of your codebase, and I'll open a PR.
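For anyone curious what such a fix might look like, here is a minimal sketch of the idea, assuming simplified stand-in types. `PageStatistics`, `ColumnIndex` and `fillNullCounts` below are hypothetical and only mirror the relevant Thrift fields (`Statistics.null_count` and `ColumnIndex.null_counts`). They are not parquet-go's generated structs, and this is not the code from the PR.

```go
package main

import "fmt"

// PageStatistics is a hypothetical, simplified stand-in for the Thrift
// Statistics struct carried in DataPageHeader / DataPageHeaderV2.
// Only the optional null_count field matters here; nil means "not set".
type PageStatistics struct {
	NullCount *int64
}

// ColumnIndex is a hypothetical, simplified stand-in for the Thrift
// ColumnIndex struct. NullCounts mirrors the optional null_counts list;
// nil means the field is left out of the written metadata.
type ColumnIndex struct {
	NullPages  []bool
	NullCounts []int64
}

// fillNullCounts copies the per-page null counts into the column index.
// null_counts must hold one entry per page, so it is only populated when
// every page has a null_count set; otherwise it is left unset (nil).
func fillNullCounts(ci *ColumnIndex, pages []*PageStatistics) {
	counts := make([]int64, 0, len(pages))
	for _, st := range pages {
		if st == nil || st.NullCount == nil {
			ci.NullCounts = nil
			return
		}
		counts = append(counts, *st.NullCount)
	}
	ci.NullCounts = counts
}

// int64Ptr is a small helper for building optional int64 values.
func int64Ptr(v int64) *int64 { return &v }

func main() {
	ci := &ColumnIndex{NullPages: []bool{false, false}}
	fillNullCounts(ci, []*PageStatistics{
		{NullCount: int64Ptr(0)},
		{NullCount: int64Ptr(3)},
	})
	fmt.Println(ci.NullCounts) // [0 3]
}
```

The all-or-nothing choice keeps the list aligned with the pages: a partially filled `null_counts` would have fewer entries than `null_pages`, which a reader could not interpret correctly.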