Closed mrasu closed 1 year ago
@mrasu I tried to reproduce this with the latest commit (05ed5ed000b1) and the Athena query was successful for me. I see your file was written with version e1109e2f6a20 of parquet-go which is several versions behind. I recommend updating your go.mod to the latest version of parquet-go (05ed5ed000b1), rewriting and uploading the file to S3, recreating the table, and trying the query again.
I also did another experiment adding ,zstd"
to the parquet struct tags, and queried that table fine with Athena too.
Here's the meta output of my file reproducing yours but with the latest build:
$ parquet meta p_segmentio.bin
File path: p_segmentio.bin
Created by: github.com/segmentio/parquet-go version 0.0.0(build 05ed5ed000b1)
Properties: (none)
Schema:
message segmentio {
required binary name (STRING);
required binary id_str (STRING);
}
Row group 0: count: 1 107.00 B records start: 4 total(compressed): 107 B total(uncompressed):107 B
--------------------------------------------------------------------------------
type encodings count avg size nulls min / max
name BINARY _ 1 42.00 B "segmentio" / "segmentio"
id_str BINARY _ 1 65.00 B "1064572a7aa5d023a0455b956..." / "1064572a7aa5d023a0455b956..."
@bartleyg After upgrading to the latest version, it works. Thank you!
@mrasu awesome! 🎉
Hello, When I made a parquet file and uploaded to S3 and run in Athena with following query, Athena returns nothing even it exists in reality.
Is there any way to handle it?
Detail
I made a parquet file with below code.
And I made a table in AWS Athena (trino compatibl) and uploaded the made file to S3.
Then, I wanted the
where id_str = '1064572a7aa5d023a0455b956668b179'
SQL matches one record but it didn't.xitongsys/parquet-go works
For investigation, I did the same thing with https://github.com/xitongsys/parquet-go with below code,
After I uploaded it to the same directory and run the same query, I could get one record. So, I think this is not the problem in Athena but in parquet file.
paquet-cli
I compared metadata of the two files with parquet-mr/parquet-cli and there is difference, I hope it helps you see my situation.
For segmentio/parquet-go,
For xitongsys/parquet-go,
Thank you in advance.