mukunku / ParquetViewer

Simple Windows desktop application for viewing & querying Apache Parquet files
GNU General Public License v3.0

Alternating rows are displaying with incorrect values #20

Closed neil-hucker-seequent closed 3 years ago

neil-hucker-seequent commented 3 years ago

Using the Release 2.1 binary download.

Problem: I have a parquet file with 30 rows (attached, renamed to a .parquet extension and then zipped: Rows_5036221_5036251.zip). When I load it into ParquetViewer, every 2nd row is incorrect. The first column is titled i (columns i, j, k are coordinates; j and k are meant to all have the same value). Every 2nd row of the i column displays a 0 instead of the value it's meant to have, and the value it's meant to have is then pushed down to the next row. So for example, where the column should read 156, 157, 158, 159, 160

in ParquetViewer I get: 156, 0, 157, 0, 158,

For the rest of the columns, some have correct values for their row position and some don't. For example, rows 20 and 21 (counting from 1) contain "LMS1" in column V1, which is correct, but in the same rows S6 contains 0 when it should be "5".

In this file the S columns are a status for a null-value replacement string. If an S column is 0, then the V column of the same number should contain a value, but if the S column contains a non-zero value, the V column should be blank.
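As a sanity check outside any GUI viewer, something like the following pyarrow sketch should dump the i column and test the S/V rule above. The column names and the S&lt;n&gt;/V&lt;n&gt; pairing are assumptions based on this description, so adjust as needed:

```python
# Minimal pyarrow cross-check (assumed layout: i/j/k columns plus paired
# S<n>/V<n> columns, per the description above; adjust names to the real file).
import pandas as pd
import pyarrow.parquet as pq

df = pq.read_table("Rows_5036221_5036251.parquet").to_pandas()

# The i column should contain the real coordinate values, not alternate with zeros.
print(df["i"].tolist())

# Rule from the report: S<n> == 0 means V<n> should hold a value,
# a non-zero S<n> means V<n> should be blank.
for s_col in [c for c in df.columns if c.startswith("S") and c[1:].isdigit()]:
    v_col = "V" + s_col[1:]
    if v_col not in df.columns:
        continue
    for row, (s, v) in enumerate(zip(df[s_col], df[v_col]), start=1):
        has_value = pd.notna(v) and str(v).strip() != ""
        if (s == 0) != has_value:
            print(f"row {row}: {s_col}={s} but {v_col}={v!r}")
```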

I've compared the results with the similar tool BigdataFileViewer (https://github.com/Eugene-Mark/bigdata-file-viewer) and also with what my devs see while debugging the parquet file creation (they are using pyarrow to import a CSV into a parquet file). Screenshots of the results are attached.
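For reference, the CSV-to-Parquet step they describe is presumably close to the standard pyarrow path; the exact file names and options they use are not known to me, so this is only an illustrative sketch:

```python
# Illustrative csv -> parquet conversion with pyarrow (file names and options
# are placeholders; the actual conversion may use different settings).
import pyarrow.csv as pacsv
import pyarrow.parquet as pq

table = pacsv.read_csv("rows.csv")      # infer schema from the CSV
pq.write_table(table, "rows.parquet")   # write with default settings
```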

*Note that BigdataFileViewer has other issues with this file: it errors on the first attempt to load (complaining about incorrect magic numbers), but then loads correctly on a subsequent attempt.

ParquetViewer screenshot

BigdataFileViewer screenshot

mukunku commented 3 years ago

Thanks for the details and sample file. Updating the Parquet.NET library used in this project fixed the issue (#21).

Please try the latest release!

neil-hucker-seequent commented 3 years ago

Thanks Sal. It seems to work correctly on that file now. I also tried it on a 140 million row file and it processes that one correctly too.