Closed felipepessoto closed 3 years ago
Thanks for the contribution, @felipepessoto. I've merged it to v2.3. Sorry it took so long, but it's hard to make time.
Thanks @mukunku. Will you cherry pick it to main?
There's one issue with your proposed changes: they don't calculate the total number of records in the file correctly. While working on it I realized the Thrift metadata already has the record count in it, so I made some additional changes to use that.
It's possible to cherry pick both commits to master but I want to make sure my change won't also break something, hence the beta release. After a few weeks I'll merge all of it to master.
Also, if you're working with large files, give the new multi-threaded engine a try. I'd like to know whether it's stable, because I saw a significant performance increase for large files (hundreds of columns and millions of rows).
Parquet Viewer Version 2.2
Where was the parquet file created? C#
Sample File Test.zip
Describe the bug In UtilityMethods class this line contains a bug after the first call:
For example, suppose I have two row groups with 2 rows each. On the first call the check is if (rowIndex=0 >= readRecords=2), which is OK. But on the next call it is if (rowIndex=2 >= readRecords=2), and it breaks. It only gets past the check if the second row group is bigger than the first, and even then it is buggy, since it will skip rows.
After fixing this issue, I also found another problem, where the row count is not respected after the first row group:
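To make the two problems above concrete, here is a minimal sketch of the bookkeeping as I understand it. It is not the ParquetViewer code and does not use the parquet-dotnet API. It is written in Python for brevity, the function name read_rows is hypothetical, and row groups are modelled as plain lists. The idea is that both the offset and the requested row count are tracked across the whole file, and each is reduced as row groups are consumed, so neither is compared directly against the size of a single row group.

```python
from typing import List, Sequence


def read_rows(row_groups: Sequence[Sequence[int]], offset: int, count: int) -> List[int]:
    """Return up to `count` rows starting at file-wide row index `offset`.

    Hypothetical helper for illustration only: each row group is a plain list.
    """
    result: List[int] = []
    to_skip = offset    # rows still to skip, counted across all row groups
    remaining = count   # rows still to return, counted across all row groups

    for group in row_groups:
        if remaining <= 0:
            break  # the requested row count has been satisfied

        group_size = len(group)
        if to_skip >= group_size:
            # The whole group lies before the offset: consume it and move on.
            to_skip -= group_size
            continue

        # Take rows from this group, limited by what is still requested.
        take = min(group_size - to_skip, remaining)
        result.extend(group[to_skip:to_skip + take])
        remaining -= take
        to_skip = 0  # any later group is read from its first row

    return result
```

With two row groups of 2 rows each, read_rows([[0, 1], [2, 3]], 2, 2) returns the two rows of the second group, and read_rows([[0, 1], [2, 3]], 0, 3) stops after the first row of the second group.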
Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.