It looks like the parquet-dotnet library we use doesn't support your file.
The quickest way to get it resolved would be to open a ticket in their repo: https://github.com/aloneguid/parquet-dotnet/issues
I'll see if I can take a look and fix it on my end, but I can't promise anything on timing.
@dvesic So I figured out why your file can't be opened. It appears to be malformed: for some reason there's an extra byte of data in the data page, and that's what throws off the parquet-dotnet library.
I created a release with a patch that you can download here: ParquetViewer_PR81_v0.zip
I created this release from this fork I made of the parquet-dotnet library. However, assuming the file really is malformed, I don't think this solution is correct, so I won't be adding this fix to any of the main releases.
You can use the patched ParquetViewer I shared above for your own files and hopefully this bug will get fixed in future versions of Oracle or parquet-dotnet.
Also, thanks for sharing your bug and a test file.
cc: @aloneguid
I'm going to close out this issue for now, but please feel free to re-open it if you want to discuss further.
Thank you very much for the patch - I appreciate it.
Hey @dvesic,
I came across this bug: https://github.com/dask/fastparquet/issues/855
I noticed the file you shared had this in its metadata:
"CreatedBy": "fastparquet-python version 2023.4.0 (build 0)",
They seem to have fixed an issue with string byte array sizes, which is very similar to the behavior I observed when reviewing your file.
I wonder if your Oracle can be updated to use version 2023.8.0 instead of 2023.4.0. If you get the chance to test it with the newer version, please let me know whether the issue is fixed in the latest regular release.
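To spot other files that may be affected, one option is to check the writer string stored in the file's metadata, like the "CreatedBy" value quoted above. The sketch below is a hypothetical helper, not part of ParquetViewer or fastparquet: it assumes the writer string follows the "fastparquet-python version X.Y.Z" format seen in the shared file, and it treats 2023.8.0 (the version suggested above) as the cutoff, which is an assumption rather than a confirmed fix version. The string itself can be read in Python with pyarrow, e.g. `pyarrow.parquet.read_metadata(path).created_by`.

```python
import re

# Assumed cutoff: the fastparquet release suggested in this thread.
FIXED_VERSION = (2023, 8, 0)

# Assumed format of the writer string, e.g.
# "fastparquet-python version 2023.4.0 (build 0)"
_CREATED_BY_RE = re.compile(r"fastparquet-python version (\d+)\.(\d+)\.(\d+)")


def parse_fastparquet_version(created_by):
    """Return the fastparquet version as an int tuple, or None for other writers."""
    match = _CREATED_BY_RE.search(created_by)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def written_by_old_fastparquet(created_by):
    """True if the writer string names a fastparquet release older than FIXED_VERSION."""
    version = parse_fastparquet_version(created_by)
    return version is not None and version < FIXED_VERSION


if __name__ == "__main__":
    sample = "fastparquet-python version 2023.4.0 (build 0)"
    print(parse_fastparquet_version(sample))
    print(written_by_old_fastparquet(sample))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "2023.10.0" sorting before "2023.8.0".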
Parquet Viewer Version: 2.7.1.0
Where was the parquet file created? Python, using the pandas and fastparquet libraries
Sample File: Sample file attached (Example.zip).
Describe the bug: Try to open the file. If you select only the first column, it opens fine. If you select all columns, the second column causes a problem and no data is displayed.
Screenshots: Screenshot attached.
Additional context: Original column definition from Oracle database: