mukunku / ParquetViewer

Simple Windows desktop application for viewing & querying Apache Parquet files
GNU General Public License v3.0

[FEAT] Support Array of Struct Type #108

Closed ChristianDu closed 6 months ago

ChristianDu commented 6 months ago

Describe the feature you'd like to be added to Parquet Viewer

Hi, can you please add support for arrays? I can't open array columns: [image]

Example File:

arrayExample.zip

Definition of columns (Apache Spark Java):

return createStructType(Arrays.asList(
    createStructField("Product", StringType, true),
    createStructField("Orders", createArrayType(createStructType(Arrays.asList(
        createStructField("DateTime", TimestampType, true),
        createStructField("Quantity", DoubleType, true)
    )), false), true)
));

Thank you!

Share why this feature would be a good addition to the utility

It improves usability: currently, files with array columns can't be loaded correctly.

mukunku commented 6 months ago

Hey @ChristianDu, as you saw I've added support for struct arrays in v3.0.0. Thanks for opening the issue and sharing a sample file.

Going to close out this issue but feel free to re-open if the issue persists.

ChristianDu commented 6 months ago

Hi @mukunku,

Thank you very much for implementing the feature so quickly. It works with a few files, but I get an error for one column:

createStructField(COLUMN_NAME, createArrayType(createStructType(Arrays.asList(
    createStructField(SUB_COL_1, TimestampType, true),
    createStructField(SUB_COL_2, DoubleType, true),
    createStructField(SUB_COL_3, DoubleType, true)
)), false), true);

I will try to provide you with an example file, because I can't upload the exact file.

Error:


Specified cast is not valid.

Something went wrong (CTRL+C to copy):

System.InvalidCastException: Specified cast is not valid.
   at ParquetViewer.Engine.ParquetEngine.ReadListField(DataTableLite dataTable, ParquetRowGroupReader groupReader, Int32 rowBeginIndex, ParquetSchemaElement itemField, Int32 fieldIndex, Int64 skipRecords, Int64 readRecords, Boolean isFirstColumn, CancellationToken cancellationToken, IProgress`1 progress)
   at ParquetViewer.Engine.ParquetEngine.ProcessRowGroup(DataTableLite dataTable, ParquetRowGroupReader groupReader, Int64 skipRecords, Int64 readRecords, CancellationToken cancellationToken, IProgress`1 progress)
   at ParquetViewer.Engine.ParquetEngine.PopulateDataTable(DataTableLite dataTable, ParquetReader parquetReader, Int64 offset, Int64 recordCount, CancellationToken cancellationToken, IProgress`1 progress)
   at ParquetViewer.Engine.ParquetEngine.ReadRowsAsync(List`1 selectedFields, Int32 offset, Int32 recordCount, CancellationToken cancellationToken, IProgress`1 progress)
   at ParquetViewer.MainForm.<>c__DisplayClass33_0.<b__1>d.MoveNext()
--- End of stack trace from previous location ---
   at ParquetViewer.MainForm.LoadFileToGridview()
   at System.Threading.Tasks.Task.<>c.b__128_0(Object state)
   at InvokeStub_SendOrPostCallback.Invoke(Object, Span`1)
   at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)

OK

ChristianDu commented 6 months ago

@mukunku I could reproduce the error. It happens when every array of the column is empty. I created an example file (same columns as the previous one but this one has empty arrays):

emptyArrayError.zip
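
For readers unfamiliar with how Parquet stores list columns, here is a rough sketch of why a column whose arrays are all empty is a special case. The thread does not say what actually caused the InvalidCastException inside ParquetViewer, so this is not a reconstruction of that bug. It assumes the standard three-level LIST layout for the Orders column above, where each leaf (for example Quantity) is stored as repetition levels, definition levels, and a flat run of non-null values. The class name, method name, and level constants are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ListColumnSketch {
    // Definition levels for the leaf Orders.list.element.Quantity, assuming
    // Orders is optional, the element is required, and Quantity is optional.
    // These constants are illustrative assumptions, not ParquetViewer code.
    static final int DEF_NULL_LIST = 0;   // Orders itself is null
    static final int DEF_EMPTY_LIST = 1;  // Orders is present but has no elements
    static final int DEF_NULL_VALUE = 2;  // an element exists, Quantity is null
    static final int DEF_VALUE = 3;       // Quantity holds a value

    // Rebuilds one list per row from the flattened column data.
    static List<List<Double>> assemble(int[] rep, int[] def, double[] values) {
        List<List<Double>> rows = new ArrayList<>();
        int next = 0; // index of the next unread entry in values
        for (int i = 0; i < def.length; i++) {
            if (rep[i] == 0) {
                // Repetition level 0 starts a new row.
                rows.add(def[i] == DEF_NULL_LIST ? null : new ArrayList<>());
            }
            if (def[i] >= DEF_NULL_VALUE) {
                List<Double> current = rows.get(rows.size() - 1);
                if (def[i] == DEF_VALUE) {
                    current.add(values[next++]);
                } else {
                    current.add(null);
                }
            }
            // DEF_EMPTY_LIST and DEF_NULL_LIST consume no value at all.
        }
        return rows;
    }

    public static void main(String[] args) {
        // Three rows, every array empty: three level entries, zero values.
        int[] rep = {0, 0, 0};
        int[] def = {DEF_EMPTY_LIST, DEF_EMPTY_LIST, DEF_EMPTY_LIST};
        System.out.println(assemble(rep, def, new double[0]));
    }
}
```

In the all-empty case the values array has length zero even though the column has rows, so any reader logic that inspects the first value, or assumes at least one value exists, has nothing to work with. That is the kind of edge case a file like the one above exercises.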

mukunku commented 6 months ago

Thanks. I will take a look 👍🏼

mukunku commented 6 months ago

@ChristianDu Can you try this alpha version with your file?

ParquetViewer_#108.zip

Not sure if you use the regular exe or the self-contained one, but I can't upload the self-contained build in a comment, so I shared the regular one.

If you're not comfortable testing the exe that's okay too! I can add it to the release as usual.

ChristianDu commented 6 months ago

Looks good! File is loading and showing the data correctly. Thanks for the fast fix.

ChristianDu commented 6 months ago

Implemented and fixed. Thx

mukunku commented 6 months ago

Thanks for confirming! https://github.com/mukunku/ParquetViewer/releases/tag/v3.0.0.1 has the fix now 💪🏼