microsoft / AzureStorageExplorer

Easily manage the contents of your storage account with Azure Storage Explorer. Upload, download, and manage blobs, files, queues, tables, and Cosmos DB entities. Gain easy access to manage your virtual machine disks. Work with either Azure Resource Manager or classic storage accounts, plus manage and configure cross-origin resource sharing (CORS) rules.
Creative Commons Attribution 4.0 International

Incorrect preview of parquet files with decimals #7957

Open ihenry opened 1 month ago

ihenry commented 1 month ago

Storage Explorer Version

1.33.1

Regression From

No response

Architecture

x64

Storage Explorer Build Number

20240410.2

Platform

All

OS Version

Windows 11 and macOS 14.5

Bug Description

Parquet files containing decimal columns of various precisions and scales, such as (5,3), (9,5), and (38,6), are previewed incorrectly.

Steps to Reproduce

Preview a Parquet file in Azure Storage Explorer that contains columns defined as Decimal(5,3), Decimal(9,5), and Decimal(38,6). The preview shows incorrect values. The file should be previewed as it is in DBeaver, with 0 or 0.xxx as appropriate. DBeaver with DuckDB shows the following preview:

Screenshot 2024-05-24 at 12 58 58

DBeaver with DuckDB Metadata

Screenshot 2024-05-24 at 12 52 55

Azure Storage Explorer 1.33.1

Screenshot 2024-05-24 at 12 52 32

Actual Experience

Expected to see the raw decimal values, but the preview actually shows {"type":"Buffer","data":[0,0,0,0]}

craxal commented 1 month ago

@ihenry Can you share your Parquet file or a small sample file that we could test with?

ihenry commented 1 month ago

Thanks @craxal. I have emailed a sample Parquet file to the sehelp mailbox.

craxal commented 2 weeks ago

Issue reproduced on our end.

Are all of your decimal values intentionally 0? Every buffer that's parsed seems to contain only zeroes.

It seems that the library we use does not currently support decimal values (see https://github.com/LibertyDSNP/parquetjs#list-of-supported-types--encodings). We might be able to work around this by parsing the buffer ourselves.
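
For illustration, here is a minimal sketch of what parsing the buffer ourselves could look like. It assumes the decimal column is stored as a byte array, which the Parquet format defines as the unscaled value in big-endian two's complement, and that the scale (for example 3 for Decimal(5,3)) is read separately from the column metadata. The helper name `decodeDecimal` is hypothetical and is not part of parquetjs or Storage Explorer.

```typescript
// Hypothetical helper, not an existing parquetjs or Storage Explorer API.
// Decodes a Parquet DECIMAL stored as big-endian two's-complement bytes
// into a plain decimal string such as "12.345".
function decodeDecimal(bytes: ArrayLike<number>, scale: number): string {
  const ZERO = BigInt(0);
  const ONE = BigInt(1);
  const EIGHT = BigInt(8);

  // Accumulate the unscaled integer with BigInt so that wide columns
  // such as Decimal(38,6) do not lose precision.
  let unscaled = ZERO;
  for (let i = 0; i < bytes.length; i++) {
    unscaled = (unscaled << EIGHT) | BigInt(bytes[i] & 0xff);
  }

  // If the top bit of the first byte is set, the value is negative.
  if (bytes.length > 0 && (bytes[0] & 0x80) !== 0) {
    unscaled -= ONE << BigInt(8 * bytes.length);
  }

  // Insert the decimal point `scale` digits from the right,
  // left-padding with zeroes so there is always a leading digit.
  const negative = unscaled < ZERO;
  const digits = (negative ? -unscaled : unscaled)
    .toString()
    .padStart(scale + 1, "0");
  const whole = digits.slice(0, digits.length - scale);
  const fraction = digits.slice(digits.length - scale);
  return (negative ? "-" : "") + (scale > 0 ? `${whole}.${fraction}` : whole);
}

// The value currently shown in the preview is the JSON form of a Node.js Buffer,
// so its `data` array can be passed straight in.
const shown = JSON.parse('{"type":"Buffer","data":[0,0,0,0]}');
console.log(decodeDecimal(shown.data, 3)); // prints "0.000"
```

Returning a string rather than a number is a deliberate choice in this sketch: a Decimal(38,6) value can exceed what a JavaScript number represents exactly. Decimal columns backed by INT32 or INT64 physical types would need a separate path.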

ihenry commented 2 weeks ago

Yes, that extract was from a system that contains sample data, and it appears the default value is zero. I have seen the same behaviour with non-zero decimal values too, but that was real data, which is more difficult to share.