mukunku / ParquetViewer

Simple Windows desktop application for viewing & querying Apache Parquet files
GNU General Public License v3.0
687 stars, 82 forks

Unsupported Nested Structs #100

Closed: sequencerr closed this issue 3 months ago

sequencerr commented 4 months ago

[image]

Is there a problem on my side? See https://github.com/mukunku/ParquetViewer/issues/3

sequencerr commented 4 months ago

Also, it's showing only the first element of lists.

mukunku commented 4 months ago

Unfortunately, nested complex types are not supported. If you could share a sample file, it would help get this implemented.

sequencerr commented 4 months ago

Hello, @mukunku. Sample files: sample-parquet-files.zip

Generated using the code below. You might also be interested in https://github.com/LibertyDSNP/parquetjs/blob/c07e7e81847523f4d74edd0adf9b2f9b6bbd1d90/lib/reader.ts#L104

import { ParquetSchema, ParquetWriter } from '@dsnp/parquetjs';

// A repeated UTF8 field, i.e. a list of strings.
const SchemaList = new ParquetSchema({
    groceries: { type: 'UTF8', repeated: true }
});
// A struct nested inside another struct: user -> rating -> { value, count }.
const SchemaUser = new ParquetSchema({
    user: {
        fields: {
            rating: {
                fields: {
                    value: { type: 'FLOAT' },
                    count: { type: 'INT64' }
                }
            }
        }
    }
});

// Top-level await: run this as an ES module.
const writer2 = await ParquetWriter.openFile(SchemaList, 'list.parquet');
const writer1 = await ParquetWriter.openFile(SchemaUser, 'user.parquet');

// One row per file is enough to reproduce the issue.
await writer2.appendRow({ groceries: ['foo', 'bar', 'baz', 'no', 'naming', 'imagination'] });
await writer1.appendRow({
    user: {
        rating: {
            value: 4.3,
            count: 34
        }
    }
});

// Closing flushes the buffered rows and writes the file footer.
await writer1.close();
await writer2.close();
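
For readers unfamiliar with how nested data maps onto Parquet columns: Parquet stores only leaf (primitive) fields as columns, each identified by its path through the schema. The sketch below flattens the two sample rows into dotted leaf paths to show roughly what a viewer has to reassemble. `flattenRow` and its types are hypothetical helpers written for this illustration; they are not part of parquetjs or ParquetViewer.

```typescript
// Hypothetical helper types for illustration only.
type Leaf = string | number | bigint | boolean | null;
type Value = Leaf | Value[] | { [key: string]: Value };

// Walk a nested row and collect one entry per leaf, keyed by its dotted path.
function flattenRow(row: { [key: string]: Value }, prefix = ''): Record<string, Leaf | Value[]> {
    const columns: Record<string, Leaf | Value[]> = {};
    for (const [name, value] of Object.entries(row)) {
        const path = prefix ? `${prefix}.${name}` : name;
        if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
            // A struct: recurse so that each of its fields becomes its own column.
            Object.assign(columns, flattenRow(value, path));
        } else {
            // A primitive or a repeated field (array): keep it as a single column.
            columns[path] = value;
        }
    }
    return columns;
}

const userRow = { user: { rating: { value: 4.3, count: 34 } } };
const listRow = { groceries: ['foo', 'bar', 'baz'] };

console.log(Object.keys(flattenRow(userRow))); // → ['user.rating.value', 'user.rating.count']
console.log(flattenRow(listRow).groceries);    // all list elements, not just the first
```

The struct contributes two columns while the list contributes one column holding several values per row, which is why the two cases tend to need separate handling in a viewer.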

Some closed-source web readers that might help:
- https://parquetreader.com/home works well, but the data display is bad.
- https://www.parquet-viewer.com (same as https://apps.microsoft.com/detail/9N33Z6DPLR49) is the top search engine result, but it is not quite accurate.

sequencerr commented 4 months ago

Ehm, well, there is https://github.com/aloneguid/parquet-dotnet/tree/master/src/Parquet.Floor, which works as intended for nested data (although non-Latin UTF-8 text displays badly).

mukunku commented 4 months ago

Thanks, this is all helpful. I'll take a look when I get the chance. I'll leave this issue open in case anyone else wants to give implementing this a shot as well.

dbraaten42 commented 4 months ago

I also can't view the file created when running the Parquet.Net example for dictionaries. Likely related.
The example is from https://aloneguid.github.io/parquet-dotnet/serialisation.html#nested-types

class IdWithTags {
    public int Id { get; set; }

    public Dictionary<string, string>? Tags { get; set; }
}

var data = Enumerable.Range(0, 10).Select(i => new IdWithTags {
    Id = i,
    Tags = new Dictionary<string, string> {
        ["id"] = i.ToString(),
        ["gen"] = DateTime.UtcNow.ToString()
    }}).ToList();

await ParquetSerializer.SerializeAsync(data, "c:\\tmp\\map.parquet");

The exception thrown is: "Field schema path not found: key_value/key"
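
Some background on the path in that message: in the Parquet format a map is a group containing a single repeated group (named `key_value` in the current spec, though the spec's backward-compatibility rules also allow other names such as `map`), which in turn holds the key field followed by an optional value field. The thread does not say why the lookup failed, so the following is only a sketch of a name-tolerant lookup that finds the key and value fields by position. `SchemaNode` and `resolveMapFields` are hypothetical names made up for this illustration, not ParquetViewer or Parquet.Net APIs.

```typescript
// Hypothetical schema node shape, for illustration only.
interface SchemaNode {
    name: string;
    repetition: 'required' | 'optional' | 'repeated';
    children?: SchemaNode[];
}

// Locate the key and value fields of a map group by position rather than by name.
function resolveMapFields(mapGroup: SchemaNode): { keyPath: string; valuePath: string | null } {
    const children = mapGroup.children ?? [];
    // A map group is expected to wrap exactly one repeated group of entries.
    const entries = children.length === 1 ? children[0] : undefined;
    if (!entries || entries.repetition !== 'repeated' || !entries.children || entries.children.length === 0) {
        throw new Error(`Group "${mapGroup.name}" does not look like a map`);
    }
    // First child is the key; the second (the value) may be omitted.
    const [key, value] = entries.children;
    const base = `${mapGroup.name}/${entries.name}`;
    return {
        keyPath: `${base}/${key.name}`,
        valuePath: value ? `${base}/${value.name}` : null,
    };
}

// Assumed layout for the Tags dictionary above; the actual file may differ.
const tagsSchema: SchemaNode = {
    name: 'Tags',
    repetition: 'optional',
    children: [{
        name: 'key_value',
        repetition: 'repeated',
        children: [
            { name: 'key', repetition: 'required' },
            { name: 'value', repetition: 'optional' },
        ],
    }],
};

console.log(resolveMapFields(tagsSchema).keyPath); // → 'Tags/key_value/key'
```

Resolving by position means a file whose repeated group is named `map` would be handled by the same code path.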

mukunku commented 3 months ago

Thanks again for the sample files and code, folks. I went ahead and created a pre-release of v2.10.1 with fixes for your issues.

@sequencerr I added nested struct support, so this new version can open the test user.parquet file you shared. The utility will still have issues opening nested lists or maps, but at least nested struct support is there now.

@dbraaten42 I broadened the Map type support, so ParquetViewer now supports Maps created with Parquet.Net 😁 Thanks a lot for reporting the issue.

Please give this new version a try, folks. I'm going to close this ticket out but feel free to open a new one if you have more parquet files you can't view.