Closed sequencerr closed 8 months ago
Also, it's showing only the first element of lists.
Nested complex types are not supported unfortunately. If you could share a sample file it could help get it implemented.
Hello, @mukunku
Sample files:
sample-parquet-files.zip
Generated using the code below. You might also be interested in https://github.com/LibertyDSNP/parquetjs/blob/c07e7e81847523f4d74edd0adf9b2f9b6bbd1d90/lib/reader.ts#L104
import { ParquetSchema, ParquetWriter } from '@dsnp/parquetjs';

// Schema with a single repeated (list-like) string column.
const SchemaList = new ParquetSchema({
  groceries: { type: 'UTF8', repeated: true }
});

// Schema with a struct nested two levels deep.
const SchemaUser = new ParquetSchema({
  user: {
    fields: {
      rating: {
        fields: {
          value: { type: 'FLOAT' },
          count: { type: 'INT64' }
        }
      }
    }
  }
});

const writer2 = await ParquetWriter.openFile(SchemaList, 'list.parquet');
const writer1 = await ParquetWriter.openFile(SchemaUser, 'user.parquet');

await writer2.appendRow({ groceries: ['foo', 'bar', 'baz', 'no', 'naming', 'imagination'] });
await writer1.appendRow({
  user: {
    rating: {
      value: 4.3,
      count: 34
    }
  }
});

// Closing the writers flushes the data and writes the file footers.
await writer1.close();
await writer2.close();
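For context on the "only first element" symptom: in Parquet's Dremel-style encoding, a repeated column is stored as a flat run of values plus a repetition level per value, and a reader has to use those levels to rebuild each row's list. The sketch below is only an illustration of that idea, not how ParquetViewer or parquetjs actually does it. `assembleRepeated` is a hypothetical helper, the levels shown for `groceries` are what I would expect for a single row (an assumption, not dumped from the file), and empty or null lists (which need definition levels) are ignored.

```typescript
// Hypothetical helper: rebuild per-row lists from a flat column of values
// and their repetition levels. Assumes a maximum repetition level of 1
// (one level of nesting) and no empty or null lists.
function assembleRepeated<T>(values: T[], repetitionLevels: number[]): T[][] {
  if (values.length !== repetitionLevels.length) {
    throw new Error('values and repetitionLevels must have the same length');
  }
  const rows: T[][] = [];
  values.forEach((value, i) => {
    if (repetitionLevels[i] === 0 || rows.length === 0) {
      rows.push([value]); // level 0: this value starts a new row
    } else {
      rows[rows.length - 1].push(value); // level 1: same row, next list element
    }
  });
  return rows;
}

// Assumed levels for the single `groceries` row written above.
const groceryValues = ['foo', 'bar', 'baz', 'no', 'naming', 'imagination'];
const groceryLevels = [0, 1, 1, 1, 1, 1];
const groceryRows = assembleRepeated(groceryValues, groceryLevels);
console.log(groceryRows.length, groceryRows[0].length); // 1 row holding 6 elements
```

A reader that treats every value as its own row, or that keeps only the value at level 0, would end up showing just `foo` for this file.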
Some closed-source web readers that might help:
- Works well, but the data display is bad: https://parquetreader.com/home
- Top search engine result, but not quite accurate: https://www.parquet-viewer.com (same as https://apps.microsoft.com/detail/9N33Z6DPLR49)
Ehm, well, there is https://github.com/aloneguid/parquet-dotnet/tree/master/src/Parquet.Floor, which works as intended for nested types (although non-Latin UTF-8 text displays badly).
Thanks this is all helpful. I'll take a look when I get the chance. I'll leave this issue open in case anyone else wants to give implementing this a shot as well.
I also can't view the file that is created when running the Parquet.Net example for dictionaries. Likely related.
From https://aloneguid.github.io/parquet-dotnet/serialisation.html#nested-types
class IdWithTags {
    public int Id { get; set; }
    public Dictionary<string, string>? Tags { get; set; }
}

var data = Enumerable.Range(0, 10).Select(i => new IdWithTags {
    Id = i,
    Tags = new Dictionary<string, string> {
        ["id"] = i.ToString(),
        ["gen"] = DateTime.UtcNow.ToString()
    }}).ToList();

await ParquetSerializer.SerializeAsync(data, "c:\\tmp\\map.parquet");
The exception thrown is Field schema path not found: key_value/key
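One plausible reading of that error (an assumption on my part, I have not checked the viewer's source) is that the reader looks up the map's key column under a fixed path, while the file names the inner group or its fields differently. The Parquet format's MAP layout is an outer group containing one repeated group with a key field followed by a value field, and its compatibility notes show files where that repeated group is not called `key_value`. The sketch below resolves the paths by position instead of by name. `SchemaNode` and `resolveMapPaths` are hypothetical, as are the group and field names in the example.

```typescript
// Hypothetical, simplified view of a Parquet schema element.
interface SchemaNode {
  name: string;
  repetition: 'required' | 'optional' | 'repeated';
  children?: SchemaNode[];
}

// Resolve key/value column paths for a MAP-annotated group by position:
// take the first repeated child group, then its first two fields,
// regardless of what any of them are called.
function resolveMapPaths(mapField: SchemaNode): { keyPath: string; valuePath: string } {
  const candidates = (mapField.children || []).filter(
    (child) => child.repetition === 'repeated' && !!child.children && child.children.length >= 2
  );
  if (candidates.length === 0) {
    throw new Error('No repeated key/value group found under ' + mapField.name);
  }
  const group = candidates[0];
  const fields = group.children as SchemaNode[];
  const prefix = mapField.name + '/' + group.name + '/';
  return { keyPath: prefix + fields[0].name, valuePath: prefix + fields[1].name };
}

// Made-up schema in which the repeated group is named `map`, not `key_value`.
const tagsField: SchemaNode = {
  name: 'Tags',
  repetition: 'optional',
  children: [
    {
      name: 'map',
      repetition: 'repeated',
      children: [
        { name: 'key', repetition: 'required' },
        { name: 'value', repetition: 'optional' },
      ],
    },
  ],
};
console.log(resolveMapPaths(tagsField)); // keyPath: 'Tags/map/key', valuePath: 'Tags/map/value'
```

This ignores key-only maps (sets) and maps nested inside other repeated types, so treat it as a starting point only.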
Thanks again for the sample files and code, folks. I went ahead and created a pre-release of v2.10.1 with fixes for your issues.
@sequencerr I added nested struct support so this new version can open the test user.parquet file that you shared. The utility will still have issues opening nested lists or maps, but at least nested struct support is there now.
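For anyone wondering why nested structs are the easier case: a struct adds no repetition, so each leaf field is just one more physical column whose path runs through its parent groups. The sketch below walks a schema definition shaped like the parquetjs one earlier in this thread and lists those leaf paths. It is an illustration under my own assumptions, not the viewer's implementation. `FieldDef` and `leafColumnPaths` are hypothetical names, and the dotted path format is just one way to display them.

```typescript
// Hypothetical, simplified mirror of the schema definition used earlier:
// a field is either a leaf with a type or a group with nested fields.
type FieldDef = { type: string } | { fields: Record<string, FieldDef> };

// Collect the dotted path of every leaf column under the given fields.
function leafColumnPaths(fields: Record<string, FieldDef>, prefix: string[] = []): string[] {
  const paths: string[] = [];
  for (const name of Object.keys(fields)) {
    const def = fields[name];
    const path = prefix.concat(name);
    if ('fields' in def) {
      // Group node: recurse into the nested struct.
      paths.push(...leafColumnPaths(def.fields, path));
    } else {
      // Leaf node: this is an actual column in the file.
      paths.push(path.join('.'));
    }
  }
  return paths;
}

const userFields: Record<string, FieldDef> = {
  user: {
    fields: {
      rating: {
        fields: {
          value: { type: 'FLOAT' },
          count: { type: 'INT64' },
        },
      },
    },
  },
};
console.log(leafColumnPaths(userFields)); // [ 'user.rating.value', 'user.rating.count' ]
```

Lists and maps are harder because their leaves repeat within a row, so listing paths is not enough and the reader also has to reassemble values per row.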
@dbraaten42 I broadened the Map type support, so ParquetViewer now supports Maps created with Parquet.Net 😁 Thanks a lot for reporting the issue.
Please give this new version a try, folks. I'm going to close this ticket out but feel free to open a new one if you have more parquet files you can't view.
Is there a problem on my side? https://github.com/mukunku/ParquetViewer/issues/3