xitongsys / parquet-go

pure golang library for reading/writing parquet file
Apache License 2.0

schema.NewSchemaHandlerFromJSON fails with runtime error #379

Closed FrankStienhans2 closed 3 years ago

FrankStienhans2 commented 3 years ago

JSON Schema: {"Tag":"name=aws_ebs_volume, repetitiontype=REQUIRED","Fields":[{"Tag":"name=status, type=BYTE_ARRAY","Fields":null},{"Tag":"name=numid, type=INT64","Fields":null},{"Tag":"convertedtype=TIMESTAMP_MILLIS, name=created, type=INT64","Fields":null},{"Tag":"name=device, type=BYTE_ARRAY","Fields":null},{"Tag":"name=kms_key, type=BYTE_ARRAY","Fields":null},{"Tag":"type=INT64, name=modification_start","Fields":null},{"Tag":"name=modification_status, type=BYTE_ARRAY","Fields":null},{"Tag":"name=region, type=BYTE_ARRAY","Fields":null},{"Tag":"name=size, type=INT64","Fields":null},{"Tag":"name=az, type=BYTE_ARRAY","Fields":null},{"Tag":"valuetype=BYTE_ARRAY, valueconvertedtype=UTF8, name=tags, type=MAP, convertedtype=MAP, keytype=BYTE_ARRAY, keyconvertedtype=UTF8","Fields":null},{"Tag":"name=id, type=BYTE_ARRAY","Fields":null},{"Tag":"name=snapshot, type=BYTE_ARRAY","Fields":null},{"Tag":"type=BYTE_ARRAY, name=account","Fields":null},{"Tag":"name=credits, type=DOUBLE","Fields":null},{"Tag":"convertedtype=TIMESTAMP_MILLIS, name=deleted, type=INT64","Fields":null},{"Tag":"name=encrypted, type=BOOLEAN","Fields":null},{"Tag":"name=id_domain, type=BYTE_ARRAY","Fields":null},{"Tag":"name=machine, type=BYTE_ARRAY","Fields":null},{"Tag":"name=meta_id, type=BYTE_ARRAY","Fields":null},{"Tag":"name=type, type=BYTE_ARRAY","Fields":null},{"Tag":"name=cloud, type=BYTE_ARRAY","Fields":null},{"Tag":"name=user, type=BYTE_ARRAY","Fields":null},{"Tag":"type=INT64, convertedtype=TIMESTAMP_MILLIS, name=updated","Fields":null}]}

Parsing this schema fails with: runtime error: index out of range [1] with length 0

Because the function recovers from the panic in a deferred call (why?), I only get the error message and cannot tell where it fails.

Can you please help?
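
For context, the following is a minimal sketch of the defer/recover pattern described above, and of how a stack trace can be captured inside the deferred function before it is lost. It is not parquet-go code: `parseTag` and `safeParse` are hypothetical names, and the out-of-range access is only an assumed stand-in for whatever the schema parser does internally.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// parseTag is a hypothetical stand-in for a parser step that indexes
// past the end of a slice. It is NOT taken from parquet-go.
func parseTag(parts []string) string {
	return parts[1] // panics when parts has fewer than two elements
}

// safeParse mimics a function that converts panics into errors with a
// deferred recover. The caller only sees the error message unless the
// stack is captured inside the deferred function, as done here.
func safeParse(parts []string) (val string, stack string, err error) {
	defer func() {
		if r := recover(); r != nil {
			stack = string(debug.Stack()) // still includes the panicking frames
			err = fmt.Errorf("%v", r)
		}
	}()
	return parseTag(parts), "", nil
}

func main() {
	_, stack, err := safeParse(nil)
	fmt.Println("error:", err)
	fmt.Println(stack)
}
```

When the recover lives inside a library, the caller cannot add this capture from outside; running under a debugger or temporarily removing the recover in a local copy are the usual ways to find the failing line.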

hangxie commented 3 years ago

This field definition does not look right:

{"Tag":"valuetype=BYTE_ARRAY, valueconvertedtype=UTF8, name=tags, type=MAP, convertedtype=MAP, keytype=BYTE_ARRAY, keyconvertedtype=UTF8","Fields":null}

It should be something like https://github.com/xitongsys/parquet-go/blob/master/example/json_schema.go#L47-L54
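
As a rough illustration of that advice, the sketch below uses only the Go standard library to walk a JSON schema and list MAP or LIST fields that declare no child fields. It then compares the flat `tags` definition from the issue with a nested one. The helper names (`schemaItem`, `tagValue`, `containersWithoutFields`, `check`) are hypothetical and not part of parquet-go. The nested `key`/`value` layout is an assumption modeled on the linked example, so the exact tags should be checked against that file.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// schemaItem mirrors the {"Tag": ..., "Fields": [...]} shape used in the issue.
type schemaItem struct {
	Tag    string       `json:"Tag"`
	Fields []schemaItem `json:"Fields"`
}

// tagValue returns the value for key in a "k=v, k=v" tag string, or "".
func tagValue(tag, key string) string {
	for _, kv := range strings.Split(tag, ",") {
		parts := strings.SplitN(strings.TrimSpace(kv), "=", 2)
		if len(parts) == 2 && strings.TrimSpace(parts[0]) == key {
			return strings.TrimSpace(parts[1])
		}
	}
	return ""
}

// containersWithoutFields lists names of MAP/LIST nodes with no child fields.
func containersWithoutFields(n schemaItem) []string {
	var bad []string
	t := tagValue(n.Tag, "type")
	if (t == "MAP" || t == "LIST") && len(n.Fields) == 0 {
		bad = append(bad, tagValue(n.Tag, "name"))
	}
	for _, f := range n.Fields {
		bad = append(bad, containersWithoutFields(f)...)
	}
	return bad
}

// check parses a JSON schema string and reports the offending field names.
func check(js string) ([]string, error) {
	var root schemaItem
	if err := json.Unmarshal([]byte(js), &root); err != nil {
		return nil, err
	}
	return containersWithoutFields(root), nil
}

// flatMap is the struct-tag style definition from the issue (no sub-fields).
const flatMap = `{"Tag":"name=root, repetitiontype=REQUIRED","Fields":[
  {"Tag":"valuetype=BYTE_ARRAY, valueconvertedtype=UTF8, name=tags, type=MAP, convertedtype=MAP, keytype=BYTE_ARRAY, keyconvertedtype=UTF8","Fields":null}]}`

// nestedMap declares key and value as sub-fields (assumed layout).
const nestedMap = `{"Tag":"name=root, repetitiontype=REQUIRED","Fields":[
  {"Tag":"name=tags, type=MAP, repetitiontype=REQUIRED","Fields":[
    {"Tag":"name=key, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=REQUIRED"},
    {"Tag":"name=value, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=REQUIRED"}]}]}`

func main() {
	for _, js := range []string{flatMap, nestedMap} {
		bad, err := check(js)
		fmt.Println(bad, err)
	}
}
```

A check like this only flags the structural problem pointed out above; it does not confirm that the library accepts the nested schema.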

FrankStienhans2 commented 3 years ago

I copied it from the homepage of this project:

Map map[string]int32 `parquet:"name=map, type=MAP, convertedtype=MAP, keytype=BYTE_ARRAY, keyconvertedtype=UTF8, valuetype=INT32"`

except that I want map[string]string instead of map[string]int32.

But I understand your feedback: in a JSON schema I should declare the sub-fields. Thank you.