xitongsys / parquet-go

Pure Go library for reading/writing Parquet files
Apache License 2.0

add length to INTERVAL schema #470

Closed: hangxie closed this 2 years ago

hangxie commented 2 years ago

The schema output of parquet-tools lacks the length for the INTERVAL type. If one uses that output to process data, INTERVAL fields will have zero length.
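For context, the Parquet format specification defines INTERVAL as a FIXED_LEN_BYTE_ARRAY of length 12, holding three little-endian unsigned 32-bit integers (months, days, milliseconds). A zero-length column therefore cannot hold a value at all. The following is a minimal standard-library Go sketch of that 12-byte layout. The `Interval` type and the `encodeInterval`/`decodeInterval` helpers are hypothetical names for illustration, not parquet-go APIs. Since the generated struct above stores the field as a Go `string`, a caller would presumably assign `string(raw)`, but treat that as an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// intervalLen is the fixed width of a Parquet INTERVAL value:
// three little-endian uint32 fields (months, days, milliseconds).
const intervalLen = 12

// Interval is a hypothetical helper type for this sketch; it is not part of parquet-go.
type Interval struct {
	Months, Days, Millis uint32
}

// encodeInterval packs an Interval into the 12-byte layout expected by a
// FIXED_LEN_BYTE_ARRAY column annotated with the INTERVAL converted type.
func encodeInterval(iv Interval) []byte {
	buf := make([]byte, intervalLen)
	binary.LittleEndian.PutUint32(buf[0:4], iv.Months)
	binary.LittleEndian.PutUint32(buf[4:8], iv.Days)
	binary.LittleEndian.PutUint32(buf[8:12], iv.Millis)
	return buf
}

// decodeInterval reverses encodeInterval. It rejects any value that is not
// exactly 12 bytes long, e.g. the empty value a zero-length schema yields.
func decodeInterval(b []byte) (Interval, error) {
	if len(b) != intervalLen {
		return Interval{}, fmt.Errorf("INTERVAL must be %d bytes, got %d", intervalLen, len(b))
	}
	return Interval{
		Months: binary.LittleEndian.Uint32(b[0:4]),
		Days:   binary.LittleEndian.Uint32(b[4:8]),
		Millis: binary.LittleEndian.Uint32(b[8:12]),
	}, nil
}

func main() {
	raw := encodeInterval(Interval{Months: 1, Days: 2, Millis: 3000})
	fmt.Println(len(raw)) // 12
	iv, err := decodeInterval(raw)
	fmt.Println(iv, err) // {1 2 3000} <nil>
	_, err = decodeInterval(nil)
	fmt.Println(err) // INTERVAL must be 12 bytes, got 0
}
```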

Current output:

$ go run ./tool/parquet-tools/ -cmd schema -file example/type.parquet --schema-format go -tag| grep -i interval
  Interval string `parquet:"name=Interval, type=INTERVAL, repetitiontype=REQUIRED"`
$ go run ./tool/parquet-tools/ -cmd schema -file example/type.parquet | grep -i interval
      "Tag": "name=Interval, type=FIXED_LEN_BYTE_ARRAY, convertedtype=INTERVAL, repetitiontype=REQUIRED"

This PR adds length=12 to the schema and also fixes the Go struct output:

$ go run ./tool/parquet-tools/ -cmd schema -file example/type.parquet --schema-format go -tag| grep -i interval
  Interval string `parquet:"name=Interval, type=FIXED_LEN_BYTE_ARRAY, convertedtype=INTERVAL, length=12, repetitiontype=REQUIRED"`
$ go run ./tool/parquet-tools/ -cmd schema -file example/type.parquet | grep -i interval
      "Tag": "name=Interval, type=FIXED_LEN_BYTE_ARRAY, convertedtype=INTERVAL, length=12, repetitiontype=REQUIRED"
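To illustrate why the missing `length` key matters, here is a rough sketch of reading `length` out of tags like the two above. The `tagLength` function is a hypothetical helper, not parquet-go's actual tag parser. It assumes comma-separated `key=value` pairs and falls back to 0 when `length` is absent, mirroring the zero-length behavior described at the top of this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tagLength is a hypothetical helper for this sketch (not parquet-go's parser).
// It scans a tag of the form "key=value, key=value" and returns the "length"
// value, or 0 when the key is absent.
func tagLength(tag string) (int, error) {
	for _, field := range strings.Split(tag, ",") {
		kv := strings.SplitN(strings.TrimSpace(field), "=", 2)
		if len(kv) != 2 || strings.ToLower(strings.TrimSpace(kv[0])) != "length" {
			continue
		}
		return strconv.Atoi(strings.TrimSpace(kv[1]))
	}
	// No length key: the column width silently falls back to zero.
	return 0, nil
}

func main() {
	before := "name=Interval, type=FIXED_LEN_BYTE_ARRAY, convertedtype=INTERVAL, repetitiontype=REQUIRED"
	after := "name=Interval, type=FIXED_LEN_BYTE_ARRAY, convertedtype=INTERVAL, length=12, repetitiontype=REQUIRED"
	for _, tag := range []string{before, after} {
		n, err := tagLength(tag)
		fmt.Println(n, err) // "0 <nil>", then "12 <nil>"
	}
}
```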
hangxie commented 2 years ago

I'm going to close this, as the Go struct generated by tool/parquet-tools is far from usable.