segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

Repeated group array #467

Closed yonesko closed 1 year ago

yonesko commented 1 year ago

Hello! I write Parquet files whose schema contains a slice of structs. The slice is written as repeated group list + element (the second schema below), and the Python library cannot read that representation, whereas it can read repeated group array (the first schema below):

optional group field1 (LIST) {
  repeated group array {
    optional binary field2 (STRING);
    optional int64 field3;
  }
}

optional group field1 (LIST) {
  repeated group list {
    required group element {
      optional binary field2 (STRING);
      optional int64 field3;
    }
  }
}

Is it possible to use repeated group array instead?

The Java library can read both kinds of schema: the old one (field1.array) and the new one (field1.list.element). It looks like the Python library can read only the old one. By default, Java also writes the old one.

achille-roussel commented 1 year ago

Hello @yonesko,

I believe the issue you reported might be related to #468. Would you mind looking at the answer I left there and letting me know whether it applies to your problem as well?

yonesko commented 1 year ago

Thanks! Maybe you're right. I am trying to add plain encoding to all fields, but I have a very big structure and keep getting a "cannot add encoding to a non-leaf node" error. How can I detect only the leaf nodes and set plain encoding on those fields?

yonesko commented 1 year ago

Thanks!