Open nevillelyh opened 3 years ago
Turns out the new 3 level list is more complex.
With the default 2 level list, myField: List[T]
is written as:
required group myField (LIST) {
repeated T array;
}
But the Avro counter part is still "name": "myField", "type": "array", "items": T
While with 3 level list, the Parquet schema becomes:
required group myField (LIST) {
repeated group list {
required T element;
}
}
And the Avro record becomes [{"element": t1}, {"element": t1}]
...
WIP in https://github.com/spotify/magnolify/tree/neville/pq-avro
More on Avro array
mapping. The following Avro fields
{"name": "field1", "type:" {"type": "array", "items": "string"}, "default": [] } // required array field that defaults to empty array
{"name": "field2", "type:" ["null", {"type": "array", "items": "string"}], "default": null } // nullable array field that defaults to null
map to:
required group field1 (LIST) {
repeated binary array (STRING);
}
optional group field2 (LIST) {
repeated binary array (STRING);
}
AvroWriteSupport
- oldTwoLevelListWriter
vs newThreeLevelListWriter
parquet.avro.data.supplier
with generic records in test #278ReadSupport
https://github.com/spotify/magnolify/commit/2aea4e8de0aaae4db7517e4dd61dbf2df19e20f0