hangxie / parquet-tools

Utility to deal with Parquet data
BSD 3-Clause "New" or "Revised" License

schema command should support map value with composite type for go struct #431

Open hangxie opened 4 hours ago

hangxie commented 4 hours ago

Right now it does not:

$ parquet-tools schema -f go testdata/map-composite-value.parquet
parquet-tools: error: go struct does not support composite type as map value in field [Parquet_go_root.Scores]

However, this limitation is not inherent: it can be resolved by defining a separate struct for the map value, e.g. https://github.com/xitongsys/parquet-go/blob/master/example/local_nested.go#L12-L24
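For illustration, a minimal sketch of the shape the generated output could take. `Parquet_go_root` and `Scores` come from the error message above; `ScoreValue` and its fields are hypothetical, since the actual schema of the test file is not shown here, and the parquet struct tags are left out on purpose.

```go
package main

import "fmt"

// ScoreValue is a hypothetical composite type used as the map value.
// The real fields depend on the schema of the parquet file.
type ScoreValue struct {
	Points int32
	Grade  string
}

// Parquet_go_root refers to the composite value through a named struct
// instead of an inline anonymous one.
type Parquet_go_root struct {
	Scores map[string]ScoreValue
}

func main() {
	root := Parquet_go_root{
		Scores: map[string]ScoreValue{
			"math": {Points: 95, Grade: "A"},
		},
	}
	fmt.Println(root.Scores["math"].Points) // prints 95
}
```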

hangxie commented 4 hours ago

https://github.com/hangxie/parquet-tools/issues/206 is the original issue where it was decided not to support nested types; linking it here for reference.

hangxie commented 4 hours ago

The current idea is to create a separate type definition for every map or object. The name can be anything (like Struct123), and the parent type then uses those types. There should be an option to consolidate structs with the same definition into a single one, and it should be on by default (i.e. no duplicated struct definitions).

Need DFS for this ...
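A rough sketch of what that DFS could look like, only to make the idea concrete. `Node`, `Generator`, `typeOf`, `define` and `exampleSchema` are all hypothetical names, not existing parquet-tools code. The sketch also simplifies the idea above: it emits named types only for structs and keeps maps and lists inline.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified schema node.
type Node struct {
	Name     string  // field name
	Kind     string  // "primitive", "struct", "map" or "list"
	GoType   string  // Go type, used only when Kind == "primitive"
	Children []*Node // struct: fields; map: [key, value]; list: [element]
}

// Generator collects the struct definitions produced during the walk.
type Generator struct {
	Dedupe bool              // reuse one name for identical struct bodies
	Defs   []string          // emitted definitions, children before parents
	byBody map[string]string // struct body -> generated type name
}

// typeOf returns the Go type expression for n. Children are visited first
// (depth-first, post-order), so nested structs are defined before use.
func (g *Generator) typeOf(n *Node) string {
	switch n.Kind {
	case "map":
		return "map[" + g.typeOf(n.Children[0]) + "]" + g.typeOf(n.Children[1])
	case "list":
		return "[]" + g.typeOf(n.Children[0])
	case "struct":
		var body strings.Builder
		for _, c := range n.Children {
			fmt.Fprintf(&body, "\t%s %s\n", c.Name, g.typeOf(c))
		}
		return g.define(body.String())
	default:
		return n.GoType
	}
}

// define registers a struct body and returns the type name to use for it.
func (g *Generator) define(body string) string {
	if g.byBody == nil {
		g.byBody = map[string]string{}
	}
	if name, ok := g.byBody[body]; ok && g.Dedupe {
		return name
	}
	name := fmt.Sprintf("Struct%d", len(g.Defs)+1)
	g.byBody[body] = name
	g.Defs = append(g.Defs, "type "+name+" struct {\n"+body+"}")
	return name
}

// exampleSchema builds a made-up schema in which a map value and a list
// element share the same struct definition.
func exampleSchema() *Node {
	score := func(name string) *Node {
		return &Node{Name: name, Kind: "struct", Children: []*Node{
			{Name: "Points", Kind: "primitive", GoType: "int32"},
		}}
	}
	return &Node{Name: "Root", Kind: "struct", Children: []*Node{
		{Name: "Scores", Kind: "map", Children: []*Node{
			{Name: "Key", Kind: "primitive", GoType: "string"},
			score("Value"),
		}},
		{Name: "History", Kind: "list", Children: []*Node{score("Element")}},
	}}
}

func main() {
	g := &Generator{Dedupe: true}
	g.typeOf(exampleSchema())
	fmt.Println(strings.Join(g.Defs, "\n\n"))
}
```

With `Dedupe` on, the map value and the list element share `Struct1` and the root becomes `Struct2`; with it off, each occurrence gets its own definition.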

hangxie commented 3 hours ago

A couple more considerations:

  1. list should allow composite type for value
  2. map should NOT allow composite type for key
    • I have never seen a parquet file with this type of usage, but we still need the spec to support this decision
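The two rules above could be checked with something like the sketch below. `Field`, `validate` and `ErrCompositeKey` are hypothetical names, and rejecting composite map keys is the assumption under discussion here, not something confirmed against the spec.

```go
package main

import (
	"errors"
	"fmt"
)

// Field is a hypothetical, simplified schema node.
type Field struct {
	Name     string
	Kind     string   // "primitive", "struct", "map" or "list"
	Children []*Field // struct: fields; map: [key, value]; list: [element]
}

// ErrCompositeKey marks a map whose key is not a primitive type.
var ErrCompositeKey = errors.New("composite type is not supported as map key")

// validate walks the schema depth-first. It rejects maps with a composite
// key and places no restriction on list elements or map values.
func validate(f *Field, path string) error {
	if path == "" {
		path = f.Name
	} else {
		path += "." + f.Name
	}
	// Assumes every map node has exactly two children: key, then value.
	if f.Kind == "map" && f.Children[0].Kind != "primitive" {
		return fmt.Errorf("field [%s]: %w", path, ErrCompositeKey)
	}
	for _, c := range f.Children {
		if err := validate(c, path); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// A list with a composite element is accepted.
	ok := &Field{Name: "Root", Kind: "struct", Children: []*Field{
		{Name: "Tags", Kind: "list", Children: []*Field{
			{Name: "Element", Kind: "struct"},
		}},
	}}
	// A map with a composite key is rejected.
	bad := &Field{Name: "Root", Kind: "struct", Children: []*Field{
		{Name: "Lookup", Kind: "map", Children: []*Field{
			{Name: "Key", Kind: "struct"},
			{Name: "Value", Kind: "primitive"},
		}},
	}}
	fmt.Println(validate(ok, ""))
	fmt.Println(validate(bad, ""))
}
```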