Open ChrisMuki opened 2 years ago
Hi Chris!
Parquet4s doesn't expose the file schema in its own API (that is something that could be added). However, you can easily access it by calling the original Java API that Parquet4s uses under the hood. Check org.apache.parquet.hadoop.ParquetFileReader, e.g.:
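import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.schema.MessageType

// inputFile: org.apache.parquet.io.InputFile, readerOptions: org.apache.parquet.ParquetReadOptions
val reader = ParquetFileReader.open(inputFile, readerOptions)
try {
  val schema: MessageType = reader.getFileMetaData.getSchema
  ...
} finally reader.close()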
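For the merging use case, a fuller sketch might look like the one below. It reads only the footer of each file and folds the schemas together with `MessageType.union`. The names `SchemaReader`, `readSchema` and `mergedSchema` are hypothetical helpers, not part of Parquet4s, and the sketch assumes the files are reachable through a default Hadoop `Configuration`.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import org.apache.parquet.schema.MessageType

// Hypothetical helper object, for illustration only.
object SchemaReader {

  /** Opens the file, reads its footer metadata and returns the schema. */
  def readSchema(path: String, conf: Configuration = new Configuration()): MessageType = {
    val inputFile = HadoopInputFile.fromPath(new Path(path), conf)
    val reader    = ParquetFileReader.open(inputFile)
    try reader.getFileMetaData.getSchema
    finally reader.close()
  }

  /** Unions the schemas of all given files; returns None for an empty list. */
  def mergedSchema(paths: Seq[String]): Option[MessageType] =
    paths.map(readSchema(_)).reduceOption((a, b) => a.union(b))
}
```

Columns that are missing from some files end up in the merged schema as long as at least one file defines them. Note that `union` is expected to fail with an exception when two files declare the same column with incompatible types, so such conflicts would have to be resolved separately.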
Regarding "a complete list of how to map Scala types properly to fields": check the content of TypedSchemaDef.
I mean... use this type class implicitly or explicitly to obtain the type mapping. Also check the quite rich API of RowParquetRecord.
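One way to see the mapping in practice is to let Parquet4s derive a schema for a case class and print it. This is only a sketch: `Person` and `TypeMappingDemo` are hypothetical names, and it assumes that `ParquetSchemaResolver.resolveSchema` is available in the Parquet4s version you use and that implicit instances exist for the field types shown.

```scala
import com.github.mjakubowski84.parquet4s.ParquetSchemaResolver
import org.apache.parquet.schema.MessageType

// Hypothetical case class, used only to illustrate the type mapping.
case class Person(name: String, birthday: java.time.LocalDate, age: Option[Int])

object TypeMappingDemo {
  // The schema is derived from the implicit type class instances of each field type.
  val schema: MessageType = ParquetSchemaResolver.resolveSchema[Person]

  def main(args: Array[String]): Unit =
    // Prints the Parquet schema, including the primitive and logical type of every field.
    println(schema)
}
```

Printing the derived schema for the types you care about shows which primitive type and logical annotation Parquet4s chooses for each of them, which can then be reproduced with the `Types` builder for schemas that are only known at runtime.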
First, I want to thank you for this great library!
I need to merge hundreds of small parquet files into bigger ones. Sadly, they do not all share the same schema (e.g. missing columns), nor is the schema known at compile time.
I am just wondering what would be the most efficient way to get only the schema of a parquet file. Currently I am looking into the first RowParquetRecord, but as there might be NullValues....
Further, I am interested in whether there is a complete list of how to map Scala types properly to fields, like this:
Types.primitive(INT32, OPTIONAL).as(LogicalTypeAnnotation.dateType()).named("Birthday")
Thanks