segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

Trouble creating a modified schema #507

Open tschaub opened 1 year ago

tschaub commented 1 year ago

I'm trying to use this module to read an existing Parquet file and write out a modified Parquet file. For example, I would like to transform a single column from one type to another. Or in other cases I would like to modify the compression used when writing.

I've had some luck creating a writer using the schema from an input file:

writerConfig, _ := parquet.NewWriterConfig(input.Schema())
writer := parquet.NewGenericWriter[any](output, writerConfig)

And then I later write a modified version of the rows (e.g. after transforming some values to a different type):

writer.WriteRows(modifiedRows)

This works except that the output schema is wrong for the columns where I have transformed values.

I recognize that this may not be how this module is intended to be used. If it does sound like a reasonable thing to do, can anyone provide advice on how I might clone an existing schema and make modifications to one or more of the field types?

kevinburkesegment commented 1 year ago

Apologies for making more work for you, but we've decided to move development on this project to a new organization at https://github.com/parquet-go/parquet-go to ensure its long-term success. We appreciate your contribution and would appreciate it if you could reopen this ticket there if it is still relevant.