shuttle-hq / synth

The Declarative Data Generator
https://www.getsynth.com/
Apache License 2.0
1.36k stars, 105 forks

Richer export types (parquet, avro) #412

Open djoanes opened 1 year ago

djoanes commented 1 year ago

Required Functionality: With the current functionality of synth, the output artifacts are all typeless. Downstream consumers therefore have to either derive the types automatically or explicitly redeclare the type of every column.

Proposed Solution: Add support for two new URI schemes, parquet: and avro:. This would allow richer importing and exporting in which the explicitly defined types are preserved.
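To make the proposal concrete, here is a minimal sketch of how the two schemes could be recognized. It uses only the standard library. `TypedTarget` and `parse_target` are hypothetical names chosen for illustration and are not part of synth's actual code, and the `scheme:path` shape is an assumption based on the wording above.

```rust
use std::path::PathBuf;

/// Hypothetical typed export/import targets (not synth's real types).
#[derive(Debug, PartialEq)]
enum TypedTarget {
    Parquet(PathBuf),
    Avro(PathBuf),
}

/// Splits a `scheme:path` URI at the first colon and maps the scheme
/// to a target. Assumes the part after the colon is a file path.
fn parse_target(uri: &str) -> Result<TypedTarget, String> {
    let (scheme, path) = uri
        .split_once(':')
        .ok_or_else(|| format!("missing scheme in '{}'", uri))?;
    if path.is_empty() {
        return Err(format!("missing path in '{}'", uri));
    }
    match scheme {
        "parquet" => Ok(TypedTarget::Parquet(PathBuf::from(path))),
        "avro" => Ok(TypedTarget::Avro(PathBuf::from(path))),
        other => Err(format!("unsupported scheme '{}'", other)),
    }
}
```

With this sketch, `parse_target("avro:out/users.avro")` yields the `Avro` variant holding `out/users.avro`, while a string without a scheme or with an unknown scheme returns an error.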

Use case: I'd like to use synth to generate data and bulk load it into a big data ecosystem (e.g. Hadoop, BigQuery).

iamwacko commented 1 year ago

Avro integration is possible; avrow should work well. Parquet integration would take a lot more work, as it doesn't seem to have a decent serde implementation.
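For a sense of what preserving types would mean on the Avro side, here is a rough sketch that builds an Avro record schema (plain JSON) from a list of typed columns. It deliberately does not use avrow or any other crate, only the standard library. `ColumnType` and `avro_record_schema` are hypothetical names, and the four column types are an assumed subset rather than synth's real type system.

```rust
/// Hypothetical column types (an assumed subset, for illustration only).
#[derive(Debug, Clone, Copy)]
enum ColumnType {
    Bool,
    Int64,
    Float64,
    Text,
}

impl ColumnType {
    /// Name of the matching Avro primitive type.
    fn avro_name(self) -> &'static str {
        match self {
            ColumnType::Bool => "boolean",
            ColumnType::Int64 => "long",
            ColumnType::Float64 => "double",
            ColumnType::Text => "string",
        }
    }
}

/// Builds an Avro record schema as a JSON string.
/// Assumes record and field names need no JSON escaping
/// (Avro names are restricted to letters, digits, and underscores).
fn avro_record_schema(record: &str, columns: &[(&str, ColumnType)]) -> String {
    let fields: Vec<String> = columns
        .iter()
        .map(|(name, ty)| format!(r#"{{"name":"{}","type":"{}"}}"#, name, ty.avro_name()))
        .collect();
    format!(
        r#"{{"type":"record","name":"{}","fields":[{}]}}"#,
        record,
        fields.join(",")
    )
}
```

Calling `avro_record_schema("User", &[("id", ColumnType::Int64), ("name", ColumnType::Text)])` produces a record schema whose `id` field is a `long` and whose `name` field is a `string`. A real integration would presumably hand schema handling and encoding to an Avro crate instead of formatting JSON by hand.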