mjakubowski84 / parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
https://mjakubowski84.github.io/parquet4s/
MIT License

[question] is there a clever way to register writers for lots of case classes #299

Closed normana400 closed 1 year ago

normana400 commented 1 year ago

It seems that obtaining a writer requires hard-coding the concrete case class type for each one.

Let's say I have 50+ different case classes that I want Parquet writers for. It seems heavy to hand-code a Parquet writer for each case class and then update that logic whenever a new case class is added.

Example:

case class Alpha
case class Beta
case class Gamma
case class Delta
// ... every case class in the alphabet
case class Omega

def writerOf[T <: Product](data: T): ParquetWriter[T] = {
  data match {
    case _: Alpha => ParquetWriter.of[Alpha]
    case _: Beta  => ParquetWriter.of[Beta]
    case _: Gamma => ParquetWriter.of[Gamma]
    case _: Delta => ParquetWriter.of[Delta]
    // ... every case class A-O
    case _: Omega => ParquetWriter.of[Omega]
    // fallback for any type without a registered writer
    case _ => throw new RuntimeException("sorry no writer for you!")
  }
}

Is there a way to obtain a Parquet writer for a case class without manually stitching together code like the above to register each concrete case class that needs a writer?

mjakubowski84 commented 1 year ago

Each time you write a Parquet file you need to provide a schema for this file. Now, it really depends on what you want to do. If you want to write each case class to a separate normalized file/directory then, sorry, you have to provide a dedicated schema for each case class. If that is not the case, and you want to dump all data into a single file/directory then you can have a single generic schema.

If maintaining a huge class hierarchy is a problem for you, then you can look into generic records.

Please read the documentation: https://mjakubowski84.github.io/parquet4s/docs/records_and_schema/
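
For the case where each case class goes to its own file and the concrete type is known at compile time at the call site, one way to avoid the pattern match is to let the compiler supply the per-class schema and encoder through context bounds. The following is only a minimal sketch, assuming parquet4s 2.x is on the classpath; `writeAll`, the fields of `Alpha` and `Beta`, and the output file names are hypothetical and not taken from this thread.

```scala
// Sketch only: assumes parquet4s 2.x as a dependency.
// writeAll, the case class fields, and the file names are hypothetical.
import com.github.mjakubowski84.parquet4s.{ParquetRecordEncoder, ParquetSchemaResolver, ParquetWriter, Path}

case class Alpha(id: Int, name: String)
case class Beta(score: Double)

object GenericWriting {
  // The context bounds ask the compiler to find the encoder and the schema
  // resolver for T at each call site, so no per-class registration is needed here.
  def writeAll[T: ParquetRecordEncoder: ParquetSchemaResolver](path: Path, data: Iterable[T]): Unit =
    ParquetWriter.of[T].writeAndClose(path, data)
}

object Example extends App {
  // T is inferred as Alpha and Beta respectively; each file gets its own schema.
  GenericWriting.writeAll(Path("alpha.parquet"), Seq(Alpha(1, "a")))
  GenericWriting.writeAll(Path("beta.parquet"), Seq(Beta(0.5)))
}
```

This only helps when the static type is visible where `writeAll` is called. If the values arrive as a bare `Product` and the concrete class is known only at runtime, as in the `writerOf` example above, implicit resolution cannot pick the right schema, and the generic records mentioned above are likely the better fit.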