mjakubowski84 / parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
https://mjakubowski84.github.io/parquet4s/
MIT License

Add API to write single file using custom ParquetWriter #291

Closed: flipp5b closed this 1 year ago

flipp5b commented 1 year ago

@mjakubowski84, this draft adds an experimental API that enables writing a single file with a custom ParquetWriter. WDYT, is this an acceptable approach? The draft touches parquet4s-fs2 only.

The usage could look as follows:

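import cats.effect.IO
import fs2.Stream
import com.github.mjakubowski84.parquet4s.parquet.writeSingleFile

// Source stream of records to write.
val xs: Stream[IO, X] = ???
// Builder of the custom ParquetWriter, configured by the caller.
val builder = XParquetWriter.builder(...).withConf(...)

// Write all records to a single file and run the stream for its effects.
xs.through(writeSingleFile[IO].pipe(builder)).compile.drain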
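
For context, here is a minimal sketch of the resource handling a pipe like this has to do: open the writer once, write every element, and close the writer even if the stream fails. It is not the code from this PR. `writerPipe` and its parameters are hypothetical names, and it assumes fs2 3.x with cats-effect 3 on the classpath.

```scala
import cats.effect.{Resource, Sync}
import fs2.{Pipe, Stream}

// Hypothetical helper, not part of parquet4s.
// `open` creates the writer; `write` pushes one record into it.
def writerPipe[F[_], T, W <: AutoCloseable](open: => W)(write: (W, T) => Unit)(
    implicit F: Sync[F]
): Pipe[F, T, Nothing] =
  in =>
    Stream
      // Acquire the writer once; it is closed when the stream completes,
      // fails, or is cancelled.
      .resource(Resource.fromAutoCloseable(F.blocking(open)))
      // Writing does blocking I/O, so each call goes through F.blocking.
      .flatMap(writer => in.evalMap(record => F.blocking(write(writer, record))))
      // The pipe emits nothing; it runs only for its side effects.
      .drain

// With a Parquet builder, the call could look like this (assumption):
// xs.through(writerPipe[IO, X, ParquetWriter[X]](builder.build())(_.write(_)))
```

Since `org.apache.parquet.hadoop.ParquetWriter` is `Closeable`, the value returned by `builder.build()` fits the `AutoCloseable` bound, which is why the sketch can stay generic in the writer type.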

marcinaylien commented 1 year ago

Hi @flipp5b. Thanks for your contributions! I am travelling this week. I will have a look at your PR over the weekend.

flipp5b commented 1 year ago

@mjakubowski84, could you please have another look? I implemented the same approach for parquet4s-akka. It seems to me that there's no point in adding custom writer support to parquet4s-core, as com.github.mjakubowski84.parquet4s.ParquetWriterImpl is a pretty thin wrapper around com.github.mjakubowski84.parquet4s.ParquetWriter#InternalWriter.

I haven't written the documentation yet, as I'm not sure where best to place it (on the fs2/akka-specific pages or on a separate page).

mjakubowski84 commented 1 year ago

It looks amazing! Please just add a note to the fs2/akka-specific pages, and then we can do a release :)

flipp5b commented 1 year ago

Done! :)