mjakubowski84 / parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
https://mjakubowski84.github.io/parquet4s/
MIT License

[Question] Is there a mechanism to detect when the `rotatingWriter` finishes writing to a file and to be notified of the file that was written? #344

Closed: calvinlfer closed this issue 6 months ago

calvinlfer commented 9 months ago

I was looking at the `postWriteHandler` mechanism on `viaParquet`. I see that there's a flush mechanism, but I'm a bit confused about the semantics.

To add some context: I am trying to add Parquet files written by Parquet4S to an Iceberg table, integrating with Apache Iceberg's Java API.

mjakubowski84 commented 9 months ago

> It seems like `postWriteHandler` gets called after each chunk of data is written?

That's true. You can use postWriteHandler to implement flushing based on your own business logic.

> Is there a way to use this mechanism such that you know when a file has been written and which file it was?

The handler gets a parameter (`PartitionState`) that tells you which partitions (not particular files) have been modified and how many writes have been done to those partitions so far. (Looking at those counts, I think there might be a bug there. I need to check.) There's no handler for file disposal.
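
To illustrate the "flush based on your own business logic" idea, here is a minimal, dependency-free sketch in plain Scala. It does not use the parquet4s API: `HandlerState`, `modifiedPartitions`, `flush`, and `FlushPolicy` are hypothetical stand-ins, invented here, for whatever the real handler receives. It also assumes that a flush can be requested per partition and that flushing closes that partition's current file. Under those assumptions, the handler at least knows which partition was just flushed, even though it is never told the file name.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the state passed to the handler (NOT the parquet4s type).
final case class HandlerState(
  modifiedPartitions: Map[String, Long], // partition path -> number of writes so far
  flush: String => Unit                  // assumed: closes the current file of that partition
)

object FlushPolicy {
  // Builds a handler that flushes a partition every `maxWrites` writes.
  // `onFlushed` is called with the partition path after each flush request,
  // e.g. to list the partition directory and register new files elsewhere.
  def flushEvery(maxWrites: Long)(onFlushed: String => Unit): HandlerState => Unit = {
    // Write count at which each partition was last flushed.
    val flushedAt = mutable.Map.empty[String, Long]
    state =>
      state.modifiedPartitions.foreach { case (partition, writes) =>
        if (writes - flushedAt.getOrElse(partition, 0L) >= maxWrites) {
          state.flush(partition)
          flushedAt.update(partition, writes)
          onFlushed(partition)
        }
      }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val notified = mutable.ListBuffer.empty[String]
    val handler  = FlushPolicy.flushEvery(2) { p => notified += p; () }
    // Simulate four consecutive writes to a single partition.
    (1L to 4L).foreach { n =>
      handler(HandlerState(Map("date=2024-01-01" -> n), _ => ()))
    }
    println(notified.toList)
  }
}
```

Since there is no handler for file disposal, a callback like `onFlushed` would still have to discover the actual file names itself, for example by listing the partition directory.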