Closed. calvinlfer closed this 6 months ago.
It seems like postWriteHandler gets called after each chunk of data is written?
That's true. You can use postWriteHandler to implement flushing based on your own business logic.
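As a rough illustration of flushing on your own business logic, here is a plain Java model of such a callback. WriteState and ThresholdFlushPolicy are hypothetical stand-ins, not the Parquet4s API (which is Scala). The sketch only assumes that the handler sees per-partition write counts and can trigger a flush. The threshold rule is just one example of a business rule.

```java
import java.util.Map;

// Hypothetical stand-in for the argument the post-write handler receives:
// cumulative write counts per partition plus a hook that triggers a flush.
final class WriteState {
    final Map<String, Long> modifiedPartitions;
    final Runnable flush;

    WriteState(Map<String, Long> modifiedPartitions, Runnable flush) {
        this.modifiedPartitions = modifiedPartitions;
        this.flush = flush;
    }
}

// One possible business rule: request a flush once any partition
// has accumulated at least `threshold` writes.
final class ThresholdFlushPolicy {
    private final long threshold;

    ThresholdFlushPolicy(long threshold) {
        this.threshold = threshold;
    }

    // Called after each write; returns true if a flush was requested.
    boolean onPostWrite(WriteState state) {
        boolean shouldFlush = state.modifiedPartitions.values().stream()
                .anyMatch(count -> count >= threshold);
        if (shouldFlush) {
            state.flush.run();
        }
        return shouldFlush;
    }
}
```

The real handler would be called after every write, so any check done there should stay cheap.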
Is there a way to use this mechanism such that you know when a file has been written and which file it was?
The handler gets a parameter (PartitionState) that tells you which partitions (not particular files) have been modified and how many writes have been done to those partitions so far. (Looking at those counts, I think there might be a bug there; I need to check.)
There's no handler for file disposal.
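The counts are cumulative ("so far"). Diffing successive snapshots is therefore one way to see which partitions a given callback actually touched. This is a minimal sketch that assumes the counts arrive as a map from a partition key to total writes. PartitionDeltaTracker is a hypothetical helper, not part of Parquet4s. It still works at partition level only, not file level.

```java
import java.util.HashMap;
import java.util.Map;

// Remembers the cumulative per-partition write counts seen on the previous
// callback and reports how many new writes each partition received since then.
final class PartitionDeltaTracker {
    private final Map<String, Long> lastSeen = new HashMap<>();

    // Returns only the partitions whose count grew since the last call,
    // mapped to the number of new writes.
    Map<String, Long> update(Map<String, Long> cumulativeCounts) {
        Map<String, Long> delta = new HashMap<>();
        for (Map.Entry<String, Long> e : cumulativeCounts.entrySet()) {
            long previous = lastSeen.getOrDefault(e.getKey(), 0L);
            long diff = e.getValue() - previous;
            if (diff > 0) {
                delta.put(e.getKey(), diff);
            }
            lastSeen.put(e.getKey(), e.getValue());
        }
        return delta;
    }
}
```

The counts might be reset on flush, or turn out to be wrong as suspected above. In either case, the delta logic would need adjusting.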
I was looking at the postWriteHandler mechanism on viaParquet, and I see that there's a flush mechanism, but I'm a bit confused about the semantics. It seems like postWriteHandler gets called after each chunk of data is written? To add some context: I am trying to add Parquet files written by Parquet4S to an Iceberg table and integrate with Apache Iceberg's Java API.
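There is no file-disposal hook. One possible workaround for the Iceberg use case is to scan the output directory, including partition subdirectories, after a flush. The scan picks out Parquet files that have not been seen before. NewParquetFileScanner is a hypothetical helper, not a Parquet4s or Iceberg API. It also assumes the writer has already closed every file it lists; otherwise a file still being written would be picked up too early.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Remembers which Parquet files have already been handed off (for example,
// registered in an Iceberg table) and reports only the new ones.
final class NewParquetFileScanner {
    private final Set<Path> alreadySeen = new HashSet<>();

    // Recursively lists *.parquet files under `root` (partition directories
    // included) and returns those not seen before, sorted by path.
    // Assumption: the writer has already closed every file found here.
    List<Path> scan(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            List<Path> fresh = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                    .map(p -> p.toAbsolutePath().normalize())
                    .filter(p -> !alreadySeen.contains(p))
                    .sorted()
                    .collect(Collectors.toList());
            alreadySeen.addAll(fresh);
            return fresh;
        }
    }
}
```

The returned paths would be the candidates to register with the table. In Iceberg's Java API, you could build a DataFile for each one and commit them through Table.newAppend().appendFile(...).commit(). That step is not shown here.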