I am using the rotatingWriter with generic types and passing ParquetWriter.Options to overwrite the parquet files in S3 (with the hadoop-aws connector).
I have granted the required permissions (PutObject, DeleteObject) to the IAM role being used, yet the old files are not removed; instead, the new file is added to the parquet directory (i.e. the default ParquetFileWriter.Mode.CREATE appears to be picked up from here).
Are there any gotchas when using the "OVERWRITE" mode? Or am I doing something incorrectly?
My FS2 pipe for writing the parquet file looks like this:
where messageSchema denotes the MessageType and hadoopFilePath is the Path in S3.
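For reference, a minimal sketch of how OVERWRITE is requested through ParquetWriter.Options, assuming parquet4s and parquet-hadoop are on the classpath. The object and value names are hypothetical, and this shows only the options, not the pipe itself:

```scala
import com.github.mjakubowski84.parquet4s.ParquetWriter
import org.apache.parquet.hadoop.ParquetFileWriter

// Hypothetical wrapper object, used only to make the snippet self-contained.
object WriteOptionsSketch {
  // Only writeMode is changed; every other field keeps its default.
  // The default write mode is ParquetFileWriter.Mode.CREATE.
  val writeOptions: ParquetWriter.Options =
    ParquetWriter.Options(writeMode = ParquetFileWriter.Mode.OVERWRITE)
}
```

Note that the write mode applies to an individual file path, so it would only take effect when a new file is written to exactly the same path as an existing one.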