Open yorsita opened 2 years ago
Parquet files differ from ordinary text files: the file metadata is stored in a footer at the end of the file, so you cannot simply append data to an existing Parquet file. You can work around this in either of the following ways:
Read the data from multiple small Parquet files and write it out again as one large Parquet file.
Write the data to a text file first. Once the text file reaches the size you need, convert its contents into a Parquet file.
The Parquet format does not directly support appending row groups, but fastparquet seems to manage it by patching/editing the end of the file before appending another row group. See https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write for details. I don't know the Parquet format well enough to know whether this is a nasty hack or a perfectly reasonable tactic.
Hi team, I am writing data to a Parquet file in several batches. In some cases I want to append data to an existing Parquet file. I saw that someone asked a similar question 4 years ago, and I was wondering whether there is currently a way to do that. Alternatively, can I read the Parquet file into a buffer, append data at the end, and flush it back to the same file?
Thanks!