ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License

Cannot write more than once #82

Open balajiaruna opened 5 years ago

balajiaruna commented 5 years ago

I have specified append mode in the options, but fruits.parquet contains only the first two rows (apples and oranges). What am I missing?

Thanks!

// assumes: parquet = require('parquetjs'), a schema matching these fields, and an async context
var opts = {flags: 'a'};

var writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet', opts);

// append a few rows to the file
await writer.appendRow({name: 'apples', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.appendRow({name: 'oranges', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.close();

writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet', opts);

// append a few more rows to the file
await writer.appendRow({name: 'banana', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.appendRow({name: 'peaches', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.close();

ZJONSSON commented 5 years ago

Appending to a Parquet file is more complicated than opening it with the append flag, because a Parquet file ends with its metadata (which records where every row group is located) followed by a footer.
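To make the end-of-file layout concrete: per the Parquet format specification, a file starts with the 4-byte magic `PAR1` and ends with the Thrift-encoded file metadata, a 4-byte little-endian length of that metadata, and `PAR1` again. The sketch below uses only Node.js `Buffer` methods (no parquetjs internals), and the helper name `locateFooter` is hypothetical. It only finds the metadata block; decoding the Thrift payload is left out.

```javascript
// 'PAR1' magic bytes that open and close every Parquet file
const MAGIC = Buffer.from('PAR1', 'ascii');

// Hypothetical helper: given the bytes of a whole Parquet file, return where the
// footer's metadata block starts and how many bytes it occupies.
// Assumed layout (Parquet spec): PAR1 | row groups ... | metadata | length (uint32 LE) | PAR1
function locateFooter(buf) {
  if (buf.length < 12) {
    throw new Error('file too small to be Parquet');
  }
  if (!buf.subarray(0, 4).equals(MAGIC) || !buf.subarray(buf.length - 4).equals(MAGIC)) {
    throw new Error('missing PAR1 magic bytes');
  }
  // The metadata length is stored in the 4 bytes just before the trailing magic
  const metadataLength = buf.readUInt32LE(buf.length - 8);
  const metadataStart = buf.length - 8 - metadataLength;
  if (metadataStart < 4) {
    throw new Error('metadata length points outside the file');
  }
  return { metadataStart, metadataLength };
}
```

With a real file this could be called as `locateFooter(require('fs').readFileSync('fruits.parquet'))`. Only the footer found this way is read, so a second footer appended later by another writer is not merged with the first one.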

One way to do a pure append is to first read the existing metadata, then manually append the new row groups to the file, and finish by writing updated metadata (covering both the old and the new row groups) and a new footer at the end. The old metadata would then simply be orphaned in the middle of the file.
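The pure-append procedure can be sketched at the byte level. This is a minimal illustration under stated assumptions, not a parquetjs API: `appendRowGroups` and the `encodeMetadata` callback are hypothetical, and producing the encoded row-group bytes and the Thrift-encoded metadata (which must list the old and new row groups with absolute file offsets) is not shown. The point is the layout: every old byte is kept, so the old footer becomes orphaned, and the new metadata must be told where the new data begins.

```javascript
// 'PAR1' magic bytes that close every Parquet file
const MAGIC = Buffer.from('PAR1', 'ascii');

// Hypothetical sketch of a pure append.
// encodeMetadata(newDataOffset) is an ASSUMED callback that returns the encoded
// file metadata for old AND new row groups, given the absolute offset of the new data.
function appendRowGroups(oldFile, newRowGroupBytes, encodeMetadata) {
  if (!oldFile.subarray(oldFile.length - 4).equals(MAGIC)) {
    throw new Error('old file does not end with PAR1');
  }
  // New row groups start right after the old trailing magic; the old footer stays put
  const newDataOffset = oldFile.length;
  const metadata = encodeMetadata(newDataOffset);

  // Footer length field: metadata size as a 4-byte little-endian integer
  const lengthField = Buffer.alloc(4);
  lengthField.writeUInt32LE(metadata.length, 0);

  // old bytes | new row groups | new metadata | metadata length | PAR1
  return Buffer.concat([oldFile, newRowGroupBytes, metadata, lengthField, MAGIC]);
}
```

Readers locate column chunks through the offsets in the last footer, so the orphaned old metadata in the middle of the file is simply never read.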

alienintheheights commented 5 years ago

FWIW: I grouped all the rows I needed for a particular Parquet file into a custom data structure. Once it was built, I looped through that structure and appended every row to the Parquet file within a single open/close block. That avoided having to append through the parquetjs API at all.
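The batching workaround amounts to collecting every row first and then writing them all through one writer, so the file gets exactly one metadata block and footer. A minimal sketch, assuming the caller passes in a function that opens a writer exposing async `appendRow(row)` and `close()` (as the writer returned by `parquet.ParquetWriter.openFile` does); the helper name `writeAllRows` is hypothetical.

```javascript
// Hypothetical helper: write all buffered rows with a single writer.
// openWriter is any async function returning an object with async
// appendRow(row) and close() methods.
async function writeAllRows(openWriter, rows) {
  const writer = await openWriter();
  try {
    for (const row of rows) {
      await writer.appendRow(row);
    }
  } finally {
    // Close even if a row fails, so the file handle is released
    // (note: this also finalizes whatever was written so far)
    await writer.close();
  }
}
```

With parquetjs this could be called as `writeAllRows(() => parquet.ParquetWriter.openFile(schema, 'fruits.parquet'), rows)`, without the `flags: 'a'` option, since the whole file is written in one pass.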