kbajalc / parquets

MIT License

Avoid race conditions with concurrent appendRow() calls #27

Open dobesv opened 3 years ago

dobesv commented 3 years ago

This fixes ParquetWriter so that calling appendRow() while a row group is being written out no longer causes problems.

See https://github.com/ironSource/parquetjs/pull/105 for some discussion of the issue as reported in parquetjs.

They settled on adding a mutex, which I don't think is necessary. By the looks of it, the row group is queued up as one write, so it's not a problem to queue up another write behind that one; the underlying I/O system should handle that correctly.
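To illustrate the mutex-free idea, here is a minimal sketch. It is not the actual parquets code: `SketchWriter`, `writeRowGroup`, `written`, and `demo` are hypothetical names, and the I/O is simulated with a resolved promise. The point it shows is that the full row buffer is detached before the first `await`, so rows appended while a write is pending land in a fresh buffer instead of the one being flushed.

```typescript
// Hypothetical stand-in for the writer; names and shapes are assumptions,
// not the parquets API.
type Row = Record<string, unknown>;

class SketchWriter {
  private buffer: Row[] = [];
  // Row groups that have reached the simulated output, in write order.
  readonly written: Row[][] = [];

  constructor(private readonly rowGroupSize: number) {}

  // Simulated asynchronous I/O: yields control once before "writing".
  // Here the pending writes complete in the order they were started.
  private async writeRowGroup(rows: Row[]): Promise<void> {
    await Promise.resolve();
    this.written.push(rows);
  }

  async appendRow(row: Row): Promise<void> {
    this.buffer.push(row);
    if (this.buffer.length >= this.rowGroupSize) {
      // Detach the full buffer synchronously, before any await, so that
      // concurrent appendRow() calls never touch the rows being written.
      const rows = this.buffer;
      this.buffer = [];
      await this.writeRowGroup(rows);
    }
  }

  async close(): Promise<void> {
    if (this.buffer.length > 0) {
      const rows = this.buffer;
      this.buffer = [];
      await this.writeRowGroup(rows);
    }
  }
}

// Fire several appendRow() calls without awaiting each one individually.
async function demo(): Promise<Row[][]> {
  const writer = new SketchWriter(2);
  await Promise.all([0, 1, 2, 3, 4].map((id) => writer.appendRow({ id })));
  await writer.close();
  return writer.written;
}
```

In this sketch, every row ends up in exactly one row group even though none of the `appendRow()` calls waits for the previous one. Whether queued writes reach the file in order in the real library depends on the underlying output stream, which is the assumption stated in the comment above.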