Right now, when generating TPC-H data, all of the data is written as one huge row group / record batch. Arrow should be able to handle that "ok", but it currently doesn't, and a single giant batch is perhaps not a realistic scenario anyway. Perhaps group the data into row groups of 1M rows. The writers should also have options to control row group / record batch size even if the input to the writer is one huge table.
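As a rough illustration of the proposed grouping, the sketch below computes the (offset, length) slices a writer could use to split one large table into row groups of at most 1M rows. It is library-agnostic and not Arrow's actual API: the function name `row_group_slices` and the parameter `max_rows_per_group` are hypothetical, chosen only for this example.

```python
from typing import Iterator, Tuple


def row_group_slices(
    num_rows: int, max_rows_per_group: int = 1_000_000
) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover num_rows rows in order.

    Hypothetical helper: every slice holds at most max_rows_per_group rows,
    and only the last slice may be shorter.
    """
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    if max_rows_per_group <= 0:
        raise ValueError("max_rows_per_group must be positive")
    for offset in range(0, num_rows, max_rows_per_group):
        # The final slice takes whatever rows remain.
        yield offset, min(max_rows_per_group, num_rows - offset)


# Example: a 2.5M-row table split with the default 1M-row limit.
slices = list(row_group_slices(2_500_000))
print(slices)  # [(0, 1000000), (1000000, 1000000), (2000000, 500000)]
```

A writer option along these lines would let the caller cap the group size regardless of how the input table is chunked, which is the behavior requested above.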