Open zhixingheyi-tian opened 2 years ago
After debugging, I found that the cause is in Arrow's file_orc.cc:
Result<RecordBatchIterator> Execute() override {
  ...
  Result<std::shared_ptr<RecordBatch>> Next() {
    if (i_ == num_stripes_) {
      return nullptr;
    }
    std::shared_ptr<RecordBatch> batch;
    // TODO (https://issues.apache.org/jira/browse/ARROW-14153)
    // pass scan_options_->batch_size
    return reader_->ReadStripe(i_++, included_fields_);
  }
  ...
}
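Currently, the ORC reader in the Arrow dataset API does not honor the ScanOptions batch_size option: as the snippet shows, each call to Next() returns a whole stripe as a single RecordBatch.
So the returned RecordBatch may contain more than 65535 rows.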
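One possible workaround on the consumer side is to split an oversized batch into pieces of at most batch_size rows. The sketch below only computes the (offset, length) ranges for such a split. SliceRanges is a hypothetical helper, not part of Arrow, and taking 65535 as the limit is an assumption based on the size mentioned above.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): split `num_rows` rows into
// consecutive (offset, length) ranges of at most `batch_size` rows each.
std::vector<std::pair<int64_t, int64_t>> SliceRanges(int64_t num_rows,
                                                     int64_t batch_size) {
  std::vector<std::pair<int64_t, int64_t>> ranges;
  if (num_rows <= 0 || batch_size <= 0) {
    return ranges;  // nothing to slice, or invalid batch size
  }
  for (int64_t offset = 0; offset < num_rows; offset += batch_size) {
    // The last range may be shorter than batch_size.
    ranges.emplace_back(offset, std::min(batch_size, num_rows - offset));
  }
  return ranges;
}
```

Each resulting range could then be passed to RecordBatch::Slice(offset, length) on the batch returned by ReadStripe, which should give zero-copy views no larger than batch_size. That is only a stopgap until the reader itself passes scan_options_->batch_size, as the TODO suggests.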
cc @zhouyuan @zhztheplayer
Describe the bug
When running the TPC-DS integration tests, we encounter the out-of-bounds issue below.