Open yuzhangyi opened 4 years ago
Same thing happened to me too. Had about 200k rows and StreamingReader wasn't able to load all the data. Instead, it just loaded the amount of rows which was declared in rowCacheSize method. My workaround was to just set rowCacheSize parameter higher than the number of rows the excel file had.
Hi, @monitorjbl , when I try to use StreamingReader to read data from one worksheet, it only return the data in cached rows (for example, 100 rows of data, but my worksheet has 100,000,000 rows, do I miss any configuration? could you help give some guidance? thanks!
override def openWorkbook(): Workbook = { fileExtension match { case "xlsx" => { StreamingReader.builder() .rowCacheSize(100) .bufferSize(4096)
.open(inputStreamProvider) } case "xls" => { WorkbookFactory.create(inputStreamProvider) } case _ => failure("EXCEL-EXE-09", s"Invalid file is provided with format $fileExtension while only .xlsx and .xls are supported.").get } }
The other case is that, it throw exception directly: 20/07/21 13:22:43 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) com.monitorjbl.xlsx.exceptions.ParseException: Error reading XML stream at com.monitorjbl.xlsx.impl.StreamingSheetReader.getRow(StreamingSheetReader.java:76) at com.monitorjbl.xlsx.impl.StreamingSheetReader.access$200(StreamingSheetReader.java:32) at com.monitorjbl.xlsx.impl.StreamingSheetReader$StreamingRowIterator.hasNext(StreamingSheetReader.java:402) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:220) at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:1002) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)