monitorjbl / excel-streaming-reader

An easy-to-use implementation of a streaming Excel reader using Apache POI
Apache License 2.0

Can't iterate over files that contain over 1 million rows #117

Open Stromner opened 7 years ago

Stromner commented 7 years ago

Ran into this issue with a file I'm working on that has close to 1.5 million rows spread over two sheets.

This is the basic iterator I'm running:

for (Row row : wb.getSheetAt(0)) {
}

When it gets to row 1 million I get the following error:

    at java.lang.NumberFormatException.forInputString(Unknown Source)
    at java.lang.Integer.parseInt(Unknown Source)
    at java.lang.Integer.parseInt(Unknown Source)
    at com.monitorjbl.xlsx.impl.StreamingSheetReader.handleEvent(StreamingSheetReader.java:104)
    at com.monitorjbl.xlsx.impl.StreamingSheetReader.getRow(StreamingSheetReader.java:76)
    at com.monitorjbl.xlsx.impl.StreamingSheetReader.access$100(StreamingSheetReader.java:37)
    at com.monitorjbl.xlsx.impl.StreamingSheetReader$StreamingRowIterator.hasNext(StreamingSheetReader.java:370)

monitorjbl commented 6 years ago

I strongly suspect this is caused by the data in the row rather than by the number of rows. It may still be a bug, though. Could you provide the XML content of the row?

Stromner commented 6 years ago

I can't; it's work-related data. But it has the same format as the earlier 1M rows. Have you tried reading any row past 1M in a document? It should trigger the error.
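
One way to narrow this down without sharing the data: the stack trace shows Integer.parseInt failing inside handleEvent, so some value the reader expects to be a plain integer is not one. The sketch below assumes (this is a guess, not confirmed anywhere in this thread) that the value in question is the r attribute of a <row> element in the sheet XML. It uses only the JDK's StAX parser to list every row whose r value Integer.parseInt would reject. The class name RowRefCheck, the method findBadRowRefs, and the sample value "1.0E6" are all hypothetical and chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic helper; not part of excel-streaming-reader.
class RowRefCheck {

    /** Returns the raw "r" values of <row> elements that Integer.parseInt rejects. */
    static List<String> findBadRowRefs(InputStream sheetXml) throws XMLStreamException {
        List<String> bad = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(sheetXml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    String r = reader.getAttributeValue(null, "r");
                    if (r == null) {
                        continue; // row without an "r" attribute: nothing to check
                    }
                    try {
                        Integer.parseInt(r);
                    } catch (NumberFormatException e) {
                        bad.add(r); // record only the attribute value, no cell data
                    }
                }
            }
        } finally {
            reader.close();
        }
        return bad;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: the second row number is written in scientific
        // notation purely to demonstrate a value that parseInt rejects.
        String xml = "<worksheet><sheetData>"
                + "<row r=\"999999\"><c r=\"A999999\"><v>1</v></c></row>"
                + "<row r=\"1.0E6\"><c r=\"A1000000\"><v>2</v></c></row>"
                + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(findBadRowRefs(in)); // prints [1.0E6]
    }
}
```

To run it against a real workbook, open the .xlsx as a zip with java.util.zip.ZipFile and pass the stream of the sheet entry to findBadRowRefs. The entry is commonly named xl/worksheets/sheet1.xml, but that name is an assumption and depends on the tool that wrote the file. If the list comes back empty, the failing value is somewhere other than the row's r attribute.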