monitorjbl / excel-streaming-reader

An easy-to-use implementation of a streaming Excel reader using Apache POI
Apache License 2.0

#181 update currentColNum when column contains an explicit offset #182

PanAeon closed this pull request 5 years ago

PanAeon commented 5 years ago

fixes #181
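For context on the title: in an .xlsx worksheet, each `<c>` (cell) element may carry an `r` attribute with an explicit A1-style reference such as `D1`. When the attribute is absent, readers typically place the cell right after the previous one. The diff itself isn't reproduced here, so what follows is only a hypothetical Java sketch of the idea the title describes, not the reader's actual code. `ColumnTracking`, `Cell`, `columnIndex` and `layoutRow` are illustrative names. A running `currentColNum` advances by one per cell and jumps to the explicit column whenever a reference is present.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ColumnTracking {

    // Hypothetical model of one <c> element: its optional r attribute and its value.
    static final class Cell {
        final String ref;   // e.g. "D1", or null when the attribute is absent
        final String value;

        Cell(String ref, String value) {
            this.ref = ref;
            this.value = value;
        }
    }

    // Convert the column letters of an A1-style reference ("C7", "AB12") to a
    // zero-based column index: A -> 0, Z -> 25, AA -> 26.
    static int columnIndex(String cellRef) {
        int col = 0;
        for (int i = 0; i < cellRef.length(); i++) {
            char ch = cellRef.charAt(i);
            if (ch < 'A' || ch > 'Z') {
                break; // the row number starts here
            }
            col = col * 26 + (ch - 'A' + 1);
        }
        return col - 1;
    }

    // Assign a column to every cell of one row. An explicit reference moves
    // currentColNum to that column; a cell without one takes the next column.
    static Map<Integer, String> layoutRow(List<Cell> cells) {
        Map<Integer, String> row = new TreeMap<>();
        int currentColNum = 0;
        for (Cell c : cells) {
            if (c.ref != null) {
                currentColNum = columnIndex(c.ref); // honour the explicit offset
            }
            row.put(currentColNum, c.value);
            currentColNum++; // the following cell defaults to the next column
        }
        return row;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell(null, "a"),   // no r attribute: column A
                new Cell("D1", "d"),   // explicit offset: jump to column D
                new Cell(null, "e"));  // no r attribute: column E
        System.out.println(layoutRow(cells)); // prints {0=a, 3=d, 4=e}
    }
}
```

If the explicit reference positioned only its own cell and left `currentColNum` untouched, the following `e` would land to the left of `d` instead of in column E. Apache POI already offers `CellReference.convertColStringToIndex` (in `org.apache.poi.ss.util`) for converting column letters such as `AB` to a zero-based index. The hand-written `columnIndex` only keeps the sketch dependency-free.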

PanAeon commented 5 years ago

@monitorjbl, the file on which I get this error is relatively large (622 KB) and contains confidential information, so I can't provide it. When I edit the file on macOS and then save it, the error goes away. Maybe different versions of Office save files differently, but at the moment I'm not sure how to reproduce this in a unit test.

monitorjbl commented 5 years ago

Could you perhaps edit the raw XML of a small workbook to reproduce the conditions of the larger one? Without an example to test in the build, your fix will likely regress in the future.

PanAeon commented 5 years ago

Hi, yeah, I tried to do that before, but got an invalid file, probably because I zipped it incorrectly. I have added a test case and verified that it actually fails without the fix.
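On the zipping problem: an .xlsx file is an Open Packaging Conventions (ZIP) package, and `[Content_Types].xml` must sit at the archive root. We don't know what went wrong in this case, but one common pitfall when re-zipping hand-edited XML is compressing the enclosing folder, which prefixes every entry with the folder name. As a sketch (the class name `MinimalXlsx` and the output file `offset-test.xlsx` are hypothetical), the Java below uses `java.util.zip` to write a minimal single-sheet workbook with root-relative entry names. Its one row has a first cell without an `r` attribute and a second cell that jumps to `D1`. It should be enough for Apache POI to open, but it omits parts Office normally writes (styles, shared strings), so treat it as a starting point for a test fixture rather than a verified one.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class MinimalXlsx {

    private static final String REL_NS =
            "http://schemas.openxmlformats.org/package/2006/relationships";
    private static final String DOC_REL =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    private static final String MAIN_NS =
            "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Part name -> XML content. Names are relative to the archive root,
    // with no enclosing folder prefix.
    static Map<String, String> parts(String sheetData) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("[Content_Types].xml",
                "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
                + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
                + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
                + "<Override PartName=\"/xl/workbook.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml\"/>"
                + "<Override PartName=\"/xl/worksheets/sheet1.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml\"/>"
                + "</Types>");
        p.put("_rels/.rels",
                "<Relationships xmlns=\"" + REL_NS + "\">"
                + "<Relationship Id=\"rId1\" Type=\"" + DOC_REL + "/officeDocument\" Target=\"xl/workbook.xml\"/>"
                + "</Relationships>");
        p.put("xl/workbook.xml",
                "<workbook xmlns=\"" + MAIN_NS + "\" xmlns:r=\"" + DOC_REL + "\">"
                + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
                + "</workbook>");
        p.put("xl/_rels/workbook.xml.rels",
                "<Relationships xmlns=\"" + REL_NS + "\">"
                + "<Relationship Id=\"rId1\" Type=\"" + DOC_REL + "/worksheet\" Target=\"worksheets/sheet1.xml\"/>"
                + "</Relationships>");
        p.put("xl/worksheets/sheet1.xml",
                "<worksheet xmlns=\"" + MAIN_NS + "\"><sheetData>" + sheetData + "</sheetData></worksheet>");
        return p;
    }

    // Zip all parts into an in-memory .xlsx; closing the stream finishes the archive.
    static byte[] build(String sheetData) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> e : parts(sheetData).entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Row 1: the first cell has no r attribute, the second jumps to D1 explicitly.
        String row = "<row r=\"1\">"
                + "<c t=\"inlineStr\"><is><t>a</t></is></c>"
                + "<c r=\"D1\" t=\"inlineStr\"><is><t>d</t></is></c>"
                + "</row>";
        try (OutputStream out = new FileOutputStream("offset-test.xlsx")) {
            out.write(build(row));
        }
    }
}
```

Generating the fixture from code also keeps the raw XML visible in the test source, so a future regression in offset handling can be traced back to the exact cell markup.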

PanAeon commented 5 years ago

@monitorjbl I've added the necessary changes. We've been having problems in production because of this issue: we can't use streaming mode in spark-excel, and without streaming mode a 60 MB file can apparently bring down our entire Spark cluster (15 GB of memory per executor).

monitorjbl commented 5 years ago

Looks good to me :+1: