na-ka-na / ExcelCompare

Command line tool (and API) for diffing Excel Workbooks
820 stars 102 forks source link

Cannot process very large workbooks #41

Closed mattpalermo closed 7 years ago

mattpalermo commented 7 years ago

I have a workbook which is approximately 6.5 MB. Unfortunately, the JRE runs out of memory every time I run ExcelCompare on two similar versions of this workbook (for example only one cell added). I tried increasing the JRE heap size to 1500m (I couldn't go much higher) without success. The Excel Compare program works great for smaller workbooks.

Would it be possible to accommodate larger workbooks? Is there some settings that I could use to reduce memory usage and perhaps only parse one part of the workbook at a time?

I know this is a fairly extreme case so I am not too concerned if a solution is not feasible.

na-ka-na commented 7 years ago

I can test it out. Can you attach a large workbook, I can see if something works.

mattpalermo commented 7 years ago

Thanks na-ka-na. I can't upload the workbook that I used when I encountered this problem but I will try to create one that demonstrates the same problem.

mattpalermo commented 7 years ago

Ok I have done some experimentation. It seems it only takes an army of 10 pixel kitties to take down ExcelCompare. pixel-kitty-army.xlsx

Is there some sort of cross sheet comparison happening? Could the individual sheets be processed one by one so it can stand up to an infinite army of pixel kitties?

na-ka-na commented 7 years ago

So I did some experiments: your pixel-kitty-army of 2.3MB grew up to pixel-cat-army of 8.1MB. Ran ExcelCompare with both arguments as pixel-cat-army.xlsx. It took 2.7GB (on mac osx) to run. Basically it loads both excels entirely in memory and then does the diff. When I opened pixel-cat-army.xlsx in excel on mac osx it took 1.2GB. So ExcelCompare's 2.7GB is not entirely off bounds.

I can look into loading the workbook piecemeal but it doesn't seem easy to do. For now I'd recommend simply giving more memory, or, just remove the -Xmx parameter altogether if you're on 64 bit jvm.

mattpalermo commented 7 years ago

Your suggestion about the 64-bit JVM seems to work well. I used -d64 and -Xmx5g which gave Java more than enough memory to do larger workbooks such as 7MB. Sorry, I didn't know much about the JVM memory management.

I now have a different problem so I'll open a seperate issue. Thank you for your support.

na-ka-na commented 7 years ago

Np. I'm planning to just remove the -Xmx flag for next release for people less well versed with jvm flags.