red6 / pdfcompare

A simple Java library to compare two PDF files
Apache License 2.0

Differences not showing in logs and results file when PDF has more than 200 pages #128

Closed lukaszzajaczkowski closed 1 year ago

lukaszzajaczkowski commented 1 year ago

Hello,

I need to compare PDFs that have more than 200 pages. When I try to compare them, I get a message that differences were found, but no pages or coordinates are shown in the logs. Also, the differences are not marked in the report file.

It works perfectly for PDFs with 200 pages.

Logs for pdfs with 250 pages: image

Logs for pdfs with 200 pages: image

Is there any way to compare bigger pdfs and get correct results?

finsterwalder commented 1 year ago

There should be no issue with comparing bigger PDFs, and I think people have used it for bigger PDFs before. There are two possible problems that I can think of: memory or threading. PdfCompare needs quite a bit of memory to work, even more so when working in multithreading mode. You could try to provide more memory to the JVM and/or use single threading by setting parallelProcessing=false. I had a report before that the multithreading mode produced errors, so maybe there is a subtle bug somewhere.
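As a sketch of the single-threading suggestion: the parallelProcessing key named above could be set in PdfCompare's config file. The file name and location (application.conf at the root of the classpath) and the actualColor key are recalled from the project README rather than taken from this thread, so treat them as assumptions and check them against the README.

```
# application.conf (assumed file name; expected at the root of the classpath)

# Key named in the comment above: compare pages on a single thread.
parallelProcessing=false

# Assumed key: color used to mark differences in the actual document, as hex RGB.
# A visibly different diff color shows that this file is actually being picked up.
actualColor=0000FF
```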

lukaszzajaczkowski commented 1 year ago

Thank you for the response.

I provided more memory for the JVM (4 GB) and changed the configuration by using SimpleEnvironment.

The custom configuration is ignored for the PDF with 250 pages, but it works OK for the PDF with 200 pages.

I provided more memory in the IntelliJ settings (Settings -> Build, Execution, Deployment -> Compiler -> Shared build process heap size), in Help -> Change Memory Settings, and in the TestNG runner configuration. Should that be OK?

finsterwalder commented 1 year ago

> I provided more memory for JVM (4gb) and changed configuration by using SimpleEnvironment.
>
> Custom configuration is ignored for pdf with 250 pages, but it works ok for pdf with 200 pages.

This sentence does not make sense. Either the configuration works or it does not; that has nothing to do with the number of pages in the PDF. Please read the README carefully and make sure you apply the setting properly. You can test whether you are setting the config properly by changing the color of your diff. That's easier to verify, because you can't easily see whether processing is happening in parallel or not.

It may of course be that comparing the 200-page PDF works and comparing the 250-page PDF does not, either because of "too many pages" or because the 250-page PDF contains something that causes trouble because PdfBox can't process it properly.

> I provided more memory in IntelliJ settings (Settings -> Build, Execution, Deployment -> Compiler -> Shared build process heap size), in Help -> Change Memory Settings and in test NG runner configuration. - Should it be ok?

I can't help you there. You need to figure out yourself how to provide more memory to the JVM in your setup and make sure it actually works. "Shared build process heap size" does not sound like runtime heap size to me, but I have no idea. If in doubt, monitor the JVM with something like VisualVM (https://visualvm.github.io/). There you can see whether there is enough heap or whether there is an issue.
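A quick way to check whether a heap setting reached the JVM that actually runs the comparison is to print the runtime's own limits from inside that JVM. The following is a minimal sketch using only the standard library; the class name HeapCheck and the method maxHeapMiB are made up for illustration and are not part of PdfCompare.

```java
// Hypothetical helper, not part of PdfCompare: reports the heap limits of the running JVM.
final class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Heap currently in use = heap currently reserved minus the free part of it.
        long usedMiB = (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
        System.out.println("Max heap (MiB):  " + maxHeapMiB());
        System.out.println("Used heap (MiB): " + usedMiB);
    }
}
```

Calling maxHeapMiB() from the same test or run configuration that performs the comparison shows the limit that applies there. With a JVM option such as -Xmx4g in effect, the reported maximum should be close to 4096 MiB; a much smaller value suggests the setting was applied to a different process.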

lukaszzajaczkowski commented 1 year ago

Looks like the problem was with the PDF file. Thank you for the hint!

Everything works great now. It compares PDFs with 3000+ pages with no problems.

finsterwalder commented 1 year ago

Good to hear that. Thanks for the update.