red6 / pdfcompare

A simple Java library to compare two PDF files
Apache License 2.0

Even minor differences in text spacing between two PDFs get flagged as a difference #122

Closed: rajeshatuce closed this issue 2 years ago

rajeshatuce commented 2 years ago

I have two Word (.doc) files, which I am converting to PDF using the documents4j library. Both files look identical to the human eye and differ only very slightly in the spacing of letters and words, yet they are still being marked as different.

Is there any way we can resolve this issue?

finsterwalder commented 2 years ago

PdfCompare uses a pixel-by-pixel comparison. If two documents differ by just one pixel, they are reported as different. PdfCompare, of course, has no understanding of whether those differences matter or not.
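The idea of a strict pixel-by-pixel comparison can be sketched as follows. This is a minimal illustration, assuming both pages have already been rendered to `java.awt.image.BufferedImage` objects of equal size; `PixelDiff` and `countDifferingPixels` are hypothetical names, not part of PdfCompare's API, and the library's real implementation may differ.

```java
import java.awt.image.BufferedImage;

/**
 * Hypothetical sketch of a strict pixel-by-pixel page comparison.
 * NOT PdfCompare's source code; it only shows why a single differing
 * pixel is enough to make two pages "different".
 */
class PixelDiff {

    /** Counts pixels whose ARGB values differ. Both images must have the same size. */
    static long countDifferingPixels(BufferedImage expected, BufferedImage actual) {
        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight()) {
            throw new IllegalArgumentException("Images must have the same dimensions");
        }
        long differing = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                // Any change in color, however small, counts as a difference
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        // Two all-black 100x100 "pages"
        BufferedImage a = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage b = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        b.setRGB(10, 10, 0xFFFFFF); // change a single pixel to white
        System.out.println(countDifferingPixels(a, b)); // prints 1
    }
}
```

Under this scheme, slightly different letter and word spacing shifts the glyphs by a few pixels, so the count ends up well above zero and the pages are reported as different.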

But there is a setting to ignore some differences: allowedDifferenceInPercentPerPage=0.2

Percent of pixels that may differ per page. Default is 0. If for some reason your rendering is a little off, or you want to allow for some error margin, you can configure a percentage of pixels that are ignored during comparison. That way a difference is only reported when more than the given percentage of pixels differ. The percentage is calculated per page. Note that the differences are still marked in the output file when you addEqualPagesToResult.
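The rule "a difference is only reported when more than the given percentage of pixels differ" can be sketched as a simple per-page threshold check. This is an illustration of the described behavior, not PdfCompare's code; `PageTolerance` and `isReportedAsDifferent` are hypothetical names, and the differing-pixel count is assumed to come from a comparison like the one PdfCompare performs internally.

```java
/**
 * Hypothetical sketch of a per-page tolerance check in the spirit of
 * allowedDifferenceInPercentPerPage. NOT PdfCompare's source code.
 */
class PageTolerance {

    /**
     * Returns true if the page should be reported as different, i.e. when
     * MORE than allowedPercent of its pixels differ.
     */
    static boolean isReportedAsDifferent(long differingPixels, long totalPixels, double allowedPercent) {
        if (totalPixels <= 0) {
            throw new IllegalArgumentException("totalPixels must be positive");
        }
        double differingPercent = 100.0 * differingPixels / totalPixels;
        return differingPercent > allowedPercent;
    }

    public static void main(String[] args) {
        long total = 1_000_000;
        // 0.15% of pixels differ, 0.2% allowed: ignored
        System.out.println(isReportedAsDifferent(1_500, total, 0.2)); // false
        // 0.3% of pixels differ, 0.2% allowed: reported
        System.out.println(isReportedAsDifferent(3_000, total, 0.2)); // true
        // default allowance of 0: a single differing pixel is reported
        System.out.println(isReportedAsDifferent(1, total, 0.0)); // true
    }
}
```

Because the check is evaluated per page, a page with many small spacing shifts and a page with one compact changed region are treated the same way as long as their differing-pixel percentages are equal.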

But be warned: this setting can also hide one larger difference. On an A4 page, an allowed difference of 1% can ignore a rectangle of about 3 cm by 2 cm, or a large number of pixels distributed across the page.
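The A4 figure can be checked with a bit of arithmetic: an A4 page is 21.0 cm by 29.7 cm (623.7 cm²), so 1% is about 6.24 cm², slightly more than a 3 cm by 2 cm rectangle (6 cm²). The sketch below also converts the allowance into a pixel count, assuming the page is rendered at 300 DPI; that resolution is an assumption for illustration, and `A4Tolerance` with its methods are hypothetical names.

```java
/**
 * Back-of-the-envelope check of how much of an A4 page a given
 * allowed-difference percentage can hide. The 300 DPI rendering
 * resolution used in main() is an assumption for illustration.
 */
class A4Tolerance {
    static final double A4_WIDTH_CM = 21.0;
    static final double A4_HEIGHT_CM = 29.7;
    static final double CM_PER_INCH = 2.54;

    /** Area in cm^2 that may differ without being reported. */
    static double hiddenAreaCm2(double allowedPercent) {
        return A4_WIDTH_CM * A4_HEIGHT_CM * allowedPercent / 100.0;
    }

    /** Number of pixels that may differ without being reported at the given DPI. */
    static long hiddenPixels(double allowedPercent, int dpi) {
        long widthPx = Math.round(A4_WIDTH_CM / CM_PER_INCH * dpi);   // 2480 at 300 DPI
        long heightPx = Math.round(A4_HEIGHT_CM / CM_PER_INCH * dpi); // 3508 at 300 DPI
        return (long) Math.floor(widthPx * heightPx * allowedPercent / 100.0);
    }

    public static void main(String[] args) {
        System.out.println("1% of A4 in cm^2: " + hiddenAreaCm2(1.0)); // roughly 6.24
        System.out.println(hiddenPixels(1.0, 300)); // 86998
        System.out.println(hiddenPixels(0.2, 300)); // 17399
    }
}
```

Even the 0.2% from the suggested setting leaves room for more than 17,000 unreported pixels per page under this assumption, which is why the allowance should be kept as small as the rendering noise permits.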