How does it perform with document images like invoices, receipts

idealo / imagededup

😎 Finding duplicate images made easy!

Apache License 2.0

5.16k stars 456 forks

Thanks for this awesome (and seamless) module.

I want to know whether it performs equally well on document images. Going through the code (and the blog), it seems the DHasher rescales every image to (9, 8) and then generates the hash from a direct horizontal gradient. This seems to work better for images that consist of abstract entities like a person or a car, where fine details matter little (since even after rescaling to such a small size, our eyes can still tell the resulting images apart).

However, for document images, even small differences in table structure or ruled lines can put two documents in different categories, and those differences might be lost after rescaling.
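
To make the concern concrete, here is a minimal pure-Python sketch of a difference hash as I understand it from the description above. It is not the library's code: the box-average `resize_mean`, the helper names, and the bit convention (left pixel brighter than right gives 1) are my own assumptions, and a real implementation would normally resize with an image library's resampling filter instead.

```python
def resize_mean(pixels, out_h, out_w):
    """Downscale a 2D grayscale grid by averaging the pixels in each cell.

    Assumes the input is at least out_h x out_w (a stand-in for a real
    image resize, chosen so the example has no dependencies).
    """
    in_h, in_w = len(pixels), len(pixels[0])
    out = []
    for r in range(out_h):
        r0, r1 = r * in_h // out_h, (r + 1) * in_h // out_h
        row = []
        for c in range(out_w):
            c0, c1 = c * in_w // out_w, (c + 1) * in_w // out_w
            cell = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def dhash_bits(pixels, hash_h=8, hash_w=8):
    """Shrink to hash_h rows x (hash_w + 1) columns, then compare neighbours."""
    small = resize_mean(pixels, hash_h, hash_w + 1)
    # One bit per horizontally adjacent pair: 1 if the left value is larger.
    return [int(row[c] > row[c + 1]) for row in small for c in range(hash_w)]


def hamming(a, b):
    """Number of positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


HEIGHT, WIDTH = 80, 90  # toy "page" size, so each hash cell covers 10 x 10 pixels

# A blank white page.
blank = [[255] * WIDTH for _ in range(HEIGHT)]

# The same page with a 1-pixel horizontal rule across the full width.
h_rule = [row[:] for row in blank]
h_rule[35] = [0] * WIDTH

# The same page with a 1-pixel vertical rule down the full height.
v_rule = [row[:] for row in blank]
for row in v_rule:
    row[45] = 0

dist_h = hamming(dhash_bits(blank), dhash_bits(h_rule))
dist_v = hamming(dhash_bits(blank), dhash_bits(v_rule))
print(dist_h)  # 0: the horizontal rule does not change the hash at all
print(dist_v)  # 8: the vertical rule flips one bit per row, 8 of 64
```

In this toy setup a full-width horizontal rule darkens every cell in its row equally, so the horizontal gradient never sees it, while a vertical rule changes only 8 of the 64 bits. Real scans and real resampling filters will behave differently, so this only illustrates the question and is not a measurement of the library.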

So, has any evaluation been done on document images?


How does it perform with document images like invoices, receipts #85