jnweiger / pdfcompare

compare two PDF files, write a resulting PDF with highlighted changes
GNU General Public License v2.0

spell checker could point out bad words that appear exactly once. #6

Open jnweiger opened 10 years ago

jnweiger commented 10 years ago

The larger the document, the more likely it is that a word appearing exactly once in it is a typo.

The attached patch attempts to point out such singletons that are not recognized by the spell checker.
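The idea can be sketched roughly as follows. This is only an illustration, not the code from the patch: `singleton_suspects`, `WORD_RE`, and the `is_known_word` callback are hypothetical names, and a plain set stands in for the real spell checker.

```python
import re
from collections import Counter

# Runs of letters only; digits and underscores are excluded (an assumption,
# the real tokenizer in pdfcompare may differ).
WORD_RE = re.compile(r"[^\W\d_]+")


def singleton_suspects(text, is_known_word):
    """Return the words that occur exactly once in text and that the
    spell checker callback is_known_word() does not recognize."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    counts = Counter(words)
    return sorted(
        word
        for word, n in counts.items()
        if n == 1 and not is_known_word(word)
    )


# Stand-in dictionary for the example; a real run would ask the spell checker.
dictionary = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}
sample = "The quick brown fox jumps over the lazy dog. The quikc fox."

print(singleton_suspects(sample, dictionary.__contains__))
```

An unknown word that shows up more than once is left alone here, on the assumption that it is a name or a technical term rather than a typo.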

Issues with the patch:

[ouch, is there no way to attach a file here?]

jnweiger commented 10 years ago

--- /suse/jw/src/github/pdfcompare/pdf_highlight.py 2014-01-07 15:28:01.000000000 +0100
+++ /usr/bin/pdfcompare 2014-01-28 19:54:19.604902143 +0100
@@ -88,6 +88,8 @@

later on. Strange.

2014-01-07, V1.6.5 jw - manually merged https://github.com/jnweiger/pdfcompare/pull/4

hope, I did not break too much...

+# 2014-01-28, V1.6.6 jw - --spell now prints out a word list of non-dictionary words seen
+# exactly once.
#

osc in devel:languages:python python-pypdf >= 1.13+20130112

need fix from https://bugs.launchpad.net/pypdf/+bug/242756

@@ -1119,7 +1121,8 @@
each page, giving the exact coordinates of the bounding box of all occurances. Font metrics are used to interpolate into the line fragments found in the dom tree.