Open jnweiger opened 10 years ago
--- /suse/jw/src/github/pdfcompare/pdf_highlight.py 2014-01-07 15:28:01.000000000 +0100
+++ /usr/bin/pdfcompare 2014-01-28 19:54:19.604902143 +0100
@@ -88,6 +88,8 @@
+# 2014-01-28, V1.6.6 jw - --spell now prints out a word list of non-dictionary words seen
+# exactly once.
 #
@@ -1119,7 +1121,8 @@
 each page, giving the exact coordinates of the bounding box of all occurances. Font metrics are used to interpolate into the line fragments found in the dom tree.
matches and spell check findings.
 If re_pattern is None, then wordlist is used instead. Keys and values from ext['a'], ext['d'], or ext['c'] respectively are merged into the DecoratedWord output for added, deleted, or changed texts (respectivly).
@@ -1218,7 +1221,7 @@
def opcodes_find_moved(iter_list):
stem = m.group(1)
word_set.add(stem)
pprint([len(bad_singularity_word_set), 'bad_singularities: ', bad_singularity_word_set])
idx = 0
for word in wl_new:
A word that appears exactly once in a document is likely to be a typo, and the larger the document, the more likely that is.
The attached patch attempts to point out such singletons that are not recognized by the spell checker.
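For readers without the patch at hand, the idea can be sketched roughly as follows. This is not the code from the patch: `singleton_suspects` and `known_words` are made-up names, and a plain Python set stands in for whatever spell checker `--spell` actually uses.

```python
import re
from collections import Counter


def singleton_suspects(text, known_words):
    """Return words that occur exactly once in text and are not in known_words.

    known_words is a hypothetical stand-in for the spell checker's dictionary.
    """
    # Lowercase the text and split it into alphabetic words.
    words = re.findall(r"[A-Za-z]+", text.lower())
    counts = Counter(words)
    # Keep only words seen exactly once that the dictionary does not know.
    return sorted(w for w, n in counts.items() if n == 1 and w not in known_words)


known = {"the", "patch", "prints", "a", "word", "list"}
sample = "The patch prints a word list. The patch prints a wrod list."
print(singleton_suspects(sample, known))
```

Here "word" also occurs only once, but it is in the dictionary, so only the misspelling is reported. A typo that is repeated consistently would not be caught this way.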
Issues with the patch:
[ouch, is there no way to attach a file here?]