Must generate word counts

tedunderwood / DataMunging

Scripts that clean up OCR and munge Hathi metadata.


I can just write clean text files (with suffix clean.txt) or I can also write tab-separated files that count the words in each file after correction.
1) Text only or 2) text-plus-wordcounts? (1 or 2): 1) Text only or 2) text-plus-wordcounts? (1 or 2): 1

Traceback (most recent call last):
  File "OCRnormalizer.py", line 351, in <module>
    main()
  File "OCRnormalizer.py", line 313, in main
    metatuple = (outfilename, str(totalwordsinvol), str(pre_matched), str(pre_english), str(post_matched),
UnboundLocalError: local variable 'totalwordsinvol' referenced before assignment
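For anyone hitting the same error: an UnboundLocalError like this usually means the name is assigned on only one branch (here, presumably the word-count path) but read on both. The sketch below reproduces that pattern and shows one possible fix. It is not the actual OCRnormalizer.py code; the function names, the `want_wordcounts` flag, and the "vol.clean.txt" filename are hypothetical, and only `totalwordsinvol` comes from the traceback.

```python
def summarize_buggy(tokens, want_wordcounts):
    """Reproduce the failure pattern: the total is bound on one branch only."""
    if want_wordcounts:
        totalwordsinvol = len(tokens)
    # If want_wordcounts is False, totalwordsinvol was never assigned,
    # so reading it here raises UnboundLocalError.
    return ("vol.clean.txt", str(totalwordsinvol))


def summarize_fixed(tokens, want_wordcounts):
    """Bind the total before branching so both choices reach the return."""
    totalwordsinvol = len(tokens)  # needed for the metadata either way
    counts = None
    if want_wordcounts:
        # Only the per-word tally is optional.
        counts = {}
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
    return ("vol.clean.txt", str(totalwordsinvol)), counts
```

With this arrangement the text-only choice still gets a total for the metadata row, and only the per-word tally is skipped.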

Thanks for the notification.

Ted

Ted Underwood
Professor of English
Liberal Arts and Sciences Centennial Scholar
University of Illinois, Urbana-Champaign

On Tue, Feb 24, 2015 at 10:44 AM, Ista Zahn notifications@github.com wrote:

On the step

I can just write clean text files (with suffix clean.txt) or I can also write tab-separated files that count the words in each file after correction.
1) Text only or 2) text-plus-wordcounts? (1 or 2): 1) Text only or 2) text-plus-wordcounts? (1 or 2): 1

the OCRnormalizer.py script only works if I select 2. If I select 1 I get

Traceback (most recent call last):
  File "OCRnormalizer.py", line 351, in <module>
    main()
  File "OCRnormalizer.py", line 313, in main
    metatuple = (outfilename, str(totalwordsinvol), str(pre_matched), str(pre_english), str(post_matched),
UnboundLocalError: local variable 'totalwordsinvol' referenced before assignment

— Reply to this email directly or view it on GitHub https://github.com/tedunderwood/DataMunging/issues/1.
