tedunderwood / DataMunging

Scripts that clean up OCR and munge Hathi metadata.

Must generate word counts #1

Open izahn opened 9 years ago

izahn commented 9 years ago

On the step

I can just write clean text files (with suffix clean.txt)
or I can also write tab-separated files that count the words
in each file after correction.
1) Text only or 2) text-plus-wordcounts? (1 or 2): 1) Text only or 2) text-plus-wordcounts? (1 or 2): 1

the OCRnormalizer.py script only works if I select 2. If I select 1 I get

Traceback (most recent call last):
  File "OCRnormalizer.py", line 351, in <module>
    main()
  File "OCRnormalizer.py", line 313, in main
    metatuple = (outfilename, str(totalwordsinvol), str(pre_matched), str(pre_english), str(post_matched),
UnboundLocalError: local variable 'totalwordsinvol' referenced before assignment
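
The traceback is consistent with a variable that is bound on only one branch of the code. The sketch below is not the actual OCRnormalizer.py source; it is a minimal illustration that assumes the word count is computed only when option 2 is chosen. The function names, the `want_counts` parameter, and the output filename are hypothetical; only `totalwordsinvol` comes from the traceback.

```python
def summarize_buggy(text, want_counts):
    """Hypothetical stand-in for the part of main() that builds the metadata tuple."""
    if want_counts:
        # Assumption: the count is only computed on the option-2 path.
        totalwordsinvol = len(text.split())
    # With want_counts False, totalwordsinvol was never bound, so this line
    # raises UnboundLocalError, as in the report above.
    return ("vol.clean.txt", str(totalwordsinvol))


def summarize_fixed(text, want_counts):
    """One possible fix: bind the variable before any branch that may skip it."""
    totalwordsinvol = len(text.split())
    if want_counts:
        pass  # the tab-separated word-count file would be written here
    return ("vol.clean.txt", str(totalwordsinvol))
```

Whether the real script should count words on the text-only path or simply write a placeholder value into the metadata is a design choice for the maintainer; either way the name needs to be bound before it is used.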

tedunderwood commented 9 years ago

Thanks for the notification.

Ted

Ted Underwood Professor of English Liberal Arts and Sciences Centennial Scholar University of Illinois, Urbana-Champaign

On Tue, Feb 24, 2015 at 10:44 AM, Ista Zahn notifications@github.com wrote:

On the step

I can just write clean text files (with suffix clean.txt)
or I can also write tab-separated files that count the words
in each file after correction.
1) Text only or 2) text-plus-wordcounts? (1 or 2): 1) Text only or 2) text-plus-wordcounts? (1 or 2): 1

the OCRnormalizer.py script only works if I select 2. If I select 1 I get

Traceback (most recent call last):
  File "OCRnormalizer.py", line 351, in <module>
    main()
  File "OCRnormalizer.py", line 313, in main
    metatuple = (outfilename, str(totalwordsinvol), str(pre_matched), str(pre_english), str(post_matched),
UnboundLocalError: local variable 'totalwordsinvol' referenced before assignment
