Wrote a little script that kills all lines which differ from the preceding
line only by spacing, capitalization, or punctuation.
It keeps the first of each set of duplicate lines rather than trying to guess
which is "best" based on capitalization, punctuation, or anything else.
Maybe someday I'll write some NLTK to clean up capitalization... but
probably not soon.
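
For reference, here is a rough sketch of how that kind of filter could look in Python. This is not the actual script: the function names (normalize, drop_near_duplicates) are made up for illustration, and it assumes "differs only by spacing, capitalization, or punctuation" means the lines are identical once everything except letters and digits is stripped and case is folded.

```python
def normalize(line: str) -> str:
    """Build a comparison key that ignores spacing, capitalization, and punctuation.

    Assumption: only letters and digits matter for the comparison.
    """
    return "".join(ch for ch in line.casefold() if ch.isalnum())


def drop_near_duplicates(lines):
    """Keep the first line of each run of consecutive near-duplicates."""
    kept = []
    previous_key = None
    for line in lines:
        key = normalize(line)
        # Only compare against the immediately preceding line, so a
        # repeat that shows up later (non-adjacent) is left alone.
        if key != previous_key:
            kept.append(line)
        previous_key = key
    return kept


if __name__ == "__main__":
    sample = [
        "Hello, world!",
        "hello world",
        "HELLO   WORLD.",
        "Goodbye",
        "Hello, world!",
    ]
    for line in drop_near_duplicates(sample):
        print(line)
```

On the sample above this keeps "Hello, world!", "Goodbye", and the final "Hello, world!", since the last line is not adjacent to its earlier twin. No attempt is made to pick the "best" variant; whichever spelling comes first in a run wins.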