Deduplicate ALL THE THINGS

rawsonj / triviabot

A simple IRC trivia bot written in Python using Twisted.

GNU General Public License v3.0


Deduplicate ALL THE THINGS #35

Closed. edunham closed this 10 years ago

edunham commented 10 years ago

Wrote a little script that removes every line which differs from the preceding line only by spacing, capitalization, or punctuation.

It kept the first of each set of duplicate lines rather than trying to guess which is "best" based on capitalization, punctuation, or anything else. Maybe someday I'll write some NLTK to clean up capitalization... but probably not soon.
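
For readers who want to see the idea, here is a minimal sketch of that kind of pass. It is not the script from this pull request; the names `normalize` and `dedupe_adjacent` are made up for illustration, and it assumes "spacing, capitalization, or punctuation" means everything except letters and digits can be ignored when comparing.

```python
def normalize(line):
    """Hypothetical helper: keep only letters and digits, lowercased."""
    return "".join(ch.lower() for ch in line if ch.isalnum())


def dedupe_adjacent(lines):
    """Keep the first line of each run of consecutive near-duplicates.

    Two lines count as duplicates when they normalize to the same key.
    Only neighbouring lines are compared, so a repeat further down the
    file survives.
    """
    kept = []
    previous_key = None
    for line in lines:
        key = normalize(line)
        if key != previous_key:
            kept.append(line)
        previous_key = key
    return kept
```

Because only adjacent lines are compared, sorting the file first would catch more duplicates. Note that under this assumption consecutive blank lines also collapse into one, since they all normalize to the empty string.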

edunham commented 10 years ago

Actually, I really ought to rewrite this to keep whichever of the dups has more caps, more spaces, and more punctuation.
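
A rough sketch of what that rewrite could look like, under the assumption that "more caps, more spaces, and more punctuation" can be collapsed into a single count. The `richness` score and the function names are hypothetical and not taken from the repository.

```python
from itertools import groupby


def normalize(line):
    """Hypothetical helper: keep only letters and digits, lowercased."""
    return "".join(ch.lower() for ch in line if ch.isalnum())


def richness(line):
    """Count uppercase letters plus every non-alphanumeric character.

    Non-alphanumeric covers both whitespace and punctuation, so this is
    one combined score rather than three separate ones (an assumption).
    """
    return sum(1 for ch in line if ch.isupper() or not ch.isalnum())


def dedupe_keep_richest(lines):
    """From each run of consecutive near-duplicates, keep the richest line."""
    kept = []
    # groupby only groups neighbouring lines that share a normalized key.
    for _, group in groupby(lines, key=normalize):
        # max() returns the first maximal item, so ties keep the earliest line.
        kept.append(max(group, key=richness))
    return kept
```

On ties this falls back to the current behaviour of keeping the first line in the run.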