rhiever / reddit-analysis

A Python script that parses post titles, self-texts, and comments on reddit and makes word clouds out of the word frequencies.
285 stars, 63 forks

Combine singular and plural of words into a single count (y and ies) #38

Closed rhiever closed 11 years ago

rhiever commented 11 years ago

In the case of "furry" and "furries," for example. Can you think of any cases where they shouldn't be combined?

bboe commented 11 years ago

zombies -> zombie

Just run egrep 'ies$' /usr/share/dict/words to see a list of words that end in ies.

I don't think this "automated" approach is good, and the same goes for s removal. For instance, hiss is not a plural, yet stripping the s gives his, which is a word. ass is another example.
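To make the objection concrete, here is a minimal sketch of blind suffix stripping. The function name `naive_singular` is hypothetical and is not taken from the reddit-analysis code; it only illustrates the failure modes mentioned above.

```python
def naive_singular(word):
    """Blindly strip plural-looking suffixes (illustration only)."""
    if word.endswith("ies"):
        # Assumes every "-ies" word is the plural of a "-y" word.
        return word[:-3] + "y"
    if word.endswith("s"):
        # Assumes every trailing "s" marks a plural.
        return word[:-1]
    return word


for word in ["furries", "zombies", "hiss", "ass"]:
    print(word, "->", naive_singular(word))
# furries -> furry
# zombies -> zomby
# hiss -> his
# ass -> as
```

Only the first result is a correct singular; the other three are either non-words or unrelated words.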

rhiever commented 11 years ago

What if we use the same method as we did for s and 's? I think that's a pretty safe way of doing things.

bboe commented 11 years ago

> What if we use the same method as we did for s and 's? I think that's a pretty safe way of doing things.

Yeah, I suppose you're right on that. If the shortened form exists in the existing set, then it'll likely work for English words.
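A rough sketch of that check, assuming the word frequencies are held in a dict-like mapping. The name `merge_plurals` and the data layout are assumptions for illustration, not the script's actual implementation. A plural-looking word is folded into a shorter form only when that shorter form was itself counted.

```python
from collections import Counter


def merge_plurals(counts):
    """Fold plural-looking words into a singular form that was also counted.

    Hypothetical helper: `counts` maps word -> frequency.
    """
    merged = Counter(counts)
    for word in list(merged):
        candidates = []
        if word.endswith("ies"):
            candidates.append(word[:-3] + "y")  # furries -> furry
        if word.endswith("s"):
            candidates.append(word[:-1])        # zombies -> zombie
        for singular in candidates:
            # Merge only if the shortened form exists in the counted set.
            if singular in counts:
                merged[singular] += merged.pop(word)
                break
    return merged


counts = {"furry": 3, "furries": 2, "zombie": 1, "zombies": 4, "hiss": 5}
print(dict(merge_plurals(counts)))
# {'furry': 5, 'zombie': 5, 'hiss': 5}
```

This is a heuristic, not a guarantee: if both "hiss" and "his" show up in the same text, they would still be merged, so the check only makes false merges less likely.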