nltk / nltk_book

NLTK Book
http://www.nltk.org/book

Chapter 6 example 1.4 #175

Open gordonda opened 8 years ago

gordonda commented 8 years ago

It seems there is an error in the implementation of example 1.4 of chapter 6. The explanation in the text states that the 2000 most frequent words are to be extracted. The code given for this is:

all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:2000]

but this returns the first 2000 words in the order FreqDist yields them, which are not necessarily the 2000 most frequent. One solution may be to replace the second line of code above with the following line:

word_features = [w for w, freq in all_words.most_common(2000)]
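To make the difference concrete, here is a minimal sketch on a toy token list. It assumes NLTK 3, where FreqDist subclasses collections.Counter, so a plain Counter stands in for it here; the token list is made up for illustration and is not the movie_reviews corpus.

```python
from collections import Counter

# Hypothetical toy tokens standing in for movie_reviews.words().
tokens = ["plot", "the", "film", "the", "a", "the", "film"]

# Counter stands in for nltk.FreqDist (assumption: NLTK 3 behaviour).
all_words = Counter(w.lower() for w in tokens)

# Slicing the key list follows first-seen order, not frequency.
first_seen = list(all_words)[:2]

# most_common(n) returns (word, count) pairs sorted by descending count.
most_frequent = [w for w, _ in all_words.most_common(2)]

print(first_seen)     # ['plot', 'the']
print(most_frequent)  # ['the', 'film']
```

On this input the two lists differ, which is the behaviour the issue describes; on the real corpus the size of the difference would need to be checked by running both versions.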

SilentFlame commented 8 years ago

@gordonda If you look at the result of example 1.4 as implemented above:

all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:2000]

all_words already contains the words sorted by frequency in descending order, as FreqDist arranges them in that order by default.

So please run the code again and close this issue if satisfied.