Chapter 6 example 1.4

nltk / nltk_book

NLTK Book

There appears to be an error in the implementation of Example 1.4 in Chapter 6. The accompanying text says that the 2000 most frequent words are to be extracted. The code given for this is:

all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:2000]

but this returns the first 2000 words in the FreqDist's iteration order, which are not necessarily the 2000 most frequent. One solution may be to replace the second line of the code above with the following line:

word_features = [w for w, freq in all_words.most_common(2000)]
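To illustrate the difference, here is a minimal sketch on a toy token list. It rests on a few assumptions: the `tokens` list is made-up data standing in for `movie_reviews.words()`, `collections.Counter` stands in for `nltk.FreqDist` (which subclasses `Counter` in NLTK 3), and Python 3.7+ is used so that iteration follows insertion order. The cutoff is 2 instead of 2000 to keep the example small.

```python
from collections import Counter

# Hypothetical toy data standing in for movie_reviews.words().
tokens = ["rare", "the", "film", "the", "plot", "film", "the"]

# Counter is used as a stand-in for nltk.FreqDist (an assumption of this sketch).
all_words = Counter(w.lower() for w in tokens)

# Slicing the key list gives the words in the order they were first seen.
first_seen = list(all_words)[:2]

# most_common(n) sorts by count, highest first.
most_frequent = [w for w, freq in all_words.most_common(2)]

print(first_seen)     # ['rare', 'the']
print(most_frequent)  # ['the', 'film']
```

Here "rare" occurs only once but still lands in the sliced list because it was seen first, while `most_common` picks "the" and "film" as intended.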

Chapter 6 example 1.4 #175