Open Liwink opened 7 years ago
    for word in brown.words():
        word = word.lower()
        suffix_fdist[word[-1:]] += 1
        suffix_fdist[word[-2:]] += 1
        suffix_fdist[word[-3:]] += 1
If a word has only two letters, for instance `of`, the suffix `of` is counted twice, because `word[-2:]` and `word[-3:]` both return the whole word. As a result, `of` shows up with a higher frequency than `f`.
    >>> 'of'[-3:]
    'of'
    >>> common_suffixes[:10]
    ['e', ',', '.', 's', 'd', 't', 'he', 'n', 'a', 'of']
I solved it by capping the suffix length at the word length:
    >>> for word in brown.words():
    ...     word = word.lower()
    ...     for i in range(min(len(word), 3)):
    ...         suffix_fdist[word[-i-1:]] += 1
    ...
    >>> suffix_fdist['f']
    43101
    >>> suffix_fdist['of']
    36566
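For anyone who wants to check the behaviour without loading the Brown corpus, here is a minimal sketch of the same idea on a tiny word list. The function name `count_suffixes`, the `max_len` parameter, and the sample words are made up for illustration, and `collections.Counter` stands in for the `suffix_fdist` used above.

```python
from collections import Counter


def count_suffixes(words, max_len=3):
    """Count each suffix of length 1..max_len at most once per word."""
    counts = Counter()
    for word in words:
        word = word.lower()
        # Cap the suffix length at the word length, so a short word
        # such as "of" is not counted again for a longer slice.
        for n in range(1, min(len(word), max_len) + 1):
            counts[word[-n:]] += 1
    return counts


# Hypothetical sample input, not taken from the corpus.
counts = count_suffixes(["of", "Of", "a", "the"])
print(counts["of"], counts["f"], counts["a"])  # 2 2 1
```

With the original three-slice loop the same input would give `of` a count of 4 and `a` a count of 3, which is the inflation described above.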