udacity / aitnd-issues

Repo for AITND issues/bug reports from students.

Term 2: Lesson 6 part 6: computing _tf function does not need to check if freq == 0 #91

Open jevgenitolstouhhov opened 5 years ago

jevgenitolstouhhov commented 5 years ago

In section 4 of the workbook, I noticed that the formula includes a special case that checks whether the frequency is zero, for when a word does not appear in a document.

This condition never actually evaluates to true: get_tf takes its frequencies from the bag_of_words function, and every word in the bag has a frequency of at least 1.
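A minimal sketch of why the counts can never be zero. The bag_of_words below is a hypothetical stand-in built on collections.Counter with whitespace tokenization; the workbook's own implementation may differ, but any bag built by counting the words of a document has the same property.

```python
from collections import Counter


def bag_of_words(text):
    # Hypothetical stand-in for the workbook's bag_of_words:
    # count how often each whitespace-separated token occurs in one document.
    return Counter(text.lower().split())


bag = bag_of_words("the cat sat on the mat")

# A word only becomes a key once it has been seen, so every stored count is >= 1.
# A check like `freq == 0` on values taken from this bag can therefore never be true.
assert all(freq >= 1 for freq in bag.values())
```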

What actually produces the desired 0 for words that are missing from a document is the get_vector function. When we iterate over the IDF words from the entire corpus, we eventually reach words that do not exist in the TF dictionary. We get 0 simply because we look up a word that is not there, not because of the condition in _tf. It is also important that get_tf constructs its defaultdict with the "int" factory. Otherwise the dictionary would not return 0 for missing keys and would throw an exception instead.
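The behaviour described above can be sketched as follows. The function names come from the workbook, but the bodies here are assumptions made for illustration: the raw count is used as the TF value and the vector is a plain list of TF * IDF products, which may not match the workbook's formulas.

```python
from collections import defaultdict


def get_tf(bag):
    # Assumed body: store the raw count as the TF value.
    # defaultdict(int) makes lookups of unseen words return 0.
    tf = defaultdict(int)
    for word, freq in bag.items():
        tf[word] = freq
    return tf


def get_vector(tf, idf):
    # Assumed body: iterate over every word known to the corpus-wide IDF table.
    # Words missing from this document's TF dictionary contribute 0.
    return [tf[word] * idf[word] for word in idf]


idf = {"cat": 1.5, "dog": 2.0}   # made-up IDF values for illustration
tf = get_tf({"cat": 2})          # "dog" does not occur in this document

vector = get_vector(tf, idf)     # the "dog" entry comes out as 0

# With a plain dict the same lookup raises KeyError instead of returning 0.
try:
    get_vector({"cat": 2}, idf)
    plain_dict_raised = False
except KeyError:
    plain_dict_raised = True
```

Note that indexing a defaultdict with a missing key also inserts that key with the default value, so the TF dictionary grows as get_vector runs.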

I propose fixing the solution code and also slightly revising the description in section 4 of the workbook where it covers the special case freq == 0.

Thanks!