tejank10 / Spam-or-Ham


Invalid IDF formula #5

Open pk97 opened 6 years ago

pk97 commented 6 years ago

At line 89, you have written:

self.prob_spam[word] = (self.tf_spam[word]) * log((self.spam_mails + self.ham_mails) / (self.idf_spam[word] + self.idf_ham.get(word, 0)))

According to this page's definition of TF-IDF, IDF is the inverse document frequency, i.e.

IDF(t) = log_e(Total number of documents / Number of documents with term t in it).

whereas you are computing log(total no. of mails / (no. of spam words + no. of ham words)). Why?
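For reference, here is a minimal sketch of the IDF definition quoted above. It is not the repository's code: the function name `idf`, the sample `mails` list, and the choice to represent each mail as a set of words are all assumptions made for illustration.

```python
from math import log


def idf(term, documents):
    """IDF(t) = log_e(total documents / documents containing t).

    `documents` is assumed to be a list of sets of words, one set per mail.
    """
    total = len(documents)
    # Count documents (not word occurrences) that contain the term.
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        raise ValueError(f"term {term!r} does not occur in any document")
    return log(total / containing)


# Hypothetical toy corpus: four mails, each reduced to its set of words.
mails = [
    {"win", "cash", "now"},
    {"meeting", "at", "noon"},
    {"win", "a", "prize"},
    {"lunch", "now"},
]

print(idf("win", mails))   # log(4 / 2)
print(idf("cash", mails))  # log(4 / 1)
```

The denominator here counts mails that contain the term, so a word repeated many times within a single mail still contributes only one document to the count.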