At line 89, you have written:
self.prob_spam[word] = (self.tf_spam[word]) * log((self.spam_mails + self.ham_mails) / (self.idf_spam[word] + self.idf_ham.get(word, 0)))
According to this page's TF-IDF definition, IDF is the inverse document frequency, i.e.
IDF(t) = log_e(Total number of documents / Number of documents with term t in it).
whereas you are computing log(total no. of mails / (no. of spam words + no. of ham words)).
Why?
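To make the definition I am referring to concrete, here is a minimal sketch of the textbook IDF. It assumes a toy corpus where each mail is a list of tokens; the function name idf and the sample mails are hypothetical and not taken from your code.

```python
from math import log


def idf(term, documents):
    """Textbook IDF: log(total documents / documents containing the term)."""
    # Count each document at most once, however often the term repeats in it.
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        raise ValueError(f"term {term!r} appears in no document")
    return log(len(documents) / containing)


# Hypothetical toy corpus: each mail is a list of tokens.
mails = [
    ["win", "cash", "now"],
    ["cash", "prize", "cash"],
    ["meeting", "at", "noon"],
    ["lunch", "at", "noon"],
]

# "cash" occurs three times overall but in only two mails,
# so the denominator is 2 (documents), not 3 (occurrences).
idf_cash = idf("cash", mails)
```

If idf_spam[word] and idf_ham[word] hold the number of mails containing the word rather than the number of occurrences, then the expression at line 89 would agree with this definition. That is the point I would like to confirm.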