Closed: rennX closed this issue 10 years ago.
Right now 'distinct' and 'distinct.' are being counted as separate terms.
In tf_idf_Count(self, tokenized), we need to add a step that sanitizes words both when they are inserted into the dict and when they are compared against the tokens pulled from the tokenized list.
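A minimal sketch of that sanitizing step, assuming "sanitize" means stripping leading and trailing punctuation (lower-casing could be added the same way). The helpers sanitize, term_counts and count_of are hypothetical standalone functions, not the project's tf_idf_Count code. The key point is that the same function is applied when a word is stored and when it is looked up:

```python
import string


def sanitize(word):
    """Hypothetical helper: strip leading/trailing punctuation from a token."""
    return word.strip(string.punctuation)


def term_counts(tokenized):
    """Count terms after sanitizing, so 'distinct' and 'distinct.' share one key."""
    counts = {}
    for token in tokenized:
        key = sanitize(token)
        if not key:  # token was pure punctuation, e.g. "."
            continue
        counts[key] = counts.get(key, 0) + 1
    return counts


def count_of(counts, token):
    # Sanitize the lookup key the same way the stored keys were sanitized.
    return counts.get(sanitize(token), 0)
```

For example, term_counts(["distinct", "distinct.", "word"]) gives {"distinct": 2, "word": 1}, and count_of on that result with "distinct." returns 2.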
Added code to strip periods from the block text.
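A rough sketch of stripping periods before tokenizing, assuming plain whitespace tokenization. strip_periods and tokenize are hypothetical names, not the project's actual code:

```python
def strip_periods(block_text):
    """Hypothetical helper: remove every '.' from the block text."""
    return block_text.replace(".", "")


def tokenize(block_text):
    # Whitespace splitting is an assumption; the project's tokenizer may differ.
    return strip_periods(block_text).split()
```

For example, tokenize("A distinct word. Another distinct.") returns ["A", "distinct", "word", "Another", "distinct"]. One trade-off of removing every period is that "3.14" becomes "314" and "e.g." becomes "eg"; stripping punctuation only at token edges avoids that.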