rennX / FunctionPointParser

Repository for Team NLP capstone

distinct word count & tf-idf must remove period from word before processing #43

Closed by rennX 10 years ago

rennX commented 10 years ago

Right now 'distinct' and 'distinct.' are processed as two separate words, so their counts are split.

In tf_idf_Count(self,tokenized), we need to add a step that sanitizes each word both when it is put in the dict and when it is compared against what is pulled from the tokenized list.
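
A rough sketch of the idea, not the actual project code: `sanitize` and `count_distinct` are hypothetical names, and this assumes the only cleanup needed is dropping trailing periods. The same helper is applied to every token before it is used as a dict key, so insertion and lookup always agree.

```python
from collections import Counter


def sanitize(word):
    """Hypothetical helper: remove trailing periods so 'distinct.' matches 'distinct'."""
    return word.rstrip(".")


def count_distinct(tokenized):
    """Count occurrences of each sanitized word in a list of tokens."""
    counts = Counter()
    for token in tokenized:
        word = sanitize(token)
        if word:  # skip tokens that consisted only of periods
            counts[word] += 1
    return counts


tokens = ["distinct", "words", "are", "distinct."]
counts = count_distinct(tokens)
print(counts["distinct"])     # 2
print("distinct." in counts)  # False
```

Using `rstrip` only touches periods at the end of a token, so something like '3.14' would be left alone. Whether that is the right behavior for the tf-idf step is a judgment call for the project.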

rennX commented 10 years ago

Added code to strip periods from block text