**Closed** by weallwegot 6 years ago
```python
from nltk.corpus import stopwords

s = set(stopwords.words('english'))
txt = "a long string of text about him and her"
print([w for w in txt.split() if w not in s])
```
Tried this. lmao, it took 25 minutes to get the count of all of the words in the reviews... will have to look into this again. I should have sorted the output list... grr
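A minimal sketch of the counting-plus-sorting step, using `collections.Counter` so the output comes back already sorted by frequency via `most_common()`. The hardcoded `STOP` set here is a stand-in assumption; the real script would use `set(stopwords.words('english'))` from NLTK:

```python
from collections import Counter
import string

# Stand-in stopword set for illustration; swap in
# set(stopwords.words('english')) from NLTK in the real script.
STOP = {"a", "of", "about", "and", "the", "him", "her", "another", "full"}

def count_words(texts):
    """Count non-stopword tokens across an iterable of review strings."""
    counts = Counter()
    for txt in texts:
        # lowercase, split on whitespace, strip surrounding punctuation
        tokens = (w.strip(string.punctuation) for w in txt.lower().split())
        counts.update(w for w in tokens if w and w not in STOP)
    return counts

reviews = ["A long string of text about him and her.",
           "Another long review, full of text."]
for word, n in count_words(reviews).most_common():
    print(word, n)
```

Membership tests against a `set` are O(1), so the stopword filter itself shouldn't be the bottleneck; the `Counter` approach also avoids building intermediate filtered lists per review.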
- Add extra stop words
- Add a cleaning-data function. In the cleaning-data function:
  - Make the list into a set
  - Cut a word out of the set if its first character is punctuation
  - Cut a word out of the set if its last character is punctuation
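The cleaning steps above could be sketched like this; `clean_words` is a hypothetical name, and the punctuation check uses the stdlib `string.punctuation`:

```python
import string

PUNCT = set(string.punctuation)

def clean_words(words):
    """Sketch of the steps above: turn the word list into a set, then
    cut a word out if its first or last character is punctuation."""
    return {w for w in set(words)
            if w and w[0] not in PUNCT and w[-1] not in PUNCT}
```

For example, `clean_words(["hello", "hello", "'quoted", "trailing,"])` dedupes the two `"hello"`s and drops the words with leading/trailing punctuation, leaving `{"hello"}`.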
Write the word counts out to a file. The file can just be stored in the local directory, since this usually takes a couple of minutes to run. This will only reduce running time by a little, but if it gets the total under 3 seconds then we will add it.
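A sketch of that caching idea, assuming a JSON file in the local directory (`word_counts.json` and `load_or_compute_counts` are hypothetical names, not from the repo):

```python
import json
import os

COUNTS_FILE = "word_counts.json"  # hypothetical cache file in the local directory

def load_or_compute_counts(compute):
    """Return the cached counts if the file exists; otherwise run the
    (slow) compute function, save the result locally, and return it."""
    if os.path.exists(COUNTS_FILE):
        with open(COUNTS_FILE) as f:
            return json.load(f)
    counts = compute()
    with open(COUNTS_FILE, "w") as f:
        json.dump(counts, f)
    return counts
```

On the first run this pays the full counting cost and writes the file; every later run just loads the JSON, which is where the "under 3 total seconds" check would apply.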
Maybe use the words file in a next iteration; it's in the repo.
From @weAllWeGot on December 6, 2016 19:19
This can then be the basis of the categories. This is probably a new branch, but an interesting one. I wouldn't get rid of the predefined categories, but I would add to them with what the analysis comes up with, kind of more smartly/automatically.
_Copied from original issue: weAllWeGot/kbai_chatbot3#20