gsi-upm / absa

Aspect Based Sentiment Analysis for Culinary domain
Apache License 2.0

about words.json #1

Open codenamker opened 6 years ago

codenamker commented 6 years ago

Hi, I think this work is good! May I ask how you got words.json? Thank you!

militarpancho commented 6 years ago

Many thanks! It's simple; maybe I should add the code for it. I took the SemEval 2015 ABSA dataset and counted the most common words for each defined category (Food, Drinks, Service...). This adds an improvement in the topic classification task.
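For readers who want to try the idea, here is a minimal sketch of per-category word counting. It is not the repository's code: it uses `collections.Counter` and a regex tokenizer from the standard library rather than NLTK, and the sample sentences, the stop-word list, the function name `top_words_by_category`, and the output layout (category mapped to a list of words) are all assumptions for illustration. The actual structure of words.json may differ.

```python
import json
import re
from collections import Counter, defaultdict

# Hypothetical sample data: (category, sentence) pairs standing in for
# the SemEval 2015 ABSA annotations, which are not reproduced here.
SAMPLES = [
    ("FOOD", "The pizza was great and the pasta was great too"),
    ("FOOD", "Great pizza, cold pasta"),
    ("SERVICE", "The waiter was rude and the waiter was slow"),
    ("DRINKS", "Nice wine list, good wine"),
]

# Tiny illustrative stop-word list (an assumption, not from the repository).
STOPWORDS = {"the", "was", "and", "too", "a"}


def top_words_by_category(samples, n=3):
    """Return {category: [the n most frequent non-stop-words]}."""
    counts = defaultdict(Counter)
    for category, text in samples:
        # Lowercase and split into simple alphabetic tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[category].update(t for t in tokens if t not in STOPWORDS)
    return {cat: [w for w, _ in c.most_common(n)] for cat, c in counts.items()}


words = top_words_by_category(SAMPLES)
print(json.dumps(words, indent=2))
```

Passing the resulting dictionary to `json.dump` with an open file would write it to disk in a words.json-like form.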

codenamker commented 6 years ago

Thank you for your reply! It was very helpful for me to understand ABSA.

militarpancho commented 6 years ago

If you are interested in the process, it used FreqDist from the nltk library. The code was something like this:

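import nltk

# `review` is assumed to be a string holding the review text to analyze
frec = nltk.FreqDist(nltk.word_tokenize(review))
print("Most frequent")
print(frec.most_common(10))
print("Least frequent")
# most_common() sorts by count, so its tail holds the rarest words
print(frec.most_common()[-10:])

codenamker commented 6 years ago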

Thank you so much! I'll try it now.