thisandagain / sentiment

AFINN-based sentiment analysis for Node.js.
MIT License
2.64k stars 311 forks

Fix for replacing punctuation in tokenize.js #149

jalners closed 5 years ago

jalners commented 6 years ago

It would be better to replace punctuation in the tokenizer with a space rather than removing it, because you can have a sentence like this: "If you are Razr owner...you must have this!" For that sentence, the tokenizer currently returns this incorrect array: ['if', 'you', 'are', 'razr', 'owneryou', 'must', 'have', 'this'] The error is 'owneryou': the two words on either side of the ellipsis are fused into one token.
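To illustrate the difference, here is a minimal sketch comparing the two approaches. The function names `tokenizeStripping` and `tokenizeSpacing` and the `PUNCTUATION` character class are hypothetical and chosen for this example; they are not the actual contents of tokenize.js.

```javascript
// Illustrative sketch only: hypothetical helpers, not the library's tokenize.js.
// Assumed set of punctuation characters to handle.
const PUNCTUATION = /[.,\/#!$%^&*;:{}=_`"~()]/g;

// Deleting punctuation fuses words that were separated only by punctuation.
function tokenizeStripping(input) {
    return input
        .toLowerCase()
        .replace(PUNCTUATION, '')
        .split(/\s+/)
        .filter(Boolean); // drop empty strings
}

// Replacing punctuation with a space preserves the word boundary.
function tokenizeSpacing(input) {
    return input
        .toLowerCase()
        .replace(PUNCTUATION, ' ')
        .split(/\s+/)
        .filter(Boolean); // drop empty strings left by runs of spaces
}

const sentence = 'If you are Razr owner...you must have this!';

console.log(tokenizeStripping(sentence));
// [ 'if', 'you', 'are', 'razr', 'owneryou', 'must', 'have', 'this' ]

console.log(tokenizeSpacing(sentence));
// [ 'if', 'you', 'are', 'razr', 'owner', 'you', 'must', 'have', 'this' ]
```

Splitting on `/\s+/` and filtering out empty strings is a design choice of this sketch: replacing "..." with spaces leaves a run of whitespace, which would otherwise produce empty tokens.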

elyas-bhy commented 6 years ago

LGTM. @thisandagain ?

thisandagain commented 5 years ago

Excellent! The validation accuracy improvements with this are certainly worthwhile:

Before

Amazon accuracy: 0.7202797202797203
IMDB accuracy: 0.7642357642357642
Yelp accuracy: 0.6943056943056943

After

Amazon accuracy: 0.7252747252747253
IMDB accuracy: 0.7652347652347652
Yelp accuracy: 0.6963036963036963