Closed jalners closed 5 years ago
LGTM. @thisandagain ?
Excellent! The validation accuracy improvements with this are certainly worthwhile:
Amazon accuracy: 0.7202797202797203 IMDB accuracy: 0.7642357642357642 Yelp accuracy: 0.6943056943056943
Amazon accuracy: 0.7252747252747253 IMDB accuracy: 0.7652347652347652 Yelp accuracy: 0.6963036963036963
It will be better if you will replace punctuation in tokenizer with space. Because you can have sentence like next: "If you are Razr owner...you must have this!" In previous sentence your tokenizer will return next wrong array:
['if', 'you', 'are', 'razr', 'owneryou', 'must', 'have', 'this']
The error in - 'owneryou'