jsantalo / happybirds

GNU General Public License v3.0

word2vec #25

Open ldescampsvila opened 6 years ago

ldescampsvila commented 6 years ago

Investigate text classification with word2vec, to compare a bag-of-words approach with a word2vec approach and see which works best:

http://nadbordrozd.github.io/blog/2016/05/20/text-classification-with-word2vec/
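A minimal sketch of the word2vec side of that comparison, following the averaging idea from the linked post: each tokenized document becomes the mean of its word vectors, and those dense features can feed the same classifier as the bag-of-words features. The `toy_vectors` embedding below is hypothetical. In practice the mapping could be a trained gensim model's `model.wv`, assuming gensim 4, where it supports `in` and `[...]` lookups and exposes `vector_size`.

```python
import numpy as np

class MeanEmbeddingVectorizer:
    """Represents each tokenized document as the average of its word vectors."""

    def __init__(self, word_vectors, dim):
        # word_vectors: any mapping token -> 1-D array of length `dim`
        self.word_vectors = word_vectors
        self.dim = dim

    def fit(self, X, y=None):
        # Nothing to learn here: the embeddings are trained beforehand.
        return self

    def transform(self, X):
        rows = []
        for tokens in X:
            # Skip tokens that have no vector (out-of-vocabulary words).
            vecs = [self.word_vectors[t] for t in tokens if t in self.word_vectors]
            # A document with no known words falls back to a zero vector.
            rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(self.dim))
        return np.array(rows)

# Hypothetical 2-dimensional embeddings, for illustration only.
toy_vectors = {
    "happy": np.array([1.0, 0.0]),
    "sad": np.array([0.0, 1.0]),
    "bird": np.array([0.5, 0.5]),
}
docs = [["happy", "bird"], ["unknown_word"]]
X = MeanEmbeddingVectorizer(toy_vectors, dim=2).fit(docs).transform(docs)
# First row is the mean of "happy" and "bird": [0.75, 0.25]; second row is all zeros.
print(X)
```

The class only implements `fit` and `transform`. To drop it into a scikit-learn `Pipeline` in place of `TfidfVectorizer`, and especially to use it with utilities that clone estimators such as `cross_val_score`, it is safer to also subclass `sklearn.base.BaseEstimator` and `TransformerMixin`.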

maytepenella commented 6 years ago

We should all look into this.

ldescampsvila commented 6 years ago

I can't see any accuracy improvement from word2vec...

ldescampsvila commented 6 years ago

At the moment, the best accuracy (0.62) is achieved with a Pipeline of TF-IDF and SVC:

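from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# The sentences are already tokenized, so the analyzer passes each token list through unchanged.
svc_tfidf = Pipeline([("tfidf_vectorizer", TfidfVectorizer(analyzer=lambda x: x)), ("linear svc", SVC(kernel="linear"))])
svc_tfidf.fit(train["tokenized_sents"], y_train)
y_pred3 = svc_tfidf.predict(test["tokenized_sents"])
score3 = accuracy_score(y_test, y_pred3)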

and without pre-transform...
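To make the first pipeline stage less of a black box, here is a pure-Python sketch of what `TfidfVectorizer` computes on already-tokenized input, assuming its default settings (`norm="l2"`, `use_idf=True`, `smooth_idf=True`, `sublinear_tf=False`). It illustrates the weighting only and is not scikit-learn's actual implementation; the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Mimics TfidfVectorizer defaults on pre-tokenized docs:
    raw term counts, smoothed idf, and L2-normalized rows."""
    n = len(docs)
    vocab = sorted({t for d in docs for t in d})
    # Document frequency: in how many documents each term appears.
    df = Counter(t for d in docs for t in set(d))
    # smooth_idf=True: idf(t) = ln((1 + n) / (1 + df(t))) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for d in docs:
        counts = Counter(d)
        row = [counts[t] * idf[t] for t in vocab]
        # Scale each row to unit length; an empty document stays all zeros.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

docs = [["happy", "bird"], ["sad", "bird"]]
vocab, rows = tfidf(docs)
print(vocab)  # ['bird', 'happy', 'sad']
```

Words that occur in fewer documents get larger weights (here `happy` outweighs `bird`, which appears in every document), and each row has unit length. These are the features the linear SVC then separates.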