Open alexanderpanchenko opened 6 years ago
Main non-ensemble supervised models are ready for word2vec-type baseline embeddings.
Trained on the Farahmand dataset, tested on Reddy, Reddy++ and Farahmand (via 5-fold CV). Results can be seen in the doc.
Also added more supervised approaches from the Farahmand et al. article and unsupervised ones from Lioma et al. to the results section.
Evaluations were done with three 750-d vectors as features; the Farahmand dataset was rescaled to the Reddy scale, which eradicated the negative-correlation problem.
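The rescaling step above can be sketched as a simple linear min–max mapping. This is a hypothetical sketch: it assumes the Farahmand compositionality scores lie in [0, 1] and the Reddy judgements lie on a 0–5 scale; the exact rescaling used for the reported numbers may differ.

```python
import numpy as np

def rescale(scores, target_min=0.0, target_max=5.0):
    """Linearly map scores onto [target_min, target_max].

    Assumption: Farahmand scores span [0, 1] and are mapped onto the
    Reddy 0-5 scale; adjust the targets if the real scales differ.
    """
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return target_min + (scores - lo) * (target_max - target_min) / (hi - lo)

farahmand_like = [0.0, 0.25, 0.5, 1.0]
print(rescale(farahmand_like))  # maps 0..1 onto 0..5
```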
Thanks! Can you please also generate

- LR concat 750x2
- SVR concat 750x2
- KR concat 750x2
- SGD concat 750x2
- KNN concat 750x2
- PLS concat 750x2
- Tree concat 750x2

where one of the 750-dimensional embeddings is the sum of the individual word embeddings and the other is the compound embedding?
Example:
hot+dog = 750 dims
hot_dog = 750 dims
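The requested concat 750x2 feature can be sketched as follows. The embedding values here are random stand-ins; in practice the vectors would come from the word2vec-type model, with `hot_dog` looked up as a single joined token.

```python
import numpy as np

DIM = 750  # dimensionality of the word2vec-type embeddings

def concat_features(word_vecs, compound_vec):
    """Build the 'concat 750x2' feature: [sum of word vectors ; compound vector].

    word_vecs: list of per-word embeddings, e.g. for 'hot' and 'dog'
    compound_vec: embedding of the joined token, e.g. 'hot_dog'
    """
    summed = np.sum(word_vecs, axis=0)              # hot + dog -> 750 dims
    return np.concatenate([summed, compound_vec])   # -> 1500 dims total

# random stand-ins for the real embeddings
rng = np.random.default_rng(0)
hot, dog, hot_dog = (rng.normal(size=DIM) for _ in range(3))
x = concat_features([hot, dog], hot_dog)
print(x.shape)  # (1500,)
```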
> On 20 Aug 2018, at 11:15, Dmitri (notifications@github.com) wrote:
> Models: Linear Regression, Support Vector Regression, Kernel Regression, SGD Regression, K Nearest Neighbors Regression, PLS Regression, Decision Tree. For SVR and LR, different feature approaches were used (cosine distance, Euclidean distance, and the raw vector difference); the latter didn't really work out. Trained on the Farahmand dataset, tested on Reddy, Reddy++ and Farahmand (via 5-fold CV). Results can be seen in the doc.
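The three feature approaches for SVR/LR mentioned in the quoted comment (cosine distance, Euclidean distance, raw vector difference) can be sketched as below. This is an assumed reconstruction: the features compare the summed word vector with the compound vector, and the exact variants used may differ.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return float(np.linalg.norm(u - v))

def features(word_sum, compound, mode="cosine"):
    """Select the feature approach fed to SVR/LR.

    'diff' is the raw vector difference, which reportedly didn't work well.
    """
    if mode == "cosine":
        return np.array([cosine(word_sum, compound)])
    if mode == "euclidean":
        return np.array([euclidean(word_sum, compound)])
    if mode == "diff":
        return word_sum - compound
    raise ValueError(f"unknown mode: {mode}")
```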
Random Forest model evaluation is ready and can be seen in the table.
Predictions for cross-sense Sensegram cosines are ready (n_features = maximum number of cosines = 72); see the table. Worse than the baseline.
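Since compounds have varying numbers of sense pairs, the variable-length list of cross-sense cosines presumably has to be padded to the fixed n_features = 72. A minimal sketch, assuming zero-padding (the actual fill value and ordering used are not stated):

```python
import numpy as np

N_FEATURES = 72  # maximum number of cross-sense cosines observed

def pad_cosines(cosines, n=N_FEATURES, fill=0.0):
    """Pad (or truncate) a variable-length list of sense-pair cosines
    to a fixed-length feature vector of size n."""
    out = np.full(n, fill)
    vals = np.asarray(cosines[:n], dtype=float)
    out[:len(vals)] = vals
    return out

print(pad_cosines([0.9, 0.4, 0.1]).shape)  # (72,)
```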
Related-work search on supervised prediction: look at all papers that cite Reddy/Reddy++/Farahmand.

- http://www.aclweb.org/anthology/I11-1024
- http://www.aclweb.org/anthology/P16-2026
- http://www.aclweb.org/anthology/W15-0904

(Write a script that searches for the word 'supervised' in the downloaded PDF files.)
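The PDF-search script could look like the sketch below. It assumes the `pdftotext` CLI (from poppler-utils) is on PATH for text extraction; any PDF-to-text library could be swapped in. `scan_pdfs` and its directory layout are hypothetical names, not an existing script.

```python
import re
import subprocess
from pathlib import Path

def mentions_term(text, term="supervised"):
    """Case-insensitive whole-word search in extracted text."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None

def scan_pdfs(pdf_dir, term="supervised"):
    """Report which downloaded PDFs mention `term`.

    Assumes `pdftotext <file> -` prints the extracted text to stdout.
    """
    hits = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        text = subprocess.run(["pdftotext", str(pdf), "-"],
                              capture_output=True, text=True).stdout
        if mentions_term(text, term):
            hits.append(pdf.name)
    return hits
```

Note that the `\b` word boundary keeps "unsupervised" from matching, which matters here since many of the cited papers discuss both settings.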
Read http://www.aclweb.org/anthology/P16-1187.
Use the Reddy, Reddy++ and Farahmand datasets to train supervised models (all applicable sklearn models plus neural classifiers using Keras) to get a new baseline.
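The sklearn side of this baseline could follow the 5-fold CV protocol used above, scoring by Spearman correlation (the usual metric on these datasets). A sketch on synthetic stand-in data, since the real embedding features and compositionality scores are in the doc; the Keras classifiers would slot into the same loop.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor

def cv_spearman(model, X, y, n_splits=5, seed=0):
    """Mean Spearman rho over 5-fold CV -- a sketch of the baseline protocol."""
    rhos = []
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        model.fit(X[train], y[train])
        rho, _ = spearmanr(y[test], model.predict(X[test]))
        rhos.append(rho)
    return float(np.mean(rhos))

# synthetic stand-in for embedding features and compositionality scores
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=100)

for name, model in [("LR", LinearRegression()), ("SVR", SVR()),
                    ("KNN", KNeighborsRegressor())]:
    print(name, round(cv_spearman(model, X, y), 3))
```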