yoonkim / CNN_sentence

CNNs for sentence classification

question regarding datasets #31

Open pianoman4873 opened 7 years ago

pianoman4873 commented 7 years ago

Hello, this is not an issue but rather a question: where could I get all the datasets you reported in the paper? Do you think that training on all datasets together would improve the results? What about training on various languages: do you think a model trained on mixed-language text would behave better or worse than separate models for each language?

Another question, regarding phrases: Google's pretrained word2vec vectors also include phrases. Were they taken into account as well?

yoonkim commented 7 years ago

Hi, you can obtain all the datasets here:

https://github.com/harvardnlp/sent-conv-torch

Phrases from word2vec were not taken into account.
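
For readers wondering how phrase entries can end up unused, here is a minimal sketch. It assumes the vocabulary is built from whitespace-split tokens and that phrase entries in the pretrained vectors are joined with underscores (as in the Google News vectors). The function name `select_word_vectors` and the toy vectors are hypothetical and not taken from the repository.

```python
def select_word_vectors(pretrained, sentences):
    """Keep only pretrained vectors whose key is a single token seen in the data."""
    # Vocabulary of single tokens, obtained by splitting each sentence on whitespace.
    vocab = {token for sentence in sentences for token in sentence.split()}
    # A phrase key such as "New_York" never equals a single token, so it is skipped.
    return {word: vec for word, vec in pretrained.items() if word in vocab}


# Toy stand-in for pretrained vectors (assumption: real ones are loaded from disk).
pretrained = {
    "new": [0.1, 0.2],
    "york": [0.3, 0.4],
    "New_York": [0.5, 0.6],  # phrase entry, joined with an underscore
}
sentences = ["new york is big"]

selected = select_word_vectors(pretrained, sentences)
print(sorted(selected))  # ['new', 'york']
```

Under these assumptions, using phrase vectors would require an extra step that merges adjacent tokens into underscore-joined phrases before the lookup.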

pianoman4873 commented 7 years ago

thanks