Open aniket-sen opened 6 years ago
See the README. The pre-trained word vectors are learned from the New York Times Annotated Corpus (LDC Data LDC2008T19), which should be obtained from LDC (https://catalog.ldc.upenn.edu/LDC2008T19). We also provide the word embedding file 'vec.bin' used in the experiments in data.zip.
You didn't answer my last question.
You can use gensim to train word vectors on your own dataset.
Would you like to share how the word embedding file was created, i.e. what procedure was used? Also, if I want this algorithm to work on my own dataset, how should I create a word embedding file for it?