Closed hyeseonko closed 5 years ago
Here's what each of those bash variables means:

${WORD_EMBEDDINGS_PATH}
-> Word2Vec embeddings trained from scratch on only the training data of the specific dataset for which we want to transfer style.

${VALIDATION_WORD_EMBEDDINGS_PATH}
-> Pre-trained GloVe embeddings used purely for evaluation purposes (to compute cosine similarity as done by Fu et al. (AAAI, 2018)); they have nothing to do with the training.

To answer your questions:
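For context on that evaluation metric: content-preservation scores of this kind are cosine similarities between sentence embeddings. A minimal pure-Python sketch (the mean-of-word-vectors `sentence_vector` is an assumption about the metric for illustration, not code from this repo):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def sentence_vector(words, embeddings):
    """Average the vectors of the words found in the embedding table.

    Returns None if no word is covered by the embeddings.
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Comparing `sentence_vector(source_sentence, glove)` against `sentence_vector(transferred_sentence, glove)` with `cosine_similarity` gives a score in [-1, 1]; higher means more content was preserved.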
Can I use glove.6B.100d.txt for pre-training word embedding step?
No, not directly. The model expects word2vec embeddings. If you have a way to convert the GloVe embeddings into corresponding word2vec embeddings, then you can use the converted embeddings.
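For the plain-text formats, the conversion is small: a word2vec text file is the same `word v1 v2 ...` layout as a GloVe text file, plus a `<vocab_size> <dimensions>` header line (this is also what gensim's `glove2word2vec` script does). A minimal sketch, assuming `glove.6B.100d.txt`-style input:

```python
def glove_to_word2vec(glove_path, word2vec_path):
    """Convert a GloVe text file to word2vec text format.

    The body lines are identical in both formats; word2vec just
    expects a '<vocab_size> <dimensions>' header line on top.
    """
    with open(glove_path, encoding="utf-8") as f:
        lines = f.readlines()
    if not lines:
        raise ValueError("empty GloVe file: %s" % glove_path)
    # Dimensionality = tokens per line minus the leading word itself.
    dim = len(lines[0].split()) - 1
    with open(word2vec_path, "w", encoding="utf-8") as f:
        f.write("%d %d\n" % (len(lines), dim))
        f.writelines(lines)
```

Whether the converted file plugs into this codebase depends on how it loads embeddings (text vs. binary word2vec), so treat this as a starting point rather than a drop-in step.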
If not, which embedding files should I use in this case?
If you want to replicate the paper exactly, follow the steps listed here.
Hi,
I wanted to pre-train word embedding models like these as you wrote in README.
```sh
./scripts/run_word_vector_training.sh \
    --text-file-path ${TRAINING_TEXT_FILE_PATH} \
    --model-file-path ${WORD_EMBEDDINGS_PATH}
```
But why do you differentiate ${WORD_EMBEDDINGS_PATH} and ${VALIDATION_WORD_EMBEDDINGS_PATH}?
Then, my questions are:
Thank you for releasing your code :)