Closed hyeseonko closed 5 years ago
Here's what each of those bash variables means:

${WORD_EMBEDDINGS_PATH}
-> Word2Vec embeddings trained from scratch on only the training data of the specific dataset for which we want to transfer style.

${VALIDATION_WORD_EMBEDDINGS_PATH}
-> Pre-trained GloVe embeddings used purely for evaluation purposes (to compute cosine similarity as done by Fu et al. (AAAI, 2018)); they have nothing to do with the training.

To answer your questions:
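For context on that evaluation metric: content-preservation scores of this kind are cosine similarities between sentence embeddings. A minimal pure-Python sketch (the mean-of-word-vectors `sentence_vector` is an assumption about the metric for illustration, not code from this repo):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def sentence_vector(words, embeddings):
    """Average the vectors of the words found in the embedding table.

    Returns None if no word is covered by the embeddings.
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Comparing `sentence_vector(source_sentence, glove)` against `sentence_vector(transferred_sentence, glove)` with `cosine_similarity` gives a score in [-1, 1]; higher means more content was preserved.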
Can I use glove.6B.100d.txt for pre-training word embedding step?
No, not directly. The model expects word2vec embeddings. If you have a way to convert the GloVe embeddings into corresponding word2vec embeddings, then you can use the converted embeddings.
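For the plain-text formats, the conversion is small: a word2vec text file is the same `word v1 v2 ...` layout as a GloVe text file, plus a `<vocab_size> <dimensions>` header line (this is also what gensim's `glove2word2vec` script does). A minimal sketch, assuming `glove.6B.100d.txt`-style input:

```python
def glove_to_word2vec(glove_path, word2vec_path):
    """Convert a GloVe text file to word2vec text format.

    The body lines are identical in both formats; word2vec just
    expects a '<vocab_size> <dimensions>' header line on top.
    """
    with open(glove_path, encoding="utf-8") as f:
        lines = f.readlines()
    if not lines:
        raise ValueError("empty GloVe file: %s" % glove_path)
    # Dimensionality = tokens per line minus the leading word itself.
    dim = len(lines[0].split()) - 1
    with open(word2vec_path, "w", encoding="utf-8") as f:
        f.write("%d %d\n" % (len(lines), dim))
        f.writelines(lines)
```

Whether the converted file plugs into this codebase depends on how it loads embeddings (text vs. binary word2vec), so treat this as a starting point rather than a drop-in step.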
If not, which embedding files should I use in this case?
If you want to replicate the paper exactly, follow the steps listed here.
Hi,
I wanted to pre-train word embedding models like these as you wrote in README.
```sh
./scripts/run_word_vector_training.sh \
    --text-file-path ${TRAINING_TEXT_FILE_PATH} \
    --model-file-path ${WORD_EMBEDDINGS_PATH}
```
But why do you differentiate ${WORD_EMBEDDINGS_PATH} and ${VALIDATION_WORD_EMBEDDINGS_PATH}?
Then, my questions are:
Thank you for releasing your code :)