TensorFlow doesn't allow assigning arrays larger than 2 GB as a `Variable`. There are two ways you can work around this:
Typically, the model uses only a fraction of the words in the GloVe/word2vec embeddings. You can extract the embeddings for the words that appear in your corpus beforehand, and then reference only these word embeddings in the `.config` file. For the format of the `.txt` file containing the embeddings, see: https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view
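Here is a minimal sketch of the first workaround in plain Python. It assumes a whitespace-tokenized corpus and an embedding file in GloVe's text layout, where each line holds a token followed by `dim` space-separated numbers. The helper names are hypothetical and not part of this project. The filtered output keeps the same layout, so check it against the format description linked above before using it.

```python
from typing import Iterable, Iterator, Set


def build_vocab(sentences: Iterable[str]) -> Set[str]:
    """Collect every whitespace-separated token that occurs in the corpus."""
    vocab: Set[str] = set()
    for sentence in sentences:
        vocab.update(sentence.split())
    return vocab


def filter_embedding_lines(lines: Iterable[str], vocab: Set[str], dim: int = 300) -> Iterator[str]:
    """Yield only the embedding lines whose token is in `vocab`.

    Each line is assumed to be `token v1 ... v_dim`. The last `dim` fields
    are the vector, so the token is rebuilt from whatever precedes them.
    This keeps tokens that themselves contain spaces intact.
    """
    for line in lines:
        fields = line.rstrip().split(" ")
        if len(fields) < dim + 1:
            continue  # skip headers (e.g. a word2vec "count dim" line) or malformed lines
        token = " ".join(fields[:-dim])
        if token in vocab:
            yield " ".join(fields) + "\n"


def filter_embedding_file(src_path: str, dst_path: str, vocab: Set[str], dim: int = 300) -> int:
    """Write the filtered embeddings to `dst_path` and return how many were kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for out_line in filter_embedding_lines(src, vocab, dim):
            dst.write(out_line)
            kept += 1
    return kept
```

The file is streamed line by line, so the full embedding file never has to fit in memory. Only the matching vectors are written out.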
Alternatively, you can keep using the entire word embedding file by slightly changing how the embeddings are assigned in the code. A very good answer on how to do this is here: https://stackoverflow.com/questions/35394103/initializing-tensorflow-variable-with-an-array-larger-than-2gb
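Here is a minimal sketch of the second workaround, following the placeholder-and-assign pattern from the linked Stack Overflow answer. It assumes the TensorFlow `tf.compat.v1` graph API, in the TF 1.x style of code from this era. `load_embeddings_into_variable` is a hypothetical helper, not the project's code. The matrix is passed through `feed_dict` at run time, so it is never serialized into the graph definition as a constant. The full matrix must still fit in host memory.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # placeholders and sessions require graph mode under TF 2.x


def load_embeddings_into_variable(sess, embedding_matrix, trainable=False):
    """Create an embedding variable and fill it by feeding a placeholder."""
    vocab_size, dim = embedding_matrix.shape
    # For large shapes tf.zeros builds a Fill op, not a large constant in the graph.
    word_vecs = tf1.Variable(tf.zeros([vocab_size, dim], dtype=tf.float32),
                             trainable=trainable, name="word_embedding")
    # The real values arrive only at run time through this placeholder.
    placeholder = tf1.placeholder(tf.float32, shape=[vocab_size, dim])
    assign_op = word_vecs.assign(placeholder)
    sess.run(tf1.global_variables_initializer())
    sess.run(assign_op, feed_dict={placeholder: embedding_matrix})
    return word_vecs
```

Set `trainable=True` if the word vectors should be fine-tuned rather than kept fixed.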
Thank you!
The paper mentions that GloVe embeddings were used for the word representation layer. However, when I tried to train with glove.840B.300d.txt, I got the following error:
`Cannot create a tensor proto whose content is larger than 2GB.`
What preprocessing was applied to obtain wordvec?
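This quick size check shows why glove.840B.300d hits the 2 GB limit. It assumes float32 vectors and the roughly 2.2M-token vocabulary listed for this file on the GloVe project page. The 100,000-word corpus vocabulary is a hypothetical figure, included only for comparison.

```python
BYTES_PER_FLOAT32 = 4
TWO_GB = 2 ** 31  # the 2 GB threshold named in the error message


def embedding_bytes(vocab_size: int, dim: int) -> int:
    """Size in bytes of a dense float32 embedding matrix."""
    return vocab_size * dim * BYTES_PER_FLOAT32


full = embedding_bytes(2_200_000, 300)  # approx. glove.840B.300d vocabulary
subset = embedding_bytes(100_000, 300)  # hypothetical corpus vocabulary
print(full, full > TWO_GB)      # 2640000000 True
print(subset, subset > TWO_GB)  # 120000000 False
```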