Open yassmine-lam opened 3 years ago
With gensim you can load pretrained GloVe weights of different sizes:
Here is the GloVe Official Page
Download the file from the website above, then set glove_file to the path of the file you downloaded.
This is how you would implement it within sent2vec:
from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.test.utils import get_tmpfile
from sent2vec.splitter import Splitter
from sent2vec.vectorizer import Vectorizer

sentences = [
    "Alice is in the Wonderland.",
    "Alice is not in the Wonderland.",
]

# Path to the downloaded GloVe file.
glove_file = 'glove.6B.300d.txt'
# Convert GloVe to the word2vec text format, which adds the header line gensim expects.
word2vec_glove_file = get_tmpfile("glove.6B.300d.word2vec.txt")
glove2word2vec(glove_file, word2vec_glove_file)

# Remove 'not' from the default stop-word list so it is kept in the sentences.
splitter = Splitter()
splitter.sent2words(sentences=sentences, remove_stop_words=['not'], add_stop_words=[])

# Embed the words using the converted vectors.
vectorizer = Vectorizer()
vectorizer.word2vec(splitter.words, pretrained_vectors_path=word2vec_glove_file)
vectors = vectorizer.vectors
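For context on the ValueError reported in this issue: the word2vec text format starts with a header line holding the vocabulary size and the vector dimension, while a GloVe file starts directly with the first word ('the'), so the loader ends up trying to parse 'the' as an integer. The sketch below illustrates the kind of conversion glove2word2vec performs, using only the standard library. The function name add_word2vec_header and the tiny sample file are made up for illustration; this is not gensim's actual implementation.

```python
import os
import tempfile


def add_word2vec_header(glove_path, output_path):
    """Copy a GloVe text file to output_path with a word2vec header prepended.

    Hypothetical helper for illustration only; in practice use gensim's
    glove2word2vec as in the snippet above.
    """
    num_vectors = 0
    dimensions = 0
    # First pass: count the vectors and read the dimension from the first line.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            parts = line.rstrip().split(" ")
            if num_vectors == 0:
                dimensions = len(parts) - 1  # first token is the word itself
            num_vectors += 1
    # Second pass: write the "<count> <dimension>" header, then the vectors.
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{num_vectors} {dimensions}\n")
        with open(glove_path, encoding="utf-8") as src:
            for line in src:
                dst.write(line)
    return num_vectors, dimensions


# Tiny made-up GloVe-style file with two 3-dimensional vectors.
tmp_dir = tempfile.mkdtemp()
glove_path = os.path.join(tmp_dir, "tiny_glove.txt")
with open(glove_path, "w", encoding="utf-8") as f:
    f.write("the 0.1 0.2 0.3\n")
    f.write("alice 0.4 0.5 0.6\n")

converted_path = os.path.join(tmp_dir, "tiny_glove.word2vec.txt")
count, dim = add_word2vec_header(glove_path, converted_path)

with open(converted_path, encoding="utf-8") as f:
    first_line = f.readline().strip()
print(first_line)  # 2 3
```

As far as I know, gensim 4.0 and later can also read a header-less file directly with KeyedVectors.load_word2vec_format(path, binary=False, no_header=True). Whether sent2vec's Vectorizer exposes that option is not confirmed in this thread, so converting the file first is the safer route.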
I hope it helps.
Hi,
I tried to use the word2vec code with the GloVe embeddings in glove.6B.300d.txt, but I got this error:
ValueError: invalid literal for int() with base 10: 'the'
Could someone help, please?
Thank you.