Closed shivank01 closed 6 years ago
To continue training, you should save the full model, not only the word vectors, i.e.
model.train(...)
model.save("path/to/model")
and afterwards you can load it and continue training:
model = Word2Vec.load("path/to/model")
model.train(...)
The save_word2vec_format method saves only the word vectors (not the full model) for later use, such as similarity search, but not for training.
I know this is closed, but I need to know whether it is possible (and how) to get a full word2vec model from a GloVe-trained vectors file. The documentation only shows how to get KeyedVectors after running the "glove2word2vec" script. I actually want to train the model on a small dataset that I have, and apparently it's not possible to train keyed vectors.
@DarkKnight1991
get full word2vec model from glove trained vectors file
That's impossible because the GloVe file contains only the trained vectors (one matrix, whereas training needs two matrices). See also https://radimrehurek.com/gensim/models/keyedvectors.html#why-use-keyedvectors-instead-of-a-full-model
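The "two matrices" point can be illustrated with a simplified, pure-Python sketch of one skip-gram negative-sampling update. This is not gensim's implementation. The vocabulary, dimensions, learning rate, and the `sgns_step` helper are hypothetical and chosen only to show that the update reads and writes both an input and an output matrix.

```python
import math
import random

random.seed(0)
vocab = ["king", "queen", "apple"]  # toy vocabulary, for illustration only
dim = 4

# Input (word) vectors: the single matrix that a vectors-only file stores.
w_in = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
# Output (context) vectors: required by the update rule, absent from such
# files. Initialised to zero here.
w_out = {w: [0.0] * dim for w in vocab}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(center, context, negatives, lr=0.1):
    """Apply one simplified skip-gram negative-sampling update in place."""
    v = w_in[center]
    grad_v = [0.0] * dim
    # The true context word gets label 1, sampled negatives get label 0.
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = w_out[word]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        g = lr * (label - score)
        for i in range(dim):
            grad_v[i] += g * u[i]  # accumulate gradient for the input vector
            u[i] += g * v[i]       # update the output vector
    for i in range(dim):
        v[i] += grad_v[i]


queen_out_before = list(w_out["queen"])
sgns_step("king", "queen", ["apple"])
```

Every step touches `w_out` as well as `w_in`, so a file holding only the input vectors lacks the state the training loop needs.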
I agree that a full model would be resource-consuming. I am working on a problem where I have some 50-80 documents, each with ~300 sentences, and I have to be able to answer some questions using NLP concepts. Training only on the given data is not giving me good results, so I thought that training on bigger sources and then updating (training) the model with my data might help. I have tried word vectors, sentence vectors (by averaging word vectors) and doc2vec. Preprocessing includes stemming and lemmatizing using NLTK. I'm not sure if this is the right platform to ask, but do you have any suggestions on what I am missing?
@DarkKnight1991
Read more about data augmentation in NLP: you have a very small training dataset, so it's a good idea to extend it. FYI, it's better to ask questions on the mailing list.
I want to do online training (resuming training) on my previously trained model, but I get this error:
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'train'
My code is:
import gensim.models.keyedvectors as word2vec
model = word2vec.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
model.KeyedVectors.train("Hello")