piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1
15.57k stars, 4.37k forks

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'train' #2067

Closed. shivank01 closed this issue 6 years ago.

shivank01 commented 6 years ago

I want to do online training (i.e. resume training) on my previously trained model, but I get the error AttributeError: 'Word2VecKeyedVectors' object has no attribute 'train'.

My code is:

import gensim.models.keyedvectors as word2vec
model = word2vec.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
model.KeyedVectors.train("Hello")

menshikh-iv commented 6 years ago

To continue training, you should save the full model, not only the word vectors, i.e.

model.train(...)
model.save("path/to/model")

and later you can load it and continue training:

from gensim.models import Word2Vec

model = Word2Vec.load("path/to/model")
model.train(...)

The save_word2vec_format method saves only the word vectors (not the full model). They are meant for later use such as similarity search, not for further training.

nayash commented 6 years ago

I know this issue is closed, but is it possible to get a full word2vec model from a GloVe-trained vectors file, and if so, how? The documentation only shows how to get KeyedVectors after running the glove2word2vec script. I want to train the model further on a small dataset I have, and apparently it's not possible to train KeyedVectors.

menshikh-iv commented 6 years ago

@DarkKnight1991

get full word2vec model from glove trained vectors file

That's impossible, because a GloVe file contains only the trained vectors (one matrix), while training needs two matrices. See also https://radimrehurek.com/gensim/models/keyedvectors.html#why-use-keyedvectors-instead-of-a-full-model
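To see why two matrices are needed, here is a toy pure-Python sketch of a skip-gram negative-sampling update. It is a simplified illustration, not gensim's actual implementation; the vocabulary, dimension and learning rate are made up. Each step reads and writes both the input word vectors `W_in` and the output weights `W_out`, and a vectors-only file corresponds to keeping just `W_in`. (In gensim 4.x, the second matrix of a Word2Vec model trained with negative sampling is, to my understanding, the model's `syn1neg` array.)

```python
import math
import random

random.seed(0)
DIM = 4
VOCAB = ["king", "queen", "man", "woman"]


def small_random_vector():
    # Small random initialisation for the input vectors.
    return [random.uniform(-0.5, 0.5) / DIM for _ in range(DIM)]


# Matrix 1: input ("word") vectors, the only thing a vectors-only file keeps.
W_in = {w: small_random_vector() for w in VOCAB}
# Matrix 2: output ("context") weights, zero-initialised and needed for training.
W_out = {w: [0.0] * DIM for w in VOCAB}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(center, context, negatives, lr=0.1):
    """One skip-gram negative-sampling update that reads and writes BOTH matrices."""
    v = W_in[center]
    grad_v = [0.0] * DIM
    # The true context word gets label 1, sampled negatives get label 0.
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]
        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
        for i in range(DIM):
            grad_v[i] += g * u[i]  # accumulate the update for the input vector
            u[i] += g * v[i]       # update the output weights in place
    for i in range(DIM):
        v[i] += grad_v[i]          # apply the accumulated input-vector update


king_before = list(W_in["king"])
for _ in range(3):
    sgns_step("king", "queen", negatives=["man"])

# "Exporting word vectors" keeps W_in only; W_out, which training also changed, is lost.
exported = {w: list(vec) for w, vec in W_in.items()}
print(sorted(exported) == sorted(VOCAB))      # True
print(any(x != 0.0 for x in W_out["queen"]))  # True
print(W_in["king"] != king_before)            # True
```

Resuming training from `exported` alone would mean reinitialising `W_out` from scratch, which is why a vectors-only file cannot simply be turned back into a trainable model.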

nayash commented 6 years ago

I agree that a full model would be resource-consuming. I am working on a problem where I have 50-80 documents, each with ~300 sentences, and I have to answer questions about them using NLP techniques. Training only on the given data does not give good results, so I thought that training on bigger sources first and then updating (training) the model with my data might help. I have tried word vectors, sentence vectors (by averaging word vectors) and doc2vec. Preprocessing includes stemming and lemmatizing with NLTK. I'm not sure this is the right place to ask, but do you have any suggestions on what I am missing?
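For reference, the "sentence vector by averaging word vectors" baseline mentioned above can be sketched as follows. The 2-d vectors are made up for illustration; with gensim, a `KeyedVectors` object can usually be passed as `word_vectors`, since it supports `in` and `[]` lookups. Out-of-vocabulary tokens are skipped, which can silently drop many words if stemming produces tokens that are not in a pretrained vocabulary:

```python
import math


def average_vector(tokens, word_vectors, dim):
    """Mean of the vectors of in-vocabulary tokens; a zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical 2-d vectors, just to keep the arithmetic easy to follow.
toy_vectors = {"cat": [1.0, 0.0], "dog": [0.5, 0.5], "car": [0.0, 1.0]}
s1 = average_vector(["the", "cat", "dog"], toy_vectors, dim=2)  # "the" is skipped
s2 = average_vector(["a", "car"], toy_vectors, dim=2)
print(s1)  # [0.75, 0.25]
print(cosine(s1, s2))
```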

menshikh-iv commented 6 years ago

@DarkKnight1991

Read more about data augmentation in NLP: you have a very small training dataset, so it's a good idea to extend it. FYI, it's better to ask questions like this on the mailing list.
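As one concrete (and deliberately simple) form of augmentation, the sketch below creates perturbed copies of tokenized sentences by randomly swapping and deleting words, similar in spirit to the random swap and random deletion operations of EDA (Wei and Zou, 2019). The function names and parameters are hypothetical, and whether this helps depends heavily on the task. The enlarged corpus could then be passed to `build_vocab(..., update=True)` and `train(...)` as in the continued-training workflow above:

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    for _ in range(n_swaps):
        if len(out) < 2:
            break
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentences, n_copies=2, p_delete=0.1, seed=0):
    """Return each original sentence followed by n_copies perturbed variants."""
    rng = random.Random(seed)
    augmented = []
    for sentence in sentences:
        augmented.append(list(sentence))
        for _ in range(n_copies):
            variant = random_swap(sentence, 1, rng)
            augmented.append(random_deletion(variant, p_delete, rng))
    return augmented


corpus = [["the", "model", "answers", "questions"], ["small", "datasets", "need", "care"]]
bigger = augment(corpus, n_copies=2)
print(len(bigger))  # 6
```

A fixed seed keeps the augmented corpus reproducible between runs.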