piskvorky / gensim

Topic Modelling for Humans
GNU Lesser General Public License v2.1

Google Cloud Jupyter Notebook: #2566

Open athammad opened 5 years ago

athammad commented 5 years ago


On a Google Cloud Jupyter Notebook, I am trying to import a GloVe txt file using the following commands, but I keep getting the same error.

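from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.test.utils import get_tmpfile
tmp_file = get_tmpfile("test_word2vec.txt")
# Convert the GloVe file to word2vec text format (this prepends the header line).
glove2word2vec("glove.6B.300d.txt", tmp_file)
# This is the call that raises the ValueError shown in the traceback below.
externalModel = KeyedVectors.load_word2vec_format(tmp_file)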

The error is:

ValueError                                Traceback (most recent call last)
<ipython-input-8-21d446f7b71f> in <module>
----> 1 externalModel=KeyedVectors.load_word2vec_format(tmp_file)

~/.local/lib/python3.5/site-packages/gensim/models/keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
   1496         return _load_word2vec_format(
   1497             cls, fname, fvocab=fvocab, binary=binary, encoding=encoding, unicode_errors=unicode_errors,
-> 1498             limit=limit, datatype=datatype)
   1500     def get_keras_embedding(self, train_embeddings=False):

~/.local/lib/python3.5/site-packages/gensim/models/utils_any2vec.py in _load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
    392                 parts = utils.to_unicode(line.rstrip(), encoding=encoding, errors=unicode_errors).split(" ")
    393                 if len(parts) != vector_size + 1:
--> 394                     raise ValueError("invalid vector on line %s (is this really the text format?)" % line_no)
    395                 word, weights = parts[0], [datatype(x) for x in parts[1:]]
    396                 add_word(word, weights)

ValueError: invalid vector on line 59941 (is this really the text format?)

I have also tried upgrading everything, following the answers to similar issues, with the following commands:

pip3 install google-compute-engine
pip3 install --upgrade gensim smart_open

My current gensim version is:

Name: gensim
Version: 3.8.0
Summary: Python framework for fast Vector Space Modelling
Home-page: http://radimrehurek.com/gensim
Author: Radim Rehurek
Author-email: me@radimrehurek.com
License: LGPLv2.1
Location: /home/jupyter/.local/lib/python3.5/site-packages
Requires: scipy, numpy, smart-open, six
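
According to the traceback above, the ValueError is raised when a line does not split into `vector_size + 1` space-separated fields. One way to narrow this down is to scan the text file for lines whose field count differs from the rest. The following is only a minimal sketch: the helper name `find_malformed_lines` is hypothetical (it is not part of gensim), and it assumes the field count of the first line is the correct one, which holds for a raw GloVe file without a header but not for the converted file, whose first line is the header.

```python
import os
import tempfile


def find_malformed_lines(path, encoding="utf8", max_report=10):
    """Hypothetical helper: report lines whose field count differs from line 1.

    Returns (expected_field_count, [(line_no, field_count, first_token), ...]).
    Line numbers are 1-based positions in the file passed in.
    """
    expected = None
    bad = []
    # Read bytes and split on single spaces, mirroring the check in the traceback.
    with open(path, "rb") as handle:
        for line_no, raw in enumerate(handle, start=1):
            parts = raw.rstrip().decode(encoding, errors="replace").split(" ")
            if expected is None:
                # Assumption: the first line is well formed.
                expected = len(parts)
            elif len(parts) != expected:
                bad.append((line_no, len(parts), parts[0]))
                if len(bad) >= max_report:
                    break
    return expected, bad


# Tiny self-contained demo on a made-up file; the second line is one value short.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf8") as demo:
    demo.write("the 0.1 0.2 0.3\ncat 0.4 0.5\ndog 0.7 0.8 0.9\n")
    demo_path = demo.name

print(find_malformed_lines(demo_path))
os.remove(demo_path)
```

To apply it to the file from this report, one would call `find_malformed_lines("glove.6B.300d.txt")`. The line numbers it reports are positions in that file and may be offset slightly from the number in gensim's message, since the converted file has an extra header line. A truncated download or tokens containing unusual whitespace are possible causes worth checking, but neither is confirmed by the output above.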


mpenkov commented 5 years ago

@athammad Some questions: