Closed avinashsai closed 6 years ago
Hello @avinashsai, which gensim / smart_open version do you use?
I am using Google Colab notebooks with the latest gensim version and Python 3.
My code:
import gensim.downloader as api
info = api.info()
model = api.load("glove-twitter-25")
NameError: name 'unicode' is not defined
Not enough information: I don't see any unicode call in your code, and the same goes for the loader code. Please provide at least the full stack trace.
NameError Traceback (most recent call last)
<ipython-input-54-cdd45c81647a> in <module>()
----> 1 model = api.load("glove-twitter-25")
/usr/local/lib/python3.6/dist-packages/gensim/downloader.py in load(name, return_path)
416 sys.path.insert(0, base_dir)
417 module = __import__(name)
--> 418 return module.load_data()
419
420
/content/gensim-data/glove-twitter-25/__init__.py in load_data()
6 def load_data():
7 path = os.path.join(base_dir, 'glove-twitter-25', 'glove-twitter-25.gz')
----> 8 model = KeyedVectors.load_word2vec_format(path)
9 return model
/usr/local/lib/python3.6/dist-packages/gensim/models/keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
1117 return _load_word2vec_format(
1118 Word2VecKeyedVectors, fname, fvocab=fvocab, binary=binary, encoding=encoding, unicode_errors=unicode_errors,
-> 1119 limit=limit, datatype=datatype)
1120
1121 def get_keras_embedding(self, train_embeddings=False):
/usr/local/lib/python3.6/dist-packages/gensim/models/utils_any2vec.py in _load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
172 logger.info("loading projection weights from %s", fname)
173 with utils.smart_open(fname) as fin:
--> 174 header = utils.to_unicode(fin.readline(), encoding=encoding)
175 vocab_size, vector_size = (int(x) for x in header.split()) # throws for invalid file format
176 if limit:
<ipython-input-49-595fa41a7f04> in any2unicode(text, encoding, errors)
2 if isinstance(text, str):
3 return text
----> 4 return unicode(text.replace('\xc2\x85', '<newline>'), encoding, errors=errors)
NameError: name 'unicode' is not defined
This looks pretty strange, because I see different code in the codebase: https://github.com/RaRe-Technologies/gensim/blob/c1e6c65d75c134e71a24fbf9fdecf448972d5316/gensim/utils.py#L339
I also re-checked just now, and this works as expected for 2.7, 3.5 and 3.6.
Try to re-install gensim.
I'm closing this issue because it isn't reproducible.
Maybe it has something to do with the Google Colab environment? Does unicode exist there? I mean without gensim, just directly from the shell/notebook.
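For reference, a quick way to check this in a notebook cell without involving gensim is sketched below. It only uses the standard library; the helper name has_builtin is made up for this example.

```python
import builtins
import sys


def has_builtin(name):
    """Return True if `name` is one of the interpreter's built-in names."""
    return hasattr(builtins, name)


# On Python 3 the built-in text type is `str`; `unicode` was a Python 2 built-in.
print("Python", sys.version_info.major)
print("unicode available:", has_builtin("unicode"))
print("str available:", has_builtin("str"))
```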
@piskvorky unicode doesn't exist in py3.6; the strangest thing here is the different code in the stack trace.
Right. Looks like any2unicode was redefined somewhere by the user.
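One way to confirm a redefinition like this is to look at where the function object says it was defined. The sketch below is only an illustration: it uses json.dumps from the standard library as a stand-in, and origin, looks_patched and fake_dumps are hypothetical names. Assuming to_unicode and any2unicode are defined in gensim.utils itself (as the linked source suggests), the same check would be looks_patched(gensim.utils, "to_unicode").

```python
import json


def origin(func):
    """Return (module name, source file) recorded on a plain Python function."""
    return func.__module__, func.__code__.co_filename


def looks_patched(module, attr):
    """Heuristic: True if module.attr reports a different defining module.

    Functions that a package legitimately re-exports from a submodule
    would also be flagged, so treat a True result as a hint only.
    """
    func = getattr(module, attr)
    return getattr(func, "__module__", None) != module.__name__


def fake_dumps(obj, **kwargs):
    """Stand-in for a user override pasted into a notebook cell."""
    return "patched"


original = json.dumps
print(looks_patched(json, "dumps"), origin(json.dumps))

json.dumps = fake_dumps  # simulate the override
print(looks_patched(json, "dumps"), origin(json.dumps))

json.dumps = original  # restore the real function
```

A function defined in a notebook cell reports a source file such as <ipython-input-...>, which matches the last frame of the stack trace above.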
When GloVe vectors are downloaded and loaded into a model, it shows 'unicode' is not defined
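If a custom any2unicode like the one in the last frame of the stack trace is still needed, it has to be written for Python 3, where unicode is gone and str is the text type. A minimal sketch, assuming the override only needs to decode bytes and replace the byte sequence b'\xc2\x85' as in the original; the default argument values are assumptions:

```python
def any2unicode(text, encoding="utf8", errors="strict"):
    """Python 3 variant of the notebook override from the stack trace.

    Text (str) is returned unchanged; bytes get the b'\\xc2\\x85' sequence
    replaced by b'<newline>' and are then decoded with `encoding`.
    """
    if isinstance(text, str):
        return text
    # On Python 3 the file yields bytes, so the replacement uses bytes literals.
    cleaned = text.replace(b"\xc2\x85", b"<newline>")
    return str(cleaned, encoding, errors=errors)


print(any2unicode(b"400000 25\n"))
print(any2unicode(b"first\xc2\x85second"))
```

Alternatively, dropping the override and restarting the runtime leaves gensim's own implementation in place.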