crowoy opened 6 years ago
Looks like it's probably not reading from my forked version of gensim. Can you test whether you can call infer_test manually in an interactive Python session?
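If the interpreter may not be importing the forked gensim, one way to check on Python 3.8+ is to ask the import system where it would load `gensim` from and compare that with the installed distribution's version. This is a minimal sketch using only the standard library (`importlib.util.find_spec` and `importlib.metadata`). The helper name `locate_package` is illustrative, not part of gensim:

```python
import importlib.util
from importlib import metadata
from typing import Optional, Tuple


def locate_package(name: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (file location, installed distribution version) for a top-level package.

    find_spec() resolves a top-level name the same way `import name` would,
    without executing the package, so the location shows which copy the
    interpreter would actually import.
    """
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # No installed distribution metadata under this name.
        version = None
    return location, version


if __name__ == "__main__":
    where, version = locate_package("gensim")
    print("gensim would be imported from:", where)
    print("installed gensim distribution version:", version)
```

Run this with the same interpreter you use for the failing session. A location outside the virtualenv, for example a system-wide install, would suggest a different copy of gensim is being picked up.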
Thank you for the prompt reply!
What do you mean by calling infer_test with an interactive Python session?
I've tried doing the following:
$ python
>>> import gensim.models as g
>>> model = g.Doc2Vec.load("toy_data/word2vec.bin")
>>> model.infer_vector("this is a test".split(), alpha=0.01, steps=1000)
This produces the same error.
Could it be related to Gensim's dependencies?
(env) $ pip list
Package         Version
--------------- ---------
boto            2.48.0
boto3           1.7.4
botocore        1.10.4
bz2file         0.98
certifi         2018.1.18
chardet         3.0.4
docutils        0.14
futures         3.2.0
gensim          0.12.4
idna            2.6
jmespath        0.9.3
numpy           1.14.2
pip             10.0.0
python-dateutil 2.6.1
requests        2.18.4
s3transfer      0.1.13
scipy           1.0.1
setuptools      39.0.1
six             1.11.0
smart-open      1.5.7
urllib3         1.22
wheel           0.31.0
Weird, I just tried a fresh install (with virtualenv) and had no problems. Your gensim version seems right too (0.12.4), so I am not sure why this is happening.
Can you try creating a new virtualenv, installing gensim as you did before, and trying again?
So I deleted the env and ran the following:
$ virtualenv env
$ source env/bin/activate
(env) $ pip install git+https://github.com/jhlau/gensim
(env) $ python
>>> import gensim.models as g
>>> model = g.Doc2Vec.load("toy_data/word2vec.bin")
>>> model.infer_vector("this is a test".split(), alpha=0.01, steps=1000)
And it still outputs:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Word2Vec' object has no attribute 'infer_vector'
Does the train script (train_model.py) work?
Looks to be working fine:
(env) $ python train_model.py
2018-04-16 11:22:21,169 : INFO : collecting all words and their counts
2018-04-16 11:22:21,169 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2018-04-16 11:22:21,223 : INFO : collected 11097 word types and 1000 unique tags from a corpus of 1000 examples and 84408 words
2018-04-16 11:22:21,272 : INFO : min_count=1 retains 11097 unique words (drops 0)
2018-04-16 11:22:21,272 : INFO : min_count leaves 84408 word corpus (100% of original 84408)
2018-04-16 11:22:21,325 : INFO : deleting the raw counts dictionary of 11097 items
2018-04-16 11:22:21,325 : INFO : sample=1e-05 downsamples 3599 most-common words
2018-04-16 11:22:21,325 : INFO : downsampling leaves estimated 22704 word corpus (26.9% of prior 84408)
2018-04-16 11:22:21,326 : INFO : estimated required memory for 11097 words and 300 dimensions: 33381300 bytes
2018-04-16 11:22:21,377 : INFO : resetting layer weights
2018-04-16 11:22:21,377 : INFO : loading pre-trained embeddings
2018-04-16 11:22:21,819 : INFO : 1000 lines processed (0.441607952118s); 969 embeddings collected
2018-04-16 11:22:22,129 : INFO : training model with 1 workers on 11129 vocabulary and 300 features, using sg=1 hs=0 sample=1e-05 negative=5
2018-04-16 11:22:22,129 : INFO : expecting 1000 sentences, matching count from corpus used for vocabulary survey
2018-04-16 11:22:23,205 : INFO : PROGRESS: at 1.29% examples, 28676 words/s, in_qsize 2, out_qsize 0
2018-04-16 11:22:24,212 : INFO : PROGRESS: at 2.48% examples, 28295 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:25,259 : INFO : PROGRESS: at 3.76% examples, 28426 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:26,349 : INFO : PROGRESS: at 5.04% examples, 28352 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:27,436 : INFO : PROGRESS: at 6.36% examples, 28413 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:28,470 : INFO : PROGRESS: at 7.54% examples, 28139 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:29,493 : INFO : PROGRESS: at 8.71% examples, 27971 words/s, in_qsize 1, out_qsize 0
Yeah, this really beats me. train_model.py loads pre-trained embeddings, which won't work with the canonical gensim, so your gensim version looks right, but somehow it doesn't see infer_vector...
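The tracebacks in this thread show that `Doc2Vec.load(...)` handed back a `Word2Vec` instance, which suggests the loaded file itself holds a Word2Vec model rather than a Doc2Vec one. A small fail-fast guard can turn the bare `AttributeError` into a message that names the class that was actually loaded. `require_inference` is a hypothetical helper for illustration, not part of gensim:

```python
from typing import Any


def require_inference(model: Any, path: str) -> Any:
    """Return the model unchanged if it has a callable infer_vector(), else raise.

    Hypothetical helper: it only inspects the loaded object and does not
    depend on gensim itself.
    """
    if not callable(getattr(model, "infer_vector", None)):
        raise TypeError(
            "%s loaded an object of class %s, which has no infer_vector(); "
            "the file may hold a different model type (e.g. Word2Vec) "
            "instead of a Doc2Vec model" % (path, type(model).__name__)
        )
    return model


# Hypothetical usage with gensim (not executed here):
#   import gensim.models as g
#   path = "toy_data/word2vec.bin"
#   model = require_inference(g.Doc2Vec.load(path), path)
```

Checking for the method rather than the class name keeps the guard working even if the model class is subclassed or renamed.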
Any fix for this yet? We are facing a similar issue.
I am facing the same issue with doc2vec. My code runs fine on Google Colab, but I get an error when running it locally on my own system.
AttributeError: 'NumpyArrayWrapper' object has no attribute 'infer_vector'
Did anyone solve this? I get the following error:
    model.infer_vector(["my", "input"]))
    ^^^^^^^^^^^^^^^^^^
AttributeError: 'Word2Vec' object has no attribute 'infer_vector'
pip freeze output:
gensim @ file:///home/saat/Projects/gensim
numpy==1.24.3
scipy==1.10.1
six==1.16.0
smart-open==6.3.0
If I train and save my own model with train_model.py, it can use infer_vector.
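Since a model trained and saved by train_model.py works while the downloaded file does not, it may help to check which class each file was saved as. gensim's `save()` writes the model object with Python's `pickle` (large arrays may go to separate side files), so the standard-library `pickletools` module can read the stored class name without importing gensim. This is a sketch under those assumptions. It also assumes an uncompressed main file pickled with protocol 2 or later, and `saved_class` is a hypothetical helper:

```python
import pickletools
from typing import List, Optional, Tuple


def saved_class(path: str) -> Optional[Tuple[str, str]]:
    """Return (module, class) of the first global referenced in a pickle file.

    For an object pickled with protocol 2 or later, the first global is
    normally the class of the top-level object, so this shows whether a
    file holds, say, a Doc2Vec or a Word2Vec model without importing gensim.
    """
    recent_strings: List[str] = []
    with open(path, "rb") as f:
        # genops() disassembles the pickle one opcode at a time, up to STOP.
        for opcode, arg, _pos in pickletools.genops(f):
            if opcode.name in ("GLOBAL", "INST"):
                # Protocols 0-3 store "module name" as one space-separated argument.
                module, name = arg.split(" ", 1)
                return module, name
            if opcode.name == "STACK_GLOBAL":
                # Protocol 4+ pushes the module and class names as strings first.
                if len(recent_strings) >= 2:
                    return recent_strings[-2], recent_strings[-1]
                return None
            if isinstance(arg, str):
                recent_strings.append(arg)
    return None
```

For example, compare `saved_class("toy_data/word2vec.bin")` with the result for the file you saved from train_model.py. A `Word2Vec` class for the downloaded file would be consistent with the `AttributeError` reported above.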
I've cloned the repo, downloaded the pre-trained Wikipedia model, and installed Gensim via:
pip install git+https://github.com/jhlau/gensim
Then I copied the downloaded model files into the toy_data directory and changed the model line in the file to:
model="toy_data/word2vec.bin"
However, when I run infer_test.py, I get the following error: