jhlau / doc2vec

Python scripts for training/testing paragraph vectors
Apache License 2.0

'Word2Vec' object has no attribute 'infer_vector' #18

Open crowoy opened 6 years ago

crowoy commented 6 years ago

I've cloned the repo, downloaded the pre-trained Wikipedia model, and installed Gensim via pip install git+https://github.com/jhlau/gensim.

Then I pasted the downloaded model files into the toy_data directory and changed the model line in the file to: model="toy_data/word2vec.bin".

However, when I run infer_test.py I get the following error:

Traceback (most recent call last):
  File "infer_test.py", line 25, in <module>
    output.write( " ".join([str(x) for x in m.infer_vector(d, alpha=start_alpha, steps=infer_epoch)]) + "\n" )
AttributeError: 'Word2Vec' object has no attribute 'infer_vector'

jhlau commented 6 years ago

Looks like it's probably not reading from my forked version of gensim. Can you test if you can call infer_test manually with an interactive python session?
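One way to test whether Python is picking up the forked gensim is to look at where the imported package lives and what version it reports. The following is a minimal sketch using only the standard library; `describe_module` is a hypothetical helper name, not part of gensim or of this repo.

```python
import importlib


def describe_module(name):
    """Return the version and file location of an importable module, or None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None  # the module is not installed in this environment
    return {
        "name": module.__name__,
        # Not every package defines __version__, so fall back to None.
        "version": getattr(module, "__version__", None),
        # The file path shows which site-packages directory the import resolves to.
        "file": getattr(module, "__file__", None),
    }
```

Calling `describe_module("gensim")` from the same interpreter that runs infer_test.py should show a path inside the virtualenv if the fork installed there is the one being imported.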

crowoy commented 6 years ago

Thank you for the prompt reply!

What do you mean by calling infer_test with an interactive python session?

I've tried doing the following:

python
import gensim.models as g
model = g.Doc2Vec.load("toy_data/word2vec.bin")
model.infer_vector("this is a test".split(), alpha=0.01, steps=1000)

This produces the same error.
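The error message names the class of the object that `load` returned, so inspecting that object directly can help narrow things down. The sketch below uses a hypothetical helper, `describe_model`, and stand-in classes rather than gensim itself; it only reports the class name and whether an `infer_vector` method is present.

```python
def describe_model(model):
    """Summarise the class of a loaded model and whether it can infer vectors."""
    return {
        "class": type(model).__name__,
        "has_infer_vector": callable(getattr(model, "infer_vector", None)),
    }


# Stand-in classes for illustration only; they are not gensim's classes.
class WordOnlyModel:
    pass


class ParagraphModel(WordOnlyModel):
    def infer_vector(self, words):
        # Dummy implementation: one zero per input word.
        return [0.0 for _ in words]


print(describe_model(WordOnlyModel()))
print(describe_model(ParagraphModel()))
```

Applied to the object returned by `g.Doc2Vec.load(...)`, a reported class of `Word2Vec` would suggest that the file itself holds a word2vec model rather than a doc2vec one. That is a guess from the traceback, not something confirmed in this thread.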

crowoy commented 6 years ago

Could it have something to do with Gensim's dependencies?

(env) $ pip list
Package         Version
--------------- ---------
boto            2.48.0
boto3           1.7.4
botocore        1.10.4
bz2file         0.98
certifi         2018.1.18
chardet         3.0.4
docutils        0.14
futures         3.2.0
gensim          0.12.4
idna            2.6
jmespath        0.9.3
numpy           1.14.2
pip             10.0.0
python-dateutil 2.6.1
requests        2.18.4
s3transfer      0.1.13
scipy           1.0.1
setuptools      39.0.1
six             1.11.0
smart-open      1.5.7
urllib3         1.22
wheel           0.31.0

jhlau commented 6 years ago

Weird, I just tried doing a fresh install (with virtualenv) and have no problems. Your gensim version seems to be right too (0.12.4), so I am not sure why this is happening.

Can you try creating a new virtualenv, installing gensim as you did before, and trying again?
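When recreating the environment, it can also help to confirm that the interpreter being run really belongs to the new virtualenv. A minimal sketch, assuming only the standard library; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; the venv module and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("interpreter:", sys.executable)
print("inside virtualenv:", in_virtualenv())
```

If the interpreter path printed here is outside the env directory, the scripts are probably being run with a different Python than the one gensim was installed into.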

crowoy commented 6 years ago

So I deleted the env and ran the following:

$ virtualenv env
$ source env/bin/activate
(env) $ pip install git+https://github.com/jhlau/gensim
(env) $ python
>>> import gensim.models as g
>>> model = g.Doc2Vec.load("toy_data/word2vec.bin")
>>> model.infer_vector("this is a test".split(), alpha=0.01, steps=1000)

And it still outputs:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Word2Vec' object has no attribute 'infer_vector'

jhlau commented 6 years ago

Does the train script (train_model.py) work?

crowoy commented 6 years ago

Looks to be working fine:

(env) $ python train_model.py
2018-04-16 11:22:21,169 : INFO : collecting all words and their counts
2018-04-16 11:22:21,169 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2018-04-16 11:22:21,223 : INFO : collected 11097 word types and 1000 unique tags from a corpus of 1000 examples and 84408 words
2018-04-16 11:22:21,272 : INFO : min_count=1 retains 11097 unique words (drops 0)
2018-04-16 11:22:21,272 : INFO : min_count leaves 84408 word corpus (100% of original 84408)
2018-04-16 11:22:21,325 : INFO : deleting the raw counts dictionary of 11097 items
2018-04-16 11:22:21,325 : INFO : sample=1e-05 downsamples 3599 most-common words
2018-04-16 11:22:21,325 : INFO : downsampling leaves estimated 22704 word corpus (26.9% of prior 84408)
2018-04-16 11:22:21,326 : INFO : estimated required memory for 11097 words and 300 dimensions: 33381300 bytes
2018-04-16 11:22:21,377 : INFO : resetting layer weights
2018-04-16 11:22:21,377 : INFO : loading pre-trained embeddings
2018-04-16 11:22:21,819 : INFO : 1000 lines processed (0.441607952118s); 969 embeddings collected
2018-04-16 11:22:22,129 : INFO : training model with 1 workers on 11129 vocabulary and 300 features, using sg=1 hs=0 sample=1e-05 negative=5
2018-04-16 11:22:22,129 : INFO : expecting 1000 sentences, matching count from corpus used for vocabulary survey
2018-04-16 11:22:23,205 : INFO : PROGRESS: at 1.29% examples, 28676 words/s, in_qsize 2, out_qsize 0
2018-04-16 11:22:24,212 : INFO : PROGRESS: at 2.48% examples, 28295 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:25,259 : INFO : PROGRESS: at 3.76% examples, 28426 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:26,349 : INFO : PROGRESS: at 5.04% examples, 28352 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:27,436 : INFO : PROGRESS: at 6.36% examples, 28413 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:28,470 : INFO : PROGRESS: at 7.54% examples, 28139 words/s, in_qsize 1, out_qsize 0
2018-04-16 11:22:29,493 : INFO : PROGRESS: at 8.71% examples, 27971 words/s, in_qsize 1, out_qsize 0

jhlau commented 6 years ago

Yeah, this really beats me. train_model.py loads pre-trained embeddings, which won't work with the canonical gensim, so it looks like your gensim version is right, but somehow it doesn't see infer_vector...

rafiqhasan commented 6 years ago

Any fix for this yet? We are facing a similar issue.

Abhimanyu100 commented 4 years ago

I am facing the same issue with doc2vec. My code runs fine on Google Colab, but I get an error when running it locally on my own system: AttributeError: 'NumpyArrayWrapper' object has no attribute 'infer_vector'

Saatvik-droid commented 1 year ago

Did anyone solve this? I get the following error:

model.infer_vector(["my", "input"])
^^^^^^^^^^^^^^^^^^
AttributeError: 'Word2Vec' object has no attribute 'infer_vector'

Pip freeze:

gensim @ file:///home/saat/Projects/gensim
numpy==1.24.3
scipy==1.10.1
six==1.16.0
smart-open==6.3.0

Saatvik-droid commented 1 year ago

After training and saving my own model with train_model.py, infer_vector works on that model.