piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

Doc2Vec to wikipedia articles notebook error - object has no attribute #2085

Open impulsecorp opened 6 years ago

impulsecorp commented 6 years ago

For the Doc2Vec to wikipedia articles notebook (https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-wikipedia.ipynb) I get this error:

pre = Doc2Vec(min_count=0)
pre.scan_vocab(documents)
executed in 11ms, finished 09:09:33 2018-06-07
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-0b281772fe9f> in <module>()
      1 pre = Doc2Vec(min_count=0)
----> 2 pre.scan_vocab(documents)
AttributeError: 'Doc2Vec' object has no attribute 'scan_vocab'

I also get a similar error for the next cell of the notebook:

AttributeError: 'Doc2Vec' object has no attribute 'scale_vocab'

Your Doc2Vec notebook on the Lee dataset (https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb) works fine for me though.

I am using Ubuntu 16.04.4, Python 3.6, and the latest version of Gensim.

piskvorky commented 6 years ago

@impulsecorp can you please try with the latest release, 3.5.0? CC @gojomo

gojomo commented 6 years ago

I think this is likely fallout from @manneshiva's refactorings in #1777.

menshikh-iv commented 6 years ago

@impulsecorp is the problem still reproducible with gensim==3.5.0?

impulsecorp commented 6 years ago

Sorry, I don't have that project running any more, so I can't try it again.

menshikh-iv commented 6 years ago

Reproduced with gensim==3.5.0

from gensim.models import Doc2Vec

model = Doc2Vec()
model.scan_vocab()
AttributeError                            Traceback (most recent call last)
<ipython-input-6-4f495aae01df> in <module>()
      2 
      3 model = Doc2Vec()
----> 4 model.scan_vocab()

AttributeError: 'Doc2Vec' object has no attribute 'scan_vocab'

But I'm not sure this is a bug, because users shouldn't call this method directly (only build_vocab should be used). For this reason, the notebook needs to be updated, not the d2v code.

gojomo commented 6 years ago

Pre #1777, users could choose to call the 3 steps of build_vocab() (scan_vocab(), prepare_vocab(), finalize_vocab()) manually, instead of just build_vocab(), in either Word2Vec or Doc2Vec, if they wanted to do extra reporting/tinkering between those steps.

menshikh-iv commented 6 years ago

@gojomo yes, and scan_vocab and prepare_vocab are still available as methods of model.vocabulary, but finalize_vocab has been partially replaced by model.trainables.prepare_weights

dnabanita7 commented 5 years ago

Can I work on this?

menshikh-iv commented 5 years ago

@Naba7 feel free to take any open issue (no need to ask each time) :)

gojomo commented 5 years ago

IMHO the ideal fix would restore the symmetry/consistency that the Word2Vec-related classes had before #1777, where you could always replace a single build_vocab() call with its individual constituent steps, consistently named, if you wanted more control (such as extra actions/analysis between steps). And, to prevent future regressions, it would add test methods confirming that such an 'un-bundling' of the build_vocab() steps gives the same results.
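The shape of such a regression test could look roughly like the sketch below. ToyVocabModel is a hypothetical stand-in written only for this illustration; it is not a gensim class, and its three steps merely borrow the names used in this thread. A real test would build two Word2Vec or Doc2Vec models and compare their vocabularies and weights in the same way.

```python
from collections import Counter


class ToyVocabModel:
    """Hypothetical stand-in for a model with a three-step vocabulary build."""

    def __init__(self, min_count=1):
        self.min_count = min_count
        self.raw_counts = Counter()
        self.kept = {}
        self.index = {}

    def scan_vocab(self, documents):
        # Step 1: count every token in the corpus.
        self.raw_counts = Counter(w for doc in documents for w in doc)

    def prepare_vocab(self):
        # Step 2: drop tokens rarer than min_count.
        self.kept = {w: c for w, c in self.raw_counts.items() if c >= self.min_count}

    def finalize_vocab(self):
        # Step 3: assign indexes, most frequent first, ties broken alphabetically.
        ordered = sorted(self.kept, key=lambda w: (-self.kept[w], w))
        self.index = {w: i for i, w in enumerate(ordered)}

    def build_vocab(self, documents):
        # Bundled entry point: exactly the three steps, in order.
        self.scan_vocab(documents)
        self.prepare_vocab()
        self.finalize_vocab()


def test_unbundled_steps_match_build_vocab():
    docs = [["a", "b", "a"], ["b", "c"], ["a"]]

    bundled = ToyVocabModel(min_count=2)
    bundled.build_vocab(docs)

    stepwise = ToyVocabModel(min_count=2)
    stepwise.scan_vocab(docs)
    # Extra reporting or tinkering between steps would go here.
    stepwise.prepare_vocab()
    stepwise.finalize_vocab()

    # The un-bundled path must end in the same state as the bundled one.
    assert stepwise.index == bundled.index
    return bundled.index


test_unbundled_steps_match_build_vocab()
```

The point of the pattern is that build_vocab() stays a thin wrapper, so any refactoring that makes the two paths diverge fails the test right away.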