piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

RuntimeWarning: overflow encountered in exp2 perwordbound.. when training LDA model #2733

Open kabuaisha-wish opened 4 years ago

kabuaisha-wish commented 4 years ago

Problem description

When training an LDA model:

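    from gensim.models import LdaModel

    # corpus and id2word are built beforehand (bag-of-words corpus and its dictionary)
    model = LdaModel(
        corpus=corpus,
        id2word=id2word,
        chunksize=2000,  # default
        alpha='auto',  # learn an asymmetric alpha prior from the corpus (not supported by LdaMulticore)
        eta='auto',
        iterations=50,  # default
        num_topics=2000,
        passes=2,  # default is 1
    )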

The corpus has 71959 unique tokens and 418422 documents.

I get this runtime warning:

    /usr/local/lib/python2.7/dist-packages/gensim-3.7.2-py2.7-linux-x86_64.egg/gensim/models/ldamodel.py:824: RuntimeWarning: overflow encountered in exp2
      perwordbound, np.exp2(-perwordbound), len(chunk), corpus_words
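
Going by the line quoted in the warning, the overflow comes from `np.exp2(-perwordbound)`, i.e. from turning the per-word bound into a perplexity figure. A float64 cannot hold 2 raised to a power of 1024 or more, so a sufficiently negative bound would produce `inf` and this warning. The sketch below illustrates that limit with the standard library only; `perplexity_from_bound` is a hypothetical helper written for this illustration, not a gensim function, and whether the bound really is that negative in this run is an assumption.

```python
import sys

# On IEEE 754 doubles this is 1024: 2.0 ** 1023 is finite, 2.0 ** 1024 is not.
MAX_EXP = sys.float_info.max_exp


def perplexity_from_bound(perwordbound):
    """Return 2 ** (-perwordbound), or inf where a float64 would overflow.

    Hypothetical helper for illustration only; not part of gensim.
    """
    exponent = -perwordbound
    if exponent >= MAX_EXP:
        # np.exp2 would return inf here and emit the RuntimeWarning.
        return float("inf")
    return 2.0 ** exponent


if __name__ == "__main__":
    print(perplexity_from_bound(-10.0))    # moderate bound, finite result
    print(perplexity_from_bound(-1500.0))  # very negative bound, overflow
```

If this is what is happening, the number worth inspecting is the per-word bound itself rather than the perplexity derived from it.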

Versions

    Python 2.7.6 (default, Nov 13 2018, 12:45:42)
    [GCC 4.8.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import platform; print(platform.platform())
    Linux-5.3.0-26-generic-x86_64-with-Ubuntu-14.04-trusty
    >>> import sys; print("Python", sys.version)
    ('Python', '2.7.6 (default, Nov 13 2018, 12:45:42) \n[GCC 4.8.4]')
    >>> import numpy; print("NumPy", numpy.__version__)
    ('NumPy', '1.11.3')
    >>> import scipy; print("SciPy", scipy.__version__)
    ('SciPy', '0.18.1')
    >>> import gensim; print("gensim", gensim.__version__)
    ('gensim', '3.8.1')
    >>> from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
    ('FAST_VERSION', 1)

chencjiajy commented 4 years ago

I get this RuntimeWarning too.