piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1
15.64k stars 4.37k forks

divide by zero encountered in log #217

Closed AndreasMadsen closed 8 years ago

AndreasMadsen commented 10 years ago

Sometimes when I use the LDA model, I get this warning:

gensim/models/ldamodel.py:602: RuntimeWarning: divide by zero encountered in log
diff = numpy.log(self.expElogbeta)

I'm using Python 3.4.1 and the develop branch of gensim.

piskvorky commented 10 years ago

Hmm, that shouldn't happen.

When you see that warning -- can you dump the model to disk? And send it to me -- I'll have a look. No idea how a zero may get there, I don't think it can, unless something is very wonky with the input.

The sparse corpus you're using doesn't contain any explicit zeros, right? (=a vector element like (some_feature_id, 0.0), which is not allowed in sparse input).
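A quick way to check the point about explicit zeros is to scan the corpus before training. This is only a sketch: `find_explicit_zeros` and `toy_corpus` are hypothetical names made up for illustration, and the corpus is assumed to be an iterable of documents, each a list of (feature_id, weight) pairs.

```python
def find_explicit_zeros(corpus):
    """Yield (doc_index, feature_id) for every entry whose weight is exactly zero."""
    for doc_index, doc in enumerate(corpus):
        for feature_id, weight in doc:
            if weight == 0:
                yield doc_index, feature_id


# Toy bag-of-words corpus (made up for illustration).
toy_corpus = [
    [(0, 1.0), (2, 3.0)],
    [(1, 0.0), (2, 1.0)],  # (1, 0.0) is an explicit zero
    [],                    # an empty document simply has no entries
]

zeros = list(find_explicit_zeros(toy_corpus))
print(zeros)
```

If the list comes back non-empty, dropping those pairs from the documents before training would rule out this particular cause.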

tmylk commented 8 years ago

Closing as the report is incomplete.

ocsponge commented 7 years ago

Hi AndreasMadsen, I got the same warning. How did you solve the problem?

AndreasMadsen commented 7 years ago

I don't think I did.

yanshengjia commented 6 years ago

Got the same warning, any idea?

menshikh-iv commented 6 years ago

@yanshengjia can you send the model/dataset/code used for training? That's needed to reproduce your problem.

yanshengjia commented 6 years ago

Yeah, I can send you the corpus which I use.

menshikh-iv commented 6 years ago

@yanshengjia the corpus and the concrete code that produces this problem on your corpus. That's really important (because if we can't reproduce the problem, we can't help).

yanshengjia commented 6 years ago

I have sent the corpus and the code to your gmail.

menshikh-iv commented 6 years ago

@yanshengjia I ran this in a debugger (with np.seterr(all='raise')), but the problem didn't happen again (we probably need the exact seed value to reproduce it).
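For anyone trying to catch the warning at its source, here is a minimal sketch of the same idea using NumPy alone. `log_with_trap` is a hypothetical helper, not part of gensim. It uses `np.errstate` (the context-manager form of `np.seterr`) so the setting applies only inside the block.

```python
import numpy as np


def log_with_trap(values):
    """Compute np.log, turning floating-point warnings into exceptions."""
    with np.errstate(all="raise"):
        return np.log(values)


# Strictly positive input: no exception is raised.
ok = log_with_trap(np.array([1.0, np.e]))

# An exact zero triggers the divide-by-zero condition, now raised as an error.
try:
    log_with_trap(np.array([1.0, 0.0]))
    trapped = False
except FloatingPointError:
    trapped = True

print(ok, trapped)
```

Wrapping the training call the same way should give a traceback at the first offending operation, which makes it easier to save the model state for a bug report.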

yanshengjia commented 6 years ago

@menshikh-iv I think I got that warning because I didn't split the document. After I split the doc, the warning no longer shows up. So it's OK now. Thanks a lot!

liz282907 commented 6 years ago

I've got the same warning. Any idea how to solve it?

menshikh-iv commented 6 years ago

@liz282907 please ignore it (this is very strange, we need to investigate it properly) CC: @piskvorky

Christings commented 6 years ago

I've got the same warning. Any idea how to solve it?

RohitRaj2017 commented 6 years ago

/opt/anaconda3/lib/python3.6/site-packages/gensim/models/ldamodel.py:775: RuntimeWarning: divide by zero encountered in log
diff = np.log(self.expElogbeta)

This is the exact warning that I am getting

HibaJak commented 6 years ago

I had the same problem. I suffered for days, then I found it! In my project I was trying different values for alpha and beta; once I set them back to the defaults, it worked.

xiaokc commented 6 years ago

In my project, alpha and beta are both left at their defaults, but the problem still occurs. I don't know how to solve it.

annalina commented 6 years ago

I'm getting the same error:

/venv/lib/python2.7/site-packages/gensim/models/ldamodel.py:1023: RuntimeWarning: divide by zero encountered in log
diff = np.log(self.expElogbeta)

I'm training with different numbers of topics, from 10 to 100, and measuring the coherence to decide how many topics to retain. Everything went well until 80 topics, where I got this error. It seems to be quite random. I wasn't saving the model, since it was just a small experiment, so there is no way I can reproduce it.

mrvsppr commented 6 years ago

Same situation as @annalina. The code runs LDA for several topic counts: num_topics = [20, 60] work fine, but num_topics=100, built with the same corpus and dictionary, gives:

/usr/local/lib/python3.5/dist-packages/gensim/models/ldamodel.py:775: RuntimeWarning: divide by zero encountered in log
diff = np.log(self.expElogbeta)
/usr/local/lib/python3.5/dist-packages/gensim/models/ldamodel.py:509: RuntimeWarning: overflow encountered in add
sstats[:, ids] += np.outer(expElogthetad.T, cts / phinorm)
/usr/local/lib/python3.5/dist-packages/gensim/models/ldamodel.py:519: RuntimeWarning: invalid value encountered in multiply
sstats *= self.expElogbeta

menshikh-iv commented 6 years ago

Can anybody provide

please @mrvsppr @annalina @xiaokc @HibaJak @RohitRaj2017 @Gladysgong @liz282907

itsbrycehere commented 6 years ago

I'm having the same problem now. Code looks like this. Corpus, id2word and text all work in basic LDA, HDP, and LSI.


import operator

from gensim.models.coherencemodel import CoherenceModel
from gensim.models.ldamodel import LdaModel


def very_tuned_LDA(corpus, dictionary, lemma_text):
    # Retrain until the most coherent topic scores at least 0.97.
    top_topics = [(0, 0)]
    while top_topics[0][1] < 0.97:
        lm = LdaModel(corpus=corpus, id2word=dictionary, alpha="auto")
        coherence_values = {}
        for n, topic in lm.show_topics(num_topics=-1, formatted=False):
            # lda.num_topics fix for bug in show_topics
            topic = [word for word, _ in topic]
            cm = CoherenceModel(topics=[topic], texts=lemma_text,
                                dictionary=dictionary, window_size=10)
            coherence_values[n] = cm.get_coherence()
        top_topics = sorted(coherence_values.items(), key=operator.itemgetter(1), reverse=True)
    return lm, top_topics

menshikh-iv commented 6 years ago

@itsbrycehere hello, can you minimize your code example (get rid of CoherenceModel, because it's unrelated) and attach the concrete dictionary and corpus? The issue does not reproduce with this minimal example:

from gensim.test.utils import common_corpus, common_dictionary
from gensim.models import LdaModel

model = LdaModel(corpus=common_corpus, id2word=common_dictionary, alpha="auto")

mrvsppr commented 6 years ago

Shared code and dictionary + corpus in dropbox with @menshikh-iv

python3.5
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
numpy.version.version = 1.14.4
scipy.version.version = 1.1.0

Despite the RuntimeWarning, the model is built and can be saved, but visualizing it in pyLDAvis fails with pyLDAvis._prepare.ValidationError:

menshikh-iv commented 6 years ago

Hi @mrvsppr, thanks, but I can't add it to my Dropbox (Dropbox says that 2 GB is not enough). How large is it? Can you share it via Google Drive (or something else)?

BTW, it's better to post the code here too and share only the data (this is more transparent for the community).

ntedgi commented 6 years ago

/home/nt/.local/lib/python3.5/site-packages/gensim/models/ldamodel.py:1023: RuntimeWarning: divide by zero encountered in log
diff = np.log(self.expElogbeta)

lda = gensim.models.LdaMulticore(corpus=corpus, num_topics=TOPICS,alpha="symmetric",eta=0.01, iterations=200, id2word=dictionary , workers=11)

I get the same warning now with these parameters. Any news?

bylinn commented 6 years ago

lda = LdaMulticore(mm, id2word = id2word, workers=18, chunksize=2000, iterations=1000, num_topics=700, passes = 20)

I get the same warning. Any update here? @menshikh-iv thanks!

andifunke commented 5 years ago

Try to increase the internal precision by providing dtype=np.float64 as an argument to the LdaModel (default is np.float32). This will prevent the model from truncating very low values to .0, resulting in np.log(.0) = -inf, which then triggers this warning.

However, I'm not sure whether some values are supposed to be as low as 1.219e-47, so maybe there is indeed some underlying issue with the implementation, but I'm probably just being too paranoid. For my own sanity I evaluated both precisions (float32 and float64) and couldn't see much difference with respect to perplexity, coherence or convergence metrics.
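The truncation described above can be reproduced with NumPy alone. This is just an illustration of the floating-point behaviour, not gensim's internals. The value 1.219e-47 is the one quoted above, and the variable names are made up.

```python
import numpy as np

tiny = 1.219e-47  # below float32's smallest subnormal (about 1.4e-45)

as32 = np.float32(tiny)  # underflows to exactly 0.0
as64 = np.float64(tiny)  # still representable in double precision

# Silence the warning here only so both results can be compared side by side.
with np.errstate(divide="ignore"):
    log32 = np.log(as32)  # -inf; this is what triggers "divide by zero encountered in log"
    log64 = np.log(as64)  # finite, roughly -108

print(as32, log32)
print(as64, log64)
```

In float64 the same value stays nonzero, so the logarithm is finite and no warning is raised.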

menshikh-iv commented 5 years ago

@bylinn the warning isn't really a problem; the algorithm is a bit numerically unstable. You can try increasing the model's precision as @andifunke suggested if you are really worried.

BJWipf commented 4 years ago

I'm also experiencing this problem. I'm using the common_corpus and following the tutorial. Any ideas for resolving?

from gensim.test.utils import common_corpus
from gensim.models import LdaSeqModel

ldaseq = LdaSeqModel(corpus=common_corpus, time_slice=[2, 4, 3], num_topics=2, chunksize=1)

C:\Users\Briana\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\ldaseqmodel.py:293: RuntimeWarning: divide by zero encountered in double_scalars
convergence = np.fabs((bound - old_bound) / old_bound)
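The expression in that warning divides by old_bound, so the message implies old_bound was zero at that point. Here is a small NumPy-only sketch of how the warning arises and how a guard would avoid it. `relative_change` is a hypothetical helper for illustration, not gensim's code, and returning inf for a zero baseline is an assumed convention.

```python
import warnings

import numpy as np


def relative_change(bound, old_bound):
    """Return |bound - old_bound| / |old_bound|, or inf when old_bound is zero."""
    if old_bound == 0:
        return np.inf  # assumed convention: no baseline yet, so "not converged"
    return np.fabs((bound - old_bound) / old_bound)


# Unguarded form, as in the warning: NumPy scalars warn instead of raising.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unguarded = np.fabs((np.float64(-5.0) - np.float64(0.0)) / np.float64(0.0))

guarded = relative_change(np.float64(-5.0), np.float64(0.0))

print(unguarded, guarded, len(caught))
```

Both forms give inf here; the only difference is that the guarded version does not emit the RuntimeWarning.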

Yaeyang commented 11 months ago

I still get the same problem when calculating the c_uci coherence:

RuntimeWarning: divide by zero encountered in scalar divide
m_lr_i = np.log(numerator / denominator)

I have set NumPy's print options with np.set_printoptions(precision=10) or np.set_printoptions(threshold=np.inf).

Did anyone solve the problem?