Closed AndreasMadsen closed 8 years ago
Hmm, that shouldn't happen.
When you see that warning -- can you dump the model to disk? And send it to me -- I'll have a look. No idea how a zero may get there, I don't think it can, unless something is very wonky with the input.
The sparse corpus you're using doesn't contain any explicit zeros, right? (That is, a vector element like (some_feature_id, 0.0), which is not allowed in sparse input.)
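One way to check for explicit zeros, as a minimal sketch. The helper name find_explicit_zeros is hypothetical (it is not part of gensim), and the corpus is assumed to be an iterable of documents, each a list of (feature_id, weight) pairs.

```python
def find_explicit_zeros(corpus):
    """Return (doc_index, feature_id) for every entry whose weight is exactly 0."""
    hits = []
    for doc_index, doc in enumerate(corpus):
        for feature_id, weight in doc:
            if weight == 0:
                hits.append((doc_index, feature_id))
    return hits


# Tiny made-up corpus for illustration: the second document carries an explicit zero.
example_corpus = [[(0, 1.0), (3, 2.0)], [(1, 0.0), (2, 1.0)]]
print(find_explicit_zeros(example_corpus))  # -> [(1, 1)]
```

An empty result means the corpus has no explicit zeros, so the cause would have to be elsewhere.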
Closing as the report is incomplete.
hi, AndreasMadsen, I got the same warning, how did u solve the problem?
I don't think I did.
Got the same warning, any idea?
@yanshengjia can you send the model, dataset, and training code? They're needed to reproduce your problem.
Yeah, I can send you the corpus which I use.
@yanshengjia the corpus and the concrete code that produces this problem on your corpus. That's really important (because if we can't reproduce this problem, we can't help).
I have sent the corpus and the code to your gmail.
@yanshengjia I ran this with a debugger (with np.seterr(all='raise')), but the problem didn't happen again (we probably need the exact seed value to reproduce it).
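For anyone who wants to try the same approach, here is a minimal sketch of turning the NumPy warning into an exception so that a debugger or traceback stops at the offending line. It uses np.errstate, the scoped counterpart of np.seterr; the function name log_strict is made up for illustration.

```python
import numpy as np


def log_strict(values):
    """Take the log, raising FloatingPointError instead of warning on log(0)."""
    with np.errstate(divide="raise"):
        return np.log(values)


try:
    log_strict(np.array([0.5, 0.0]))
    raised = False
except FloatingPointError as err:
    # The exception carries the same message the RuntimeWarning would show.
    raised = True
    print("caught:", err)
```

Unlike np.seterr(all='raise'), the context manager restores the previous error settings on exit, so the rest of the program keeps its normal warning behaviour.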
@menshikh-iv I think the reason I got that warning is that I didn't split the document. After I split the doc, the warning doesn't show up again. So it's OK now. Thx a lot!
i've got the same warning, any idea of solving it?
@liz282907 please ignore it (this is very strange, we need to investigate it properly) CC: @piskvorky
i've got the same warning, any idea of solving it?
/opt/anaconda3/lib/python3.6/site-packages/gensim/models/ldamodel.py:775: RuntimeWarning: divide by zero encountered in log
  diff = np.log(self.expElogbeta)
This is the exact warning that I am getting
I had the same problem and struggled for days before I found it! In my project I had tried different values for alpha and beta; when I set them back to the defaults, it worked.
In my project, alpha and beta are both at their defaults, but this problem still occurred. I don't know how to solve it.
I'm getting the same error:
/venv/lib/python2.7/site-packages/gensim/models/ldamodel.py:1023: RuntimeWarning: divide by zero encountered in log
  diff = np.log(self.expElogbeta)
I'm training with different topic sizes, from 10 to 100, and measuring the coherence to decide how many topics to retain. Everything went well until 80 topics, where I got this error. It seems to be quite random. I'm not saving the model, since it was just a small experiment, so there is no way I can reproduce it.
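Since the warning appears only intermittently, it can help to record it at the moment it happens so the offending run can be kept. The sketch below uses only the standard warnings module; run_and_capture and fake_training are hypothetical names, and fake_training is a stand-in for a real training call.

```python
import warnings


def run_and_capture(train_fn, *args, **kwargs):
    """Call train_fn and return (result, list of RuntimeWarnings it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure repeated warnings are recorded
        result = train_fn(*args, **kwargs)
    runtime = [w for w in caught if issubclass(w.category, RuntimeWarning)]
    return result, runtime


def fake_training(num_topics):
    """Stand-in for a training run; warns only for large topic counts."""
    if num_topics >= 80:
        warnings.warn("divide by zero encountered in log", RuntimeWarning)
    return {"num_topics": num_topics}


for k in (20, 80):
    model, problems = run_and_capture(fake_training, k)
    print(k, [str(w.message) for w in problems])
```

With gensim, train_fn could build an LdaModel with a fixed random_state, and the loop could call model.save(...) whenever problems is non-empty, so that the model and seed can be shared for reproduction.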
Same situation as @annalina. The code runs LDA for multiple topic counts: num_topics=[20,60] works fine; num_topics=100, built with the same corpus and dictionary, gives:
/usr/local/lib/python3.5/dist-packages/gensim/models/ldamodel.py:775: RuntimeWarning: divide by zero encountered in log
  diff = np.log(self.expElogbeta)
/usr/local/lib/python3.5/dist-packages/gensim/models/ldamodel.py:509: RuntimeWarning: overflow encountered in add
  sstats[:, ids] += np.outer(expElogthetad.T, cts / phinorm)
/usr/local/lib/python3.5/dist-packages/gensim/models/ldamodel.py:519: RuntimeWarning: invalid value encountered in multiply
  sstats *= self.expElogbeta
Can anybody provide
please @mrvsppr @annalina @xiaokc @HibaJak @RohitRaj2017 @Gladysgong @liz282907
I'm having the same problem now. Code looks like this. Corpus, id2word and text all work in basic LDA, HDP, and LSI.
import operator
from gensim.models.ldamodel import LdaModel
from gensim.models.coherencemodel import CoherenceModel

def very_tuned_LDA(corpus, dictionary, lemma_text):
    top_topics = [(0, 0)]
    # Retrain until the best topic reaches a coherence of at least 0.97
    while top_topics[0][1] < 0.97:
        lm = LdaModel(corpus=corpus, id2word=dictionary, alpha="auto")
        coherence_values = {}
        for n, topic in lm.show_topics(num_topics=-1, formatted=False):
            # lda.num_topics fix for bug in show_topics
            topic = [word for word, _ in topic]
            cm = CoherenceModel(topics=[topic], texts=lemma_text,
                                dictionary=dictionary, window_size=10)
            coherence_values[n] = cm.get_coherence()
        top_topics = sorted(coherence_values.items(), key=operator.itemgetter(1), reverse=True)
    return lm, top_topics
@itsbrycehere hello, can you minimize your code example (get rid of CoherenceModel, because it's unrelated) and attach the concrete dictionary & corpus? The issue is not reproduced with this minimal example:
from gensim.test.utils import common_corpus, common_dictionary
from gensim.models import LdaModel
model = LdaModel(corpus=common_corpus, id2word=common_dictionary, alpha="auto")
Shared code and dictionary + corpus in dropbox with @menshikh-iv.
python3.5
NAME="Ubuntu" VERSION="16.04.4 LTS (Xenial Xerus)"
numpy.version.version = 1.14.4
scipy.version.version = 1.1.0
Despite the RuntimeWarning, the model is built and can be saved, but visualizing it in pyLDAvis fails with pyLDAvis._prepare.ValidationError:
Hi @mrvsppr, thanks, but I can't add it to my dropbox (dropbox says that 2 GB is not enough). How large is it? Can you share it via google-drive (or something else)?
BTW, it's better to place the code here too and share only the data (this is more transparent for the community).
/home/nt/.local/lib/python3.5/site-packages/gensim/models/ldamodel.py:1023: RuntimeWarning: divide by zero encountered in log
  diff = np.log(self.expElogbeta)
lda = gensim.models.LdaMulticore(corpus=corpus, num_topics=TOPICS, alpha="symmetric", eta=0.01, iterations=200, id2word=dictionary, workers=11)
I get the same warning now with these parameters. Any news?
lda = LdaMulticore(mm, id2word=id2word, workers=18, chunksize=2000, iterations=1000, num_topics=700, passes=20)
I get the same warning. Any update here? @menshikh-iv thanks!
Try to increase the internal precision by providing dtype=np.float64 as an argument to the LdaModel (default is np.float32). This will prevent the model from truncating very low values to .0, resulting in np.log(.0) = -inf, which then triggers this warning.
However, I'm not sure if some values are supposed to be as low as 1.219e-47, so maybe there is indeed some underlying issue with the implementation, but I'm probably just too paranoid. For my own sanity I evaluated both word depths and couldn't see much difference with respect to perplexity, coherence or convergence metrics.
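To illustrate the truncation described above, here is a small NumPy-only sketch. It does not touch gensim; it only shows that a value of the magnitude quoted here (1.219e-47) is below what float32 can represent and therefore becomes 0.0, while float64 keeps it.

```python
import warnings

import numpy as np

tiny = 1.219e-47  # magnitude quoted in this thread

as32 = np.array([tiny], dtype=np.float32)  # underflows to exactly 0.0
as64 = np.array([tiny], dtype=np.float64)  # still representable

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    log32 = np.log(as32)  # log(0.0) -> -inf, emits the RuntimeWarning
    log64 = np.log(as64)  # finite, no warning

print(as32[0], log32[0])
print(as64[0], log64[0])
print([str(w.message) for w in caught])
```

In gensim this corresponds to the suggestion above of passing dtype=np.float64 when constructing the LdaModel.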
@bylinn the warning isn't really a problem; the algorithm is a bit numerically unstable. You can try increasing the model's precision, as @andifunke suggested, if you're really worried.
I'm also experiencing this problem. I'm using the common_corpus and following the tutorial. Any ideas for resolving?
from gensim.test.utils import common_corpus
from gensim.models import LdaSeqModel
ldaseq = LdaSeqModel(corpus=common_corpus, time_slice=[2, 4, 3], num_topics=2, chunksize=1)
C:\Users\Briana\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\ldaseqmodel.py:293: RuntimeWarning: divide by zero encountered in double_scalars
  convergence = np.fabs((bound - old_bound) / old_bound)
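The expression in that warning divides by old_bound, so a value of exactly 0 there would produce this message. Whether that is what happens inside LdaSeqModel (for example on a first iteration) is an assumption here, not something confirmed in this thread. The sketch below only reproduces the arithmetic with NumPy scalars; relative_change is a made-up helper name.

```python
import numpy as np


def relative_change(bound, old_bound):
    """Same formula as in the warning; division warnings are silenced locally."""
    bound = np.float64(bound)
    old_bound = np.float64(old_bound)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.fabs((bound - old_bound) / old_bound)


print(relative_change(-9.0, -10.0))  # ordinary case: a small finite ratio
print(relative_change(-10.0, 0.0))   # old_bound == 0 gives inf
```

If that is the cause, the warning reflects an infinite convergence ratio for that one step rather than a corrupted model.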
Still got the same problem when calculating the c_uci coherence.
RuntimeWarning: divide by zero encountered in scalar divide
  m_lr_i = np.log(numerator / denominator)
I have set the printoptions of numpy.
np.set_printoptions(precision=10)
or
np.set_printoptions(threshold=np.inf)
Did anyone solve the problem?
Sometimes when I use the LDA model, I get this warning:
I'm using Python 3.4.1 and the develop branch of gensim.