piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

Wrong power base in LDA Model log_perplexity documentation #2623

Open mf908 opened 4 years ago

mf908 commented 4 years ago

Problem description

The Gensim LdaModel documentation is incorrect.

Steps/code/corpus to reproduce

Based on the code in log_perplexity, it looks like the perplexity should be e^(-bound), since all of the functions used to compute the bound seem to use the natural logarithm (base e).
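A short derivation of the argument, assuming (as the reporter reads the code) that the per-word bound $b$ averages natural-log terms over the $N$ words of the $D$ evaluation documents. Here $b$ is a variational lower bound, so the perplexity it yields is an estimate:

```latex
b \;\le\; \frac{1}{N}\sum_{d=1}^{D} \ln p(\mathbf{w}_d), \qquad N = \sum_{d=1}^{D} |\mathbf{w}_d|

\text{perplexity} \approx e^{-b} = 2^{-b/\ln 2}, \qquad \text{whereas} \qquad 2^{-b} = e^{-b\ln 2}
```

The two formulas agree only when $b = 0$, so a nats-valued bound paired with base 2 understates the perplexity.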

mpenkov commented 4 years ago

Thank you for pointing this out.

Could you please be more specific? What documentation, what file, what part of that file in particular?

mf908 commented 4 years ago

https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/ldamodel.py

The log_perplexity function where it says:

Calculate and return per-word likelihood bound, using a chunk of documents as evaluation corpus. Also output the calculated statistics, including the perplexity=2^(-bound), to log at INFO level.
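For reference, a minimal sketch of the calculation the docstring describes, written without Gensim. It assumes the per-word bound is the chunk's bound divided by the number of words in the chunk (ignoring any subsampling correction); the names `chunk_bound`, `chunk_word_count`, `per_word_bound` and `docstring_perplexity` are hypothetical, not Gensim's API.

```python
import math


def per_word_bound(chunk_bound: float, chunk_word_count: int) -> float:
    """Average a chunk-level bound over the words in the chunk.

    Assumption: chunk_bound is a sum of natural-log terms (nats),
    which is how the reporter reads Gensim's bound method.
    """
    if chunk_word_count <= 0:
        raise ValueError("chunk must contain at least one word")
    return chunk_bound / chunk_word_count


def docstring_perplexity(bound: float) -> float:
    """Perplexity exactly as the docstring states it: 2^(-bound)."""
    return 2.0 ** (-bound)


# Hypothetical chunk: a bound of -7500 nats over 1000 words.
b = per_word_bound(-7500.0, 1000)
print(b)                        # -7.5
print(docstring_perplexity(b))  # 2^7.5, about 181.02
```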

mpenkov commented 4 years ago

If you look at the source code, you'll see that the base is 2:

https://github.com/RaRe-Technologies/gensim/blob/e1025743dd022ff87b55dde8ef2c85167d2e469d/gensim/models/ldamodel.py#L824

This appears to be correct (matches the docstring). Where are you seeing e as a base?

mf908 commented 4 years ago

If you look at the bound function in ldamodel.py, all of the functions there use the natural log rather than base 2.
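The point about the base can be checked on a toy example unrelated to LDA: a hypothetical three-word unigram model. Perplexity comes out the same whichever log base is used, provided the exponent uses the same base; pairing a natural-log average with base 2 gives a different, smaller number.

```python
import math

# Hypothetical unigram probabilities and a tiny evaluation text.
probs = {"topic": 0.5, "model": 0.25, "human": 0.25}
text = ["topic", "model", "topic", "human"]

avg_nats = sum(math.log(probs[w]) for w in text) / len(text)   # natural log
avg_bits = sum(math.log2(probs[w]) for w in text) / len(text)  # base-2 log

ppl_e = math.exp(-avg_nats)      # consistent: base e with nats
ppl_2 = 2.0 ** (-avg_bits)       # consistent: base 2 with bits
ppl_mixed = 2.0 ** (-avg_nats)   # inconsistent: base 2 with nats

# The two consistent values are both 2**1.5 (about 2.83); the mixed one is about 2.06.
print(ppl_e, ppl_2, ppl_mixed)
```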

Xilorole commented 3 years ago

I ran into the same issue. At INFO log level it prints the perplexity as an exponential with base 2, but the function returns a bound computed with base e. This should be pointed out, since it is a bit confusing.
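If only the logged number is at hand, the natural-base value for the same bound can be recovered from it, since 2^(-b) = e^(-b ln 2) and therefore e^(-b) = (2^(-b))^(1/ln 2). A minimal sketch; `natural_from_logged` is a hypothetical helper and its input stands for the value read from the INFO log line.

```python
import math


def natural_from_logged(logged_perplexity: float) -> float:
    """Convert a 2^(-bound) value into e^(-bound) for the same bound."""
    # 2^(-b) = e^(-b * ln 2), so raising it to 1/ln 2 gives e^(-b).
    return logged_perplexity ** (1.0 / math.log(2.0))


bound = -7.5
logged = 2.0 ** (-bound)
print(natural_from_logged(logged), math.exp(-bound))  # both about 1808.04
```

Any rounding in the logged value carries through to the converted result.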

Lehas-sudo commented 1 year ago

So how can I calculate the correct perplexity with Gensim?
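One possible answer, assuming (as the comments above argue) that the value returned by `log_perplexity` is a per-word bound in nats: exponentiate its negation with base e. The helper names below are hypothetical; the second one expresses the same bound in bits per word, and 2 raised to that value gives the same perplexity.

```python
import math


def perplexity_from_bound(per_word_bound: float) -> float:
    """Perplexity estimate, assuming the per-word bound is in nats."""
    return math.exp(-per_word_bound)


def bits_per_word(per_word_bound: float) -> float:
    """The same bound as base-2 cross-entropy (bits per word)."""
    return -per_word_bound / math.log(2.0)


b = -7.5
print(perplexity_from_bound(b))  # about 1808.04
print(2.0 ** bits_per_word(b))   # same value, computed in base 2
```

With a trained model this would be applied to the return value directly, e.g. `perplexity_from_bound(lda.log_perplexity(heldout_corpus))`, where `lda` and `heldout_corpus` are placeholders. Because the bound is a variational lower bound, the result is an estimate rather than an exact perplexity.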