maximtrp / bitermplus

Biterm Topic Model (BTM): modeling topics in short texts
https://bitermplus.readthedocs.io/en/stable/
MIT License

The Perplexity is inf #7

Closed: JennieGerhardt closed this issue 3 years ago

JennieGerhardt commented 3 years ago

Under what circumstances does the perplexity come out as inf?

JennieGerhardt commented 3 years ago

When I delete the following 2 records, the perplexity returns to normal: 吃 B12 MG 镁

The perplexity also returns to normal when I change the records to: 吃 b12 / intake B12 mg 镁 / 请问 Mg 镁

(For reference: 吃 means "eat", 镁 means "magnesium", and 请问 means "may I ask".)

It seems that the model works well with English text. When a record contains only one Chinese character and some uppercase English letters, the perplexity is inf.

maximtrp commented 3 years ago

Could you please post a reproducible example with a dataset and code?

sharathc10 commented 3 years ago

I am getting inf for English-only phrases too, and the behavior is very inconsistent.

Thanks for your help

maximtrp commented 3 years ago

@sharathc10 I need a reproducible example to debug it. Could you please provide one?

maximtrp commented 3 years ago

@JennieGerhardt @sharathc10 Or could you perhaps post the code that you are using to train a model, make an inference, and calculate perplexity?

maximtrp commented 3 years ago

I have made a release that hopefully fixes this issue (0.6.8). Also, a more recent one is available (includes Renyi entropy calculation). Please reopen this issue if the bug persists.
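
For readers who hit the same symptom, here is a minimal sketch of one general way a perplexity value can become inf. It is not the bitermplus implementation, and the thread does not state what the actual cause was or what 0.6.8 changed. The function names `perplexity` and `perplexity_with_floor`, the `eps` parameter, and the probability values are all hypothetical and chosen only for illustration. Perplexity is the exponential of the negative mean log-probability, so a single token with probability exactly 0 contributes log(0) = -inf and drives the result to inf.

```python
import math


def perplexity(token_probs):
    """Perplexity as exp(-mean(log p)) over per-token probabilities.

    Illustrative only: this is NOT how bitermplus computes perplexity.
    """
    log_sum = 0.0
    for p in token_probs:
        if p == 0.0:
            # log(0) is -inf, so the negative mean log-probability is +inf
            # and the perplexity is inf regardless of the other tokens.
            return math.inf
        log_sum += math.log(p)
    return math.exp(-log_sum / len(token_probs))


def perplexity_with_floor(token_probs, eps=1e-12):
    """Same formula, but with probabilities floored at a small eps.

    The floor is an assumed workaround for illustration, not a library option.
    """
    floored = [max(p, eps) for p in token_probs]
    return perplexity(floored)


# Hypothetical probabilities: every token has non-zero probability.
print(perplexity([0.5, 0.25]))

# One token with zero probability is enough to make the result inf.
print(perplexity([0.5, 0.0]))

# With the floor applied the value is very large but finite.
print(perplexity_with_floor([0.5, 0.0]))
```

If a model reports inf, checking whether any document contains tokens that end up with zero probability under the fitted model is a reasonable first diagnostic step before building the reproducible example requested above.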