lfmatosm / embedded-topic-model

A package to run embedded topic modelling with ETM. Adapted from the original at: https://github.com/adjidieng/ETM
MIT License

Rec loss: nan #29

Open GareemaRanjan opened 3 months ago

GareemaRanjan commented 3 months ago

Describe the bug

I am using ETM for topic modelling on a dataset of 50K documents. I am running the model multiple times (with random seed values) to find an appropriate value of K for my data. Sometimes the model reports nan loss values for the same K. This seems random, and I am unable to track down why it happens.

INFO:root:Epoch 56 - Learning Rate: 0.005 - KL theta: nan - Rec loss: nan - NELBO: nan
INFO:root:Epoch 57 - Learning Rate: 0.005 - KL theta: nan - Rec loss: nan - NELBO: nan

Once this happens, the loss values stay nan for all remaining epochs in that run.
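Since the run produces only nan losses after the first nan appears, one option is to bail out of a run as soon as the loss stops being finite, rather than wasting the remaining epochs. This is not a built-in ETM feature, just a hypothetical guard you could wrap around your own training loop:

```python
import math

def loss_is_finite(loss_value: float) -> bool:
    """Return False for nan/inf losses so a run can be aborted early."""
    return math.isfinite(loss_value)

# A nan NELBO like the one in the log above would be flagged:
print(loss_is_finite(float("nan")))  # False
print(loss_is_finite(0.005))         # True
```

With a check like this you could restart the run under a different seed or a lower learning rate as soon as the divergence is detected.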

Reproduction example

Here is how I am using the model:

etm_instance = ETM(
    vocabulary,
    num_topics=k,
    epochs=100,
    debug_mode=True,
    seed=random_seed,
)

I am new to topic modelling (and machine learning). Is there something I am missing?

lfmatosm commented 2 months ago

Hi @GareemaRanjan! Thanks for your report and sorry for the delay.

In your example, you are not passing the embeddings parameter. Is that intended? That is, do you also want to learn word embeddings alongside topic embeddings? If so, you need to pass train_embeddings=True as well, because this feature is disabled by default.
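Based on the parameters mentioned in this thread, the call would look roughly like the sketch below (vocabulary, k, and random_seed are the reporter's own variables, not defined here):

```python
from embedded_topic_model.models.etm import ETM

etm_instance = ETM(
    vocabulary,
    num_topics=k,
    epochs=100,
    debug_mode=True,
    seed=random_seed,
    train_embeddings=True,  # learn word embeddings jointly with topic embeddings
)
```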

Also, if you can share a reproducible and/or more complete code example, I can try to reproduce the problem myself.