Closed · josuetsm closed this issue 4 years ago
Hi, hmm. I haven't tried running with 800 authors, but that shouldn't be causing the NaN issue (each author only adds one extra parameter to the model). Out of curiosity, what happens if you keep the dataset the same but change the author indices so that there are only 2 authors (i.e. deliberately mislabel the authors)? I assume the NaNs would still be there, but if they're not, that would confirm that the issue is the number of authors.
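A minimal sketch of that diagnostic, assuming the author indices are stored as an integer array (the array values here are made up for illustration):

```python
import numpy as np

# Keep the documents and word counts unchanged, but collapse the author
# labels so that only 2 distinct authors remain (deliberate mislabeling).
author_indices = np.array([0, 5, 12, 799, 3, 5])  # hypothetical original labels
collapsed = author_indices % 2                    # every author becomes 0 or 1
print(collapsed)  # [0 1 0 1 1 1]
```

If the ELBO still returns NaNs with only 2 authors, the author count is not the culprit.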
Are you using the TensorFlow or PyTorch implementation? And what is the vocabulary size and the number of documents you're using?
Sorry for the late reply; I had to pause the project for a while. Changing the author indices so that only 2 authors remained did solve the problem. However, days later I found the actual root cause, and it was not the number of authors. The problem arose because I had authors with 0 vocabulary words, so the optimization drove the rate parameter of the Poisson distribution to 0, generating NaNs in the log_prob. After eliminating the authors with 0 vocabulary words, I have been able to estimate ideal points for datasets with 100,000 authors. Thanks for the answer and for the excellent tutorial on Google Colab.
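A small NumPy sketch of the failure mode and the fix described above (the rates and counts are hypothetical toy values, not the model's actual parameters):

```python
import math
import numpy as np

# Toy Poisson rates: the second "author" contributed no vocabulary words,
# so optimization drives their rate to 0 (hypothetical illustration).
rates = np.array([2.0, 0.0])
counts = np.array([3.0, 0.0])

with np.errstate(divide="ignore", invalid="ignore"):
    # Poisson log-pmf: k * log(rate) - rate - log(k!)
    log_prob = (counts * np.log(rates)
                - rates
                - np.array([math.lgamma(k + 1) for k in counts]))

print(log_prob)  # second entry is NaN: 0 * log(0) = 0 * (-inf)

# The fix: drop authors whose total word count is zero before fitting.
word_counts = np.array([[1, 2],
                        [0, 0],   # author with no vocabulary words
                        [3, 0]])
keep = word_counts.sum(axis=1) > 0
filtered = word_counts[keep]  # only authors with at least one word remain
print(filtered.shape)  # (2, 2)
```

Once the zero-count authors are filtered out, every rate stays strictly positive and the log-probability is finite.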
Great, I'm glad it's working now. And thank you!
Hi, I've been using your repository for a few weeks to estimate social media users' ideal points. However, I have noticed that when I try to run the model with more than 800 authors, the model does not converge. Specifically, the ELBO returns NaN values. Have you ever run the model with more than 800 authors? Also, do you know of any article that discusses convergence problems in variational inference? I think my problem may be due to the number of parameters, but I am not sure. I would appreciate any guidance. Thanks!