tsterbak / depends-on-the-definition-comments

A repository just to host comments for the blog.

https://www.depends-on-the-definition.com/lda-from-scratch/ #1

Closed utterances-bot closed 2 years ago

utterances-bot commented 2 years ago

Latent Dirichlet allocation from scratch

Today, I’m going to talk about topic models in NLP. Specifically, we will see how the Latent Dirichlet Allocation model works, and we will implement it from scratch in NumPy. What is a topic model? Assume we are given a large collection of documents.

https://www.depends-on-the-definition.com/lda-from-scratch/

JoeEmmens commented 2 years ago

Hey. This code and article are of great use when learning the nitty-gritty of how to write out an LDA model.

I have a quick question. When you re-sample the topic for a given word:

new_z = np.random.multinomial(1, p_z).argmax()

you select the max value instead of choosing one topic according to the probabilities, for example via

new_z = np.random.choice(np.arange(10), p=p_z)

Why do you do this? I think you could be in danger of never escaping the initial random topic assignment.

Thanks!
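
Edit: for anyone reading later, a quick sanity check (with an illustrative `p_z`; the topic count here is just an example, not from the article) that both calls draw a topic from the same categorical distribution:

```python
import numpy as np

np.random.seed(0)  # for reproducibility
p_z = np.array([0.1, 0.2, 0.7])  # illustrative topic probabilities

# multinomial(1, p_z) returns a one-hot count vector; argmax recovers the
# sampled index, so this is a random draw, not a deterministic max of p_z.
z_a = np.random.multinomial(1, p_z).argmax()

# np.random.choice samples an index directly from the same distribution.
z_b = np.random.choice(np.arange(len(p_z)), p=p_z)

# Over many draws, the empirical frequencies from the argmax version
# track p_z, confirming the two approaches are equivalent samplers.
draws = np.array([np.random.multinomial(1, p_z).argmax() for _ in range(10000)])
freq = np.bincount(draws, minlength=len(p_z)) / len(draws)
```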

savoga commented 2 years ago

Hey, thanks for the great article. I think there is a typo in your LaTeX formula for the Gibbs sampling: a '+' is missing in the numerator of the first fraction.
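
For reference, the standard collapsed Gibbs sampling update for LDA (written in my own notation, which may differ from the article's) reads, with the '+' in the numerator of the first fraction in place:

```latex
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
\frac{n_{d,k}^{-i} + \alpha}{\sum_{k'}\left(n_{d,k'}^{-i} + \alpha\right)}
\cdot
\frac{n_{k,w_i}^{-i} + \beta}{\sum_{v}\left(n_{k,v}^{-i} + \beta\right)}
```

where $n_{d,k}^{-i}$ counts words in document $d$ assigned to topic $k$, $n_{k,v}^{-i}$ counts assignments of vocabulary word $v$ to topic $k$, both excluding the current word $i$, and $\alpha$, $\beta$ are the Dirichlet hyperparameters.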

tsterbak commented 2 years ago

Thanks a lot for the hint! Fixed it now :)