hammadshaikhha / Data-Science-and-Machine-Learning-from-Scratch

Implements common data science methods and machine learning algorithms from scratch in python. Intuition and theory behind the algorithms is also discussed.
430 stars 234 forks

Calculation of <prob_topics> in <Latent Dirichlet Allocation.ipynb> #2

Open DaveRockt opened 6 years ago

DaveRockt commented 6 years ago

Hello,

Is the calculation of the conditional probability for assigning each topic correct? It does not seem to match the formula in the referenced Wikipedia article.

Best, David

hammadshaikhha commented 6 years ago

@DaveRockt Thanks for raising the issue; I only just noticed it. Could you be a bit more specific and compare the formula I have with the one on Wikipedia?

I will review my LDA notebook in the meantime and see if I can spot a mistake.

DaveRockt commented 6 years ago

Thank you for your answer and sorry for not being specific:

In the formula on Wikipedia, I cannot find what you call 'denom1'. In the meantime, however, I found out that after normalising, I get the same result as you.
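A minimal sketch of why such an extra denominator drops out after normalising, assuming 'denom1' is the per-document normaliser (the document's word count plus the number of topics times alpha). The function and parameter names here are hypothetical and not taken from the notebook:

```python
def topic_conditional(doc_topic_counts, word_topic_counts, topic_totals,
                      alpha, beta, vocab_size, use_doc_denominator):
    """Normalised probability of each topic for one word in one document.

    doc_topic_counts[k]  : words in this document currently assigned to topic k
    word_topic_counts[k] : times this word is currently assigned to topic k
    topic_totals[k]      : total words currently assigned to topic k
    """
    num_topics = len(doc_topic_counts)
    # Assumed meaning of 'denom1': it does not depend on the topic index k.
    doc_denominator = sum(doc_topic_counts) + num_topics * alpha

    weights = []
    for k in range(num_topics):
        doc_term = doc_topic_counts[k] + alpha
        if use_doc_denominator:
            doc_term /= doc_denominator
        word_term = (word_topic_counts[k] + beta) / (topic_totals[k] + vocab_size * beta)
        weights.append(doc_term * word_term)

    # Normalising removes any factor that is shared by all topics.
    total = sum(weights)
    return [w / total for w in weights]


with_denominator = topic_conditional([3, 1, 0], [2, 0, 1], [10, 7, 5], 0.1, 0.01, 50, True)
without_denominator = topic_conditional([3, 1, 0], [2, 0, 1], [10, 7, 5], 0.1, 0.01, 50, False)
```

Because the document-side denominator is the same for every topic, both calls give the same distribution up to floating-point error.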

However, I have another question:

In your code in the 'Main part of LDA algorithm', under 'Add in current word back into count matrixes', you use 'init_topic_assign'. Does this make sense? Shouldn't you use the newly assigned topic instead?
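To make the question concrete, here is a rough sketch of how one collapsed Gibbs update is usually ordered: remove the word under its old topic, sample a new topic, then add the word back under the new topic. This is not the notebook's code; all names (`resample_token`, `doc_topic`, `topic_word`, `topic_total`) are hypothetical:

```python
import random


def resample_token(d, w, old_topic, doc_topic, topic_word, topic_total,
                   alpha, beta, rng):
    """Resample the topic of one word token and update the counts in place."""
    num_topics = len(topic_total)
    vocab_size = len(topic_word[0])

    # 1. Remove the current word from the counts using its OLD topic.
    doc_topic[d][old_topic] -= 1
    topic_word[old_topic][w] -= 1
    topic_total[old_topic] -= 1

    # 2. Unnormalised conditional probability of each topic for this word.
    weights = [
        (doc_topic[d][k] + alpha)
        * (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
        for k in range(num_topics)
    ]
    new_topic = rng.choices(range(num_topics), weights=weights)[0]

    # 3. Add the word back using the NEWLY sampled topic, not the old one.
    doc_topic[d][new_topic] += 1
    topic_word[new_topic][w] += 1
    topic_total[new_topic] += 1
    return new_topic


# Toy state: 1 document, 2 topics, vocabulary of 3 words.
doc_topic = [[2, 1]]
topic_word = [[1, 1, 0], [0, 0, 1]]
topic_total = [2, 1]
new_topic = resample_token(0, 2, 1, doc_topic, topic_word, topic_total,
                           alpha=0.1, beta=0.01, rng=random.Random(0))
```

If the word were added back under the initial assignment in step 3, the counts would never change, so the sampler could not move away from its starting state.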

Thank you for this project by the way, it really helped me to understand LDA better.

Best, David

hammadshaikhha commented 6 years ago

Hi David,

Sorry for the late reply. I skimmed over the code and you may be right. One way to check would be to run LDA from a Python library on the same data set and see whether the results match. Could you work on that?

If there is indeed a mistake in the code, feel free to fix it and do a pull request.
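For the library comparison suggested above, note that topic indices are arbitrary, so two runs have to be aligned before they are compared. A possible sketch, assuming both implementations can provide a topic-by-word weight matrix over the same vocabulary (for example, scikit-learn's `LatentDirichletAllocation` exposes one as `components_`); the helper names are hypothetical:

```python
from itertools import permutations


def top_words(topic_word_rows, vocab, n):
    """Set of the n highest-weighted words for each topic."""
    tops = []
    for row in topic_word_rows:
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        tops.append({vocab[i] for i in ranked[:n]})
    return tops


def best_topic_alignment(tops_a, tops_b):
    """Match topics of model B to model A by mean Jaccard overlap.

    Tries every permutation, so it is only practical for a few topics.
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b)

    best_perm, best_score = None, -1.0
    for perm in permutations(range(len(tops_b))):
        score = sum(jaccard(tops_a[k], tops_b[perm[k]])
                    for k in range(len(tops_a))) / len(tops_a)
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


# Toy example: the same two topics, listed in a different order.
vocab = ["cat", "dog", "stock", "bond"]
scratch_topics = [[5, 4, 0, 1], [0, 1, 6, 5]]                     # e.g. raw counts
library_topics = [[0.1, 0.0, 0.5, 0.4], [0.6, 0.3, 0.05, 0.05]]   # e.g. probabilities
perm, score = best_topic_alignment(top_words(scratch_topics, vocab, 2),
                                   top_words(library_topics, vocab, 2))
```

A score close to 1 means the top words agree after relabelling. Exact agreement should not be expected, since a library may use a different inference method (such as variational inference) than Gibbs sampling.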

DaveRockt commented 6 years ago

Thank you. I am also a bit busy at the moment, but I will have a look as soon as possible.

Best, David
