Simplify LDA input parameterization

stan-dev / example-models

Example models for Stan

http://mc-stan.org/


Simplify LDA input parameterization #143

Open gokceneraslan opened 5 years ago

gokceneraslan commented 5 years ago

I tried to simplify the LDA input representation by using a single M x V matrix of word frequencies, where M is the number of documents and V is the number of words. Instead of iterating over all words of all documents, the model now iterates over each element of the M x V matrix.
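To make the two input encodings concrete, here is a minimal Python sketch, not the code from this pull request. It assumes the token-level data is given as two parallel arrays of 1-based ids (the names `w` and `doc` are hypothetical), collapses them into an M x V count matrix, and checks on toy numbers that the log likelihood with topics marginalized out is the same whether it is summed over tokens or over count-weighted matrix cells. The `theta` and `phi` values are made up for illustration.

```python
import math

def tokens_to_counts(w, doc, M, V):
    """Collapse token-level data into an M x V matrix of word counts.

    w[n] is the word id (1..V) of token n and doc[n] is its document id
    (1..M). The 1-based ids are an assumption made for this sketch.
    """
    counts = [[0] * V for _ in range(M)]
    for word_id, doc_id in zip(w, doc):
        counts[doc_id - 1][word_id - 1] += 1
    return counts

def word_log_prob(theta_m, phi, v):
    """log p(word v | document), summing over topics k (0-based v)."""
    return math.log(sum(theta_m[k] * phi[k][v] for k in range(len(theta_m))))

def loglik_tokens(w, doc, theta, phi):
    """Log likelihood accumulated one token at a time."""
    return sum(word_log_prob(theta[d - 1], phi, v - 1) for v, d in zip(w, doc))

def loglik_counts(counts, theta, phi):
    """Log likelihood accumulated over cells of the M x V count matrix."""
    total = 0.0
    for m, row in enumerate(counts):
        for v, c in enumerate(row):
            if c > 0:  # cells with zero count contribute nothing
                total += c * word_log_prob(theta[m], phi, v)
    return total

# Toy data: 2 documents, 3 vocabulary words, 2 topics (all values hypothetical).
w = [1, 2, 2, 3, 1, 1]
doc = [1, 1, 1, 2, 2, 2]
theta = [[0.7, 0.3], [0.2, 0.8]]            # per-document topic proportions
phi = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]    # per-topic word distributions

counts = tokens_to_counts(w, doc, M=2, V=3)
ll_tokens = loglik_tokens(w, doc, theta, phi)
ll_counts = loglik_counts(counts, theta, phi)
```

The count-based loop visits M x V cells regardless of document length, so it tends to pay off when documents are long relative to the vocabulary; for a large, sparse vocabulary the token-level loop may do less work.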

bob-carpenter commented 5 years ago

Thanks for submitting. I've been out for a while, so haven't been able to review this, but I'll get to it ASAP.

bob-carpenter commented 5 years ago

Oh, and I'd suggest adding suffixes to existing model names like _counts to indicate you're taking sufficient stats rather than the raw data.

gokceneraslan commented 5 years ago

Oh, and I'd suggest adding suffixes to existing model names like _counts to indicate you're taking sufficient stats rather than the raw data.

You mean adding _counts to the new model? Because it's the one that uses counts.

bob-carpenter commented 5 years ago

Anything to distinguish the way in which data is coded in the two approaches. So yes, I meant keeping .stan as is and adding _counts.stan or something similar for the sufficient stats version.
