slycoder / R-lda

Latent Dirichlet allocation package for R

sLDA multinomial logit extensions #1

Closed aykutfirat closed 11 years ago

aykutfirat commented 11 years ago

This requires multinom from the nnet package for the logit fit. The changes do not modify the interface, except that logical labels must now be passed as consecutive integers starting from zero. Setting logistic=TRUE treats the binary and multi-class cases uniformly.

In gibbs.c, instead of using the number of unique words, nw, in the dv updates, I decided to use the total number of words, nws. I had observed some numerical instability when the denominator in the logit function reached very high values because of large word counts, and using nws seems to have stopped that issue. This modification is used only in the slda and logistic case.

I also created a demo using the newsgroup data set, which has been added to the package. It can be run with demo(sldamc) and displays the classification accuracy and confusion matrix. For comparison, I also ran the binary case and obtained the same level of classification accuracy as the original lda (>0.96 on the test set, and >0.99 when reusing the training set as the test set).
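For readers unfamiliar with the two quantities the demo reports, here is a minimal sketch in C of how a confusion matrix and an accuracy figure can be computed from labels encoded as consecutive integers starting from zero. It is not taken from the package or from demo(sldamc); the function names and the row-major layout are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: fill a num_classes x num_classes row-major matrix where
 * counts[t * num_classes + p] is the number of items with true label t that
 * were predicted as p. Labels must lie in 0..num_classes-1.
 * Returns 0 on success, -1 if any label is out of range. */
int confusion_matrix(const int *truth, const int *pred, size_t n,
                     int num_classes, int *counts) {
    for (int i = 0; i < num_classes * num_classes; ++i) {
        counts[i] = 0;
    }
    for (size_t i = 0; i < n; ++i) {
        if (truth[i] < 0 || truth[i] >= num_classes ||
            pred[i] < 0 || pred[i] >= num_classes) {
            return -1;
        }
        counts[truth[i] * num_classes + pred[i]] += 1;
    }
    return 0;
}

/* Accuracy is the diagonal sum divided by the total count (0.0 if empty). */
double accuracy_from_confusion(const int *counts, int num_classes) {
    long correct = 0;
    long total = 0;
    for (int t = 0; t < num_classes; ++t) {
        for (int p = 0; p < num_classes; ++p) {
            total += counts[t * num_classes + p];
            if (t == p) {
                correct += counts[t * num_classes + p];
            }
        }
    }
    return total > 0 ? (double)correct / (double)total : 0.0;
}
```

Because the labels double as array indices, the zero-based consecutive encoding lets the matrix be filled in a single pass without any lookup table.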

Manual files have also been edited, and new ones created, to reflect the changes.

slycoder commented 11 years ago

Awesome stuff, sorry it took me a while to take a look at it! I put some comments into the pull request which might help simplify stuff significantly.