kzhai / InfVocLDA

Online Latent Dirichlet Allocation with Infinite Vocabulary using Variational Inference
https://github.com/kzhai/InfVocLDA
Apache License 2.0
74 stars, 19 forks

batch_size, e_step function #5

Closed raphaelsty closed 5 years ago

raphaelsty commented 5 years ago

Hello, I would really like to add your LDA with infinite vocabulary to the Creme library (an online machine learning library): https://github.com/creme-ml/creme

I have a question about the e_step function. You initialize the batch_size variable from the length of the wordids variable, but a few lines later your comment suggests that batch_size is an integer giving the number of documents.

Here, if you set batch_size to 1, len(wordids) is the number of words in the document.

Did you intentionally initialize batch_size from len(wordids)? Or should it be batch_size = self._batch_size?

https://github.com/kzhai/InfVocLDA/blob/05a87890d613b07f7b0c2d2bb6c79aad39e2f75d/src/infvoc/hybrid.py#L279-L280

Your comment:

https://github.com/kzhai/InfVocLDA/blob/05a87890d613b07f7b0c2d2bb6c79aad39e2f75d/src/infvoc/hybrid.py#L292-L293

Thank you in advance for confirming whether batch_size = len(wordids) is intended.

Raphaël
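A minimal sketch of the distinction behind this question, assuming wordids is a list with one inner list of token ids per document (the layout used in common online LDA reference code; the thread does not confirm that InfVocLDA uses it). Under that assumption len(wordids) counts documents, and it only counts words when a single document is passed as a flat list. The helper names count_documents and as_batch are hypothetical and are not part of InfVocLDA.

```python
from typing import List


def count_documents(wordids: List[List[int]]) -> int:
    """Return the number of documents in a nested batch (one inner list per document)."""
    return len(wordids)


def as_batch(doc_wordids: List[int]) -> List[List[int]]:
    """Wrap a single flat document into a batch of size 1 (hypothetical helper)."""
    return [list(doc_wordids)]


# Toy batch with two documents; the token ids are made up for illustration.
nested_wordids = [[0, 3, 7], [2, 3]]
n_docs = count_documents(nested_wordids)        # counts documents, not words

# A single document passed as a flat list: len() now counts its word ids.
flat_wordids = [0, 3, 7]
n_words = len(flat_wordids)

# Wrapping the flat list restores the "one inner list per document" layout.
n_docs_wrapped = count_documents(as_batch(flat_wordids))
```

If the nested layout does hold, batch_size = len(wordids) and a configured self._batch_size would agree whenever a full batch is passed, and differ only for a final, shorter batch.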

kzhai commented 5 years ago

Hi, Raphael,

Thanks for your interest. As I recall, you can set the batch size to any number. Defaulting it to a small value, e.g., 1 or 10, usually works just fine. Setting it to 1 also simulates the truly online case, where documents arrive one at a time.

Best, Ke
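As an illustration of the advice above, here is a small sketch of splitting a document stream into mini-batches, where a batch size of 1 corresponds to the purely online case. The function iter_minibatches is a hypothetical helper written for this example; it is not part of InfVocLDA or Creme, and how each batch is then passed to the model is left out because the thread does not show that API.

```python
from typing import Iterator, List, Sequence


def iter_minibatches(docs: Sequence[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Yield consecutive slices of docs with at most batch_size documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(docs), batch_size):
        # The last batch may be shorter than batch_size.
        yield list(docs[start:start + batch_size])


# Made-up corpus for illustration only.
corpus = ["first document", "second document", "third document"]

online_batches = list(iter_minibatches(corpus, batch_size=1))   # one document per update
small_batches = list(iter_minibatches(corpus, batch_size=2))    # last batch holds one document
```

Because the final batch can be shorter than the requested size, code that consumes these batches is safer reading the size from the batch itself than from the configured value.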

(Sent by email in reply to Raphael Sourty's message of Wed, Apr 17, 2019, 3:42 PM.)


raphaelsty commented 5 years ago

Thank you :-)