JaimieMurdock closed 9 years ago
Okay, I made very similar changes before I read my email. You can now pass a seed to `random_corpus` and `LdaCgsSeq.train`. Note that there's also a demonstration function in `ldacgsseq` named `demo_LdaCgsSeq`, which builds a random corpus for you and trains on it; this now takes the additional parameters `corpus_seed` and `model_seed`.
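To illustrate what the two seeds buy you, here is a minimal sketch. It does not call the real library: `random_corpus_sketch`, `train_sketch`, and `demo_sketch` are hypothetical stand-ins written with the standard library's `random` module, and only the parameter names `corpus_seed` and `model_seed` come from the comment above.

```python
import random


def random_corpus_sketch(n_words, vocab_size, seed=None):
    """Hypothetical stand-in for random_corpus: a list of random word ids."""
    rng = random.Random(seed)  # private generator, global state is untouched
    return [rng.randrange(vocab_size) for _ in range(n_words)]


def train_sketch(corpus, n_topics, seed=None):
    """Hypothetical stand-in for training: random initial topic assignments."""
    rng = random.Random(seed)
    return [rng.randrange(n_topics) for _ in corpus]


def demo_sketch(corpus_seed=None, model_seed=None):
    """Build a random corpus and 'train' on it, each step with its own seed."""
    corpus = random_corpus_sketch(1000, 50, seed=corpus_seed)
    topics = train_sketch(corpus, 5, seed=model_seed)
    return corpus, topics


run_a = demo_sketch(corpus_seed=1, model_seed=2)
run_b = demo_sketch(corpus_seed=1, model_seed=2)  # same seeds: identical run
run_c = demo_sketch(corpus_seed=1, model_seed=3)  # same corpus, different model
```

Keeping the two seeds separate means the corpus can be held fixed while only the model's randomness varies, which is the kind of experiment the seeding is meant to enable.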
The changes have not been made to `LdaCgsMulti`, as the business about RNG over multiple threads is a bit funny. The default behavior is to pickle the random state and so pass identical copies of it to each of the threads. This is unacceptable, since every thread would then draw the same random sequence. The current workaround is to insist that each thread reseed. Ideally there should be a global RNG for all threads which does not thwart the performance gains of the parallelism.
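The pickling problem can be reproduced without any threads at all. The sketch below uses the standard library's `random.Random` as a stand-in for the model's random state (an assumption; the real code may hold a NumPy state instead), and the per-worker seeding scheme `BASE_SEED + i` is purely illustrative.

```python
import pickle
import random

N_WORKERS = 4
BASE_SEED = 12345  # illustrative value


def draws(rng, n=5):
    """Return the next n floats from the given generator."""
    return [rng.random() for _ in range(n)]


# Default behaviour: the parent's state is pickled once and every worker
# unpickles its own copy, so all workers start from the same state.
payload = pickle.dumps(random.Random(BASE_SEED))
copied = [draws(pickle.loads(payload)) for _ in range(N_WORKERS)]

# Workaround: each worker reseeds, here from the base seed plus its index.
reseeded = [draws(random.Random(BASE_SEED + i)) for i in range(N_WORKERS)]

print(all(c == copied[0] for c in copied))  # every copy yields the same draws
print(len({tuple(r) for r in reseeded}))    # number of distinct streams
```

Reseeding per worker keeps each run reproducible as long as the base seed and the number of workers stay the same, but it is not the single shared generator described above as the ideal.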
Thank you very much for the changes.
This functionality restores a behavior that was factored out: seeding the random number generators, which enables experiments with different seeds. I was unsure of the line initializing `rstate` in `vsm/model/_cgs_update.pyx`, as I have not worked in Cython before. Throughout the implementation I had to use a default value of 0, since Cython would not accept "None" for `int seed`. Additionally, I added the functionality back to the original `ldafunctions.cgs_update()` and added the `indicies` kwarg, enabling the two commands (Cython and native Python) to be switched easily.

The pull request should be merged into general-refactoring first, and should be properly marked in GitHub.
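On the default value of 0: one possible way to keep `None` as the public default while the typed routine only ever sees an integer is a thin Python wrapper. This is a sketch in plain Python, not the code in this pull request; `typed_update`, `update`, and the convention that 0 means "no seed given" are all assumptions made for illustration.

```python
import random

UNSEEDED = 0  # assumption: 0 is reserved to mean "no seed given"


def typed_update(seed=UNSEEDED):
    """Stand-in for a typed routine whose seed argument must be an int."""
    if not isinstance(seed, int):
        raise TypeError("seed must be an int")
    # An unseeded generator draws its state from OS entropy.
    rng = random.Random() if seed == UNSEEDED else random.Random(seed)
    return rng.random()


def update(seed=None):
    """Python-level wrapper that translates None into the integer sentinel."""
    return typed_update(UNSEEDED if seed is None else seed)


same = update(seed=7) == update(seed=7)  # seeded calls are reproducible
```

The cost of this convention is that 0 can no longer be used as a real seed, so a different sentinel (for example a negative value) may be preferable if seed 0 matters.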