inpho / vsm

Vector Space Model Framework developed for InPhO
http://inpho.github.io/vsm

adding random seed functionality #84

Closed: JaimieMurdock closed this 9 years ago

JaimieMurdock commented 9 years ago

This restores a behavior that was factored out, making it possible to run experiments with different seeds for the random number generators. I was unsure about the line initializing rstate in vsm/model/_cgs_update.pyx, since I have not worked with Cython before.

Throughout the implementation I had to use a default value of 0, since Cython would not accept None for an int seed. I also added the functionality back to the original ldafunctions.cgs_update() and added the indices kwarg, so the two implementations (Cython and native Python) can be swapped easily.
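A minimal sketch of the sentinel-default pattern described above, in plain Python with NumPy. The helper name `make_random_state` is hypothetical, and treating `0` as "no seed supplied" is an assumption about the intended semantics, not a description of the vsm code.

```python
import numpy as np


def make_random_state(seed=0):
    """Return a NumPy RandomState.

    Hypothetical helper: seed == 0 is treated as a sentinel meaning
    "no seed supplied", mirroring a C-typed int argument that cannot be None.
    """
    if seed == 0:
        # No explicit seed: RandomState() seeds itself from OS entropy.
        return np.random.RandomState()
    return np.random.RandomState(seed)


# The same non-zero seed yields identical draws.
a = make_random_state(42).randint(0, 100, size=5)
b = make_random_state(42).randint(0, 100, size=5)
```

One trade-off of this choice: once 0 is reserved as the sentinel, it can no longer be passed as an explicit seed.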

The pull request should be merged into general-refactoring first, and should be properly marked in GitHub.

rrose1 commented 9 years ago

Okay, I made very similar changes before I read my email. You can now pass a seed to random_corpus and LdaCgsSeq.train. Note that there's also a demonstration function in ldacgsseq, demo_LdaCgsSeq, which builds a random corpus and trains on it; it now takes the additional parameters corpus_seed and model_seed.
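To illustrate why separate corpus_seed and model_seed parameters are useful, here is a toy sketch: reusing one seed reproduces the same corpus, while varying the other changes only the model's random initialization. `toy_random_corpus` and `toy_init_topics` are hypothetical stand-ins, not vsm's random_corpus or LdaCgsSeq.train.

```python
import numpy as np


def toy_random_corpus(n_tokens, n_words, seed=None):
    # Hypothetical stand-in for random_corpus: draw word ids uniformly.
    rng = np.random.RandomState(seed)
    return rng.randint(0, n_words, size=n_tokens)


def toy_init_topics(n_tokens, n_topics, seed=None):
    # Hypothetical stand-in for the random topic assignment made at the
    # start of Gibbs sampling.
    rng = np.random.RandomState(seed)
    return rng.randint(0, n_topics, size=n_tokens)


# Fix the corpus seed: the corpus is reproduced exactly.
corpus = toy_random_corpus(1000, 50, seed=1)
same_corpus = toy_random_corpus(1000, 50, seed=1)

# Vary only the model seed: the initial topic assignments differ.
z_a = toy_init_topics(len(corpus), 10, seed=2)
z_b = toy_init_topics(len(corpus), 10, seed=3)
```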

The changes have not been made to LdaCgsMulti, because handling the RNG across multiple threads is tricky. By default, the random state is pickled, so each thread receives an identical copy of it. This is unacceptable. The current workaround is to require each thread to reseed. Ideally there would be a global RNG shared by all threads that does not undercut the performance gains of parallelism.
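A minimal sketch of both the problem and the per-worker reseeding workaround, using NumPy. Pickling is simulated in-process with `pickle` rather than through real worker processes, `worker_rngs` is a hypothetical helper, and `SeedSequence.spawn` is a NumPy 1.17+ API that postdates this discussion.

```python
import pickle

import numpy as np

# The problem: pickling a RandomState copies its internal state, so every
# worker that unpickles a copy produces the same "random" stream.
parent = np.random.RandomState(0)
copies = [pickle.loads(pickle.dumps(parent)) for _ in range(4)]
draws = [c.randint(0, 1000, size=5) for c in copies]


def worker_rngs(root_seed, n_workers):
    # Workaround: derive an independent child seed for each worker from a
    # single root seed, so runs stay reproducible but streams differ.
    children = np.random.SeedSequence(root_seed).spawn(n_workers)
    return [np.random.default_rng(child) for child in children]


rngs = worker_rngs(0, 4)
streams = [rng.integers(0, 1000, size=5) for rng in rngs]
```

Spawning child seeds keeps a run reproducible from one root seed while giving each worker its own stream, and it avoids the locking a single shared RNG would need; it is not the global RNG described above.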

Thank you very much for the changes.