jmschrei / yahmm

Yet Another Hidden Markov Model repository.
MIT License

Random number weirdness #12

Open jmschrei opened 10 years ago

jmschrei commented 10 years ago

I've been having this "bug" for a little bit, wanted to see if anyone else knew about it.

When I write test code, I seed random.seed(0). I will then randomly generate a sequence to test, with the assumption that the sequence will be the same each time, since I set the seed.
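A minimal sketch of that test setup, for reference. The helper name `random_sequence` and the four-letter alphabet are assumptions made for illustration; this is not yahmm's actual test code.

```python
import random


def random_sequence(length, alphabet="ACGT", seed=0):
    # Hypothetical helper: seed the global RNG, then draw a sequence from it.
    random.seed(seed)
    return [random.choice(alphabet) for _ in range(length)]


# In pure Python, reseeding with the same value should reproduce the sequence.
first = random_sequence(10)
second = random_sequence(10)
```

In pure Python the two sequences match, which is the assumption the tests rely on.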

Occasionally, the first time I run a program I get sequence A, and then on every subsequent run I get sequence B, just by rerunning the code. All yahmm operations function appropriately; it's just that the seeded random sequence is different. If I modify yahmm.pyx in any way (even to add comments), I get sequence A again on the next run, then sequence B on every run after that.

Any thoughts?

nipunbatra commented 10 years ago

Would you want to try setting the seed using numpy and see if you get the same behavior?

jmschrei commented 10 years ago

I do set the seed using random. If there were an issue where the seed was changed every iteration, I wouldn't get B constantly after the first trial.

adamnovak commented 10 years ago

OK, I've been looking at this issue today. I was trying to add support for running the proposed nose tests with python setup.py test as well as through nose directly with nosetests. Depending on which way I ran the tests, I would get different results for the things that depend on random numbers. The two approaches were building the Cython module slightly differently, and producing slightly different .so files, but the differences in the C code were all in obscure macro arguments and didn't look to have much to do with randomness.

My conclusion is that the global state of the random module is the problem, and that it somehow manages to not be properly shared between Python and Cython, maybe in a way that depends on import order. I put a seed call in the actual Cython model sample function, and that alleviated the first-run-after-deleting-the-built-library-vs-other-runs problem for at least one of the test execution methods. But the different methods still gave different results.

I think if we want this to work properly, we need to move away from the Python random module. It might be best to use something that doesn't use global state for the RNG, for that matter.
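One way to avoid global RNG state while staying in the standard library is to give each sampler its own `random.Random` instance. The sketch below assumes a hypothetical `Sampler` class standing in for a distribution; it is not how yahmm is implemented, and whether it resolves the Python/Cython behaviour described above is untested.

```python
import random


class Sampler:
    """Hypothetical stand-in for a distribution that owns its RNG."""

    def __init__(self, mean, std, seed=None):
        self.mean = mean
        self.std = std
        # Private generator: its state is independent of the module-level RNG.
        self.rng = random.Random(seed)

    def sample(self):
        # Draw from a normal distribution using the private generator.
        return self.rng.gauss(self.mean, self.std)


a = Sampler(0.0, 1.0, seed=0)
b = Sampler(0.0, 1.0, seed=0)

random.seed(12345)  # reseeding the global RNG does not touch a or b
draws_a = [a.sample() for _ in range(5)]
random.random()     # neither does consuming global random numbers
draws_b = [b.sample() for _ in range(5)]
```

Because each instance carries its own state, two samplers built with the same seed produce the same draws regardless of what other code does to the module-level generator.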

We could also try making sure that all the functions called in the course of sampling are pure Python, for which we'd probably have to move them outside the .pyx file. This would probably make sampling super slow.

tlnagy commented 10 years ago

Would it be possible to stick to rand from stdlib for all of yahmm's random number usage and just add a convenience function to seed this from python (using srand)?

adamnovak commented 10 years ago

It would be possible, but we'd have to re-work some of the distribution implementations. We rely on the Python random library's implementations for sampling from standard things like normal distributions.
