macroevolution / bamm

A program for multimodel inference on speciation and trait evolution
GNU General Public License v2.0
33 stars 10 forks source link

Support user-supplied random number seed #1

Closed josephwb closed 10 years ago

josephwb commented 11 years ago

For replication purposes. Default to clock (if provided seed == -1), otherwise use user-provided seed.

josephwb commented 10 years ago

Implemented for speciation-extinction. Soon to be for trait model as well.

josephwb commented 10 years ago

Hmm. running longer tests, runs diverge after a few thousand generations. Not sure what is going on...

drabosky commented 10 years ago

BTW I pulled and compiled but had no issues - the random seed worked fine (tested it up to 10^6 gens). I assume you have implemented yet for traits but worked fine in speciationextinction.

On Nov 17, 2013, at 11:57 AM, Joseph W. Brown wrote:

Hmm. running longer tests, runs diverge after a few thousand generations. Not sure what is going on...

— Reply to this email directly or view it on GitHub.

josephwb commented 10 years ago

Implemented for both now, but both diverge for me. Here are some results for the whale tree with seed = 12345 and printFreq = 1000.

Run 1: Generation lnLik N_shifts LogPrior acceptRate 1 -328.788 0 0.120855 Accp: 0 1001 -267.902 1 -0.226387 Accp: 0.537 2001 -267.601 1 0.820076 Accp: 0.502 3001 -265.772 2 1.52255 Accp: 0.478 4001 -268.396 2 0.780745 Accp: 0.559 5001 -265.694 2 -0.223779 Accp: 0.547 6001 -272.976 1 1.36231 Accp: 0.577 7001 -264.579 1-0.0996446 Accp: 0.472 8001 -264.161 2 -3.35758 Accp: 0.525 9001 -263.454 1-0.0102534 Accp: 0.522 10001 -264.118 1 0.666584 Accp: 0.527

Run 2: Generation lnLik N_shifts LogPrior acceptRate 1 -328.788 0 0.120855 Accp: 0 1001 -267.902 1 -0.226387 Accp: 0.537 2001 -267.601 1 0.820076 Accp: 0.502 3001 -265.772 2 1.52255 Accp: 0.478 4001 -259.861 2 -4.16825 Accp: 0.532 <-- Diverged 5001 -266.556 1 0.747653 Accp: 0.5 6001 -267.167 1 0.479961 Accp: 0.535 7001 -270.296 1 -0.506432 Accp: 0.455 8001 -262.076 2 -2.80532 Accp: 0.528 9001 -260.871 2 -0.257305 Accp: 0.527 10001 -266.33 1 0.733088 Accp: 0.529

Diverges at random generations (i.e. not consistent across replicate runs).

Maybe another OS-specific issue?

drabosky commented 10 years ago

I just tried your seed & example, worked fine.

did you make clean before rebuilding (i assume you did, but weirdly, i got seg faults after pulling your last commit until I cleaned everything out again! make clean fixed it).

On Nov 17, 2013, at 12:59 PM, Joseph W. Brown wrote:

Implemented for both now, but both diverge for me. Here are some results for the whale tree with seed = 12345 and printFreq = 1000.

Run #1: Generation lnLik N_shifts LogPrior acceptRate 1 -328.788 0 0.120855 Accp: 0 1001 -267.902 1 -0.226387 Accp: 0.537 2001 -267.601 1 0.820076 Accp: 0.502 3001 -265.772 2 1.52255 Accp: 0.478 4001 -268.396 2 0.780745 Accp: 0.559 5001 -265.694 2 -0.223779 Accp: 0.547 6001 -272.976 1 1.36231 Accp: 0.577 7001 -264.579 1-0.0996446 Accp: 0.472 8001 -264.161 2 -3.35758 Accp: 0.525 9001 -263.454 1-0.0102534 Accp: 0.522 10001 -264.118 1 0.666584 Accp: 0.527

Run #2: Generation lnLik N_shifts LogPrior acceptRate 1 -328.788 0 0.120855 Accp: 0 1001 -267.902 1 -0.226387 Accp: 0.537 2001 -267.601 1 0.820076 Accp: 0.502 3001 -265.772 2 1.52255 Accp: 0.478 4001 -259.861 2 -4.16825 Accp: 0.532 <-- Diverged 5001 -266.556 1 0.747653 Accp: 0.5 6001 -267.167 1 0.479961 Accp: 0.535 7001 -270.296 1 -0.506432 Accp: 0.455 8001 -262.076 2 -2.80532 Accp: 0.528 9001 -260.871 2 -0.257305 Accp: 0.527 10001 -266.33 1 0.733088 Accp: 0.529

Diverges at random generations (i.e. not consistent across replicate runs).

Maybe another OS-specific issue?

— Reply to this email directly or view it on GitHub.

josephwb commented 10 years ago

Yeah, I always clean and rebuild it all to avoid those weird behaviours. Problem persists...

I will try this on my wife's machine (different OS).

drabosky commented 10 years ago

Can you test if has something to do with BAMM versus the RNG or MbRandom class itself, like have dummy function in main that just calls the RNG a few million times to see if still diverges with no calls to BAMM associated functions?

On Nov 17, 2013, at 1:09 PM, Joseph W. Brown wrote:

Yeah, I always clean and rebuild it all to avoid those weird behaviours. Problem persists...

— Reply to this email directly or view it on GitHub.

josephwb commented 10 years ago

Yeah, I'll do that to localize the issue.

BTW, it works correctly on my wife's 10.8.5 machine.

josephwb commented 10 years ago

So, uniformRv() runs consistently (tested up to 10MIL), and is the same across OSs.

The diverging of BAMM results seems to happen (for me) after a few tens of thousand generations; i.e. not right away. Are there proposals/operations that occur very infrequently?

drabosky commented 10 years ago

ok, this is weird. I wonder how we might be able to profile this.

I have to wonder if this could be some weird numerical thing. For example, maybe I have an inconsistent calculation (or improperly initialized variable) or something that is fine on my compiler, but for you it leads to very slight changes in the likelihoods (some address overhang or something in dynamic memory) that ultimately cause things to diverge.

Maybe try setting sampleFromPriorOnly = 1 and run for a million gens. This may help localize things.

I could see this being a headache to diagnose, especially if I can't replicate on my machine.

On Nov 17, 2013, at 2:16 PM, Joseph W. Brown wrote:

So, uniformRv() runs consistently (tested up to 10MIL), and is the same across OSs.

The diverging of BAMM results seems to happen (for me) after a few tens of thousand generations; i.e. not right away. Are there proposals/operations that occur very infrequently?

— Reply to this email directly or view it on GitHub.

josephwb commented 10 years ago

Well, this seems to have been 'solved' on non-10.9 macs. I will therefore close it now. I am tempted to open a new issue "why things break on 10.9", but will resist for the time being. This, of course, should not be taken lightly; I imagine a lot of people (the majority?) have upgraded, or will soon do so, to 10.9, so this problem will not go away.

drabosky commented 10 years ago

Carlos - if you have 10.9 on one of your machines, maybe you can try some debugging on this if we can replicate it. I am definitely concerned with the RNG performance and it makes me wonder what could possibly be going on.

On Nov 17, 2013, at 9:46 PM, Joseph W. Brown wrote:

Well, this seems to have been 'solved' on non-10.9 macs. I will therefore close it now. I am tempted to open a new issue "why things break on 10.9", but will resist for the time being. This, of course, should not be taken lightly; I imagine a lot of people (the majority?) have upgraded, or will soon do so, to 10.9, so this problem will not go away.

— Reply to this email directly or view it on GitHub.

josephwb commented 10 years ago

One worry I have is the reference of "two" random number seeds (from MbRandom.cpp), emphasized as 'TWO' below:

/ This function sets the TWO seeds for the random number generator, using the current time. \brief Initializes random number seeds. \return This function does not return anything. \throws Does not throw an error. / void MbRandom::setSeed(void) { seed = (long int)( time( 0 ) ); }

/ This function sets the TWO seeds for the random number generator. If only one starting value is given we set the two seeds by using the two least signficant bytes as one seed and the two most significant bytes shifted to the right as the second seed. \brief Initializes random number seeds. \return This function does not return anything. \throws Does not throw an error. / void MbRandom::setSeed(long int s) { seed = s; }

/ This function gets the TWO seeds from the random number generator. \brief Return the random number seeds. \param i1 [in/out] the first seed \param i2 [in/out] the second seed \return This function does not return anything. \throws Does not throw an error. / long int MbRandom::getSeed(void) { return seed; }

I am not sure where the '2nd' seed comes from, or how to manually supply it myself.

drabosky commented 10 years ago

weird - but shouldn't this have come up in your tests of the RNG outside of the bamm code?

On Nov 17, 2013, at 10:06 PM, Joseph W. Brown wrote:

One worry I have is the reference of "two" random number seeds (from MbRandom.cpp), emphasized as 'TWO' below:

/ This function sets the TWO seeds for the random number generator, using the current time. \brief Initializes random number seeds. \return This function does not return anything. \throws Does not throw an error. / void MbRandom::setSeed(void) { seed = (long int)( time( 0 ) ); }

/ This function sets the TWO seeds for the random number generator. If only one starting value is given we set the two seeds by using the two least signficant bytes as one seed and the two most significant bytes shifted to the right as the second seed. \brief Initializes random number seeds. \return This function does not return anything. \throws Does not throw an error. / void MbRandom::setSeed(long int s) { seed = s; }

/ This function gets the TWO seeds from the random number generator. \brief Return the random number seeds. \param i1 [in/out] the first seed \param i2 [in/out] the second seed \return This function does not return anything. \throws Does not throw an error. / long int MbRandom::getSeed(void) { return seed; }

I am not sure where the '2nd' seed comes from, or how to manually supply it myself.

— Reply to this email directly or view it on GitHub.

josephwb commented 10 years ago

Even though the documentation of the MbRandom code mentions two seeds, I have yet to find a way to supply more than a single seed.

redcurry commented 10 years ago

It looks like you supply the two seeds in a single long int, using bitshifting to place them. Bitshifting allows you to move bits around a memory location, so they probably assign the seed the first number, then shift those bits to the left (filling the empty spaces with 0), and then adding the second number. Now the single long int contains "two" seeds.

josephwb commented 10 years ago

Is this where things are diverging for me? Should we implement an alternative way to set the second seed?

drabosky commented 10 years ago

I admit i'm confused by the documentation and am not sure its correct for that class. Here's the core RNG algorithm in the MbRandom class: (most of the random numbers are based on simple transformations of the uniform RNG below). I see nothing about bit shifting or 2 seeds! This looks like a pretty simple generator.

/*!

drabosky commented 10 years ago

OK, that code post looks terrible. in any event go look at MbRandom::uniformRV(void)

drabosky commented 10 years ago

Joseph, Carlos - one thing to try would be to make a dummy/experimental function for uniformRV in MbRandom that uses another uniformRV generator (maybe one from std). This would be a trivial change and would be good to know if it fixes the behavior joseph is observing. If it doesn't, that suggests that it is a problem with something else.

The fact that the runs diverge does not mean, necessarily, that it is the fault of the RNG. If there is some weird numerical behavior going on because of (say) something not properly initialized, then the runs could diverge even if the same sequence of random numbers is called.

On Nov 18, 2013, at 10:33 AM, Joseph W. Brown wrote:

Is this where things are diverging for me? Should we implement an alternative way to set the second seed?

— Reply to this email directly or view it on GitHub.

josephwb commented 10 years ago

Yeah, I confirmed that uniformRV runs identically across OSs, so that is not it.

drabosky commented 10 years ago

OK, this is a major issue (in my opinion). We may actually have to figure out the exact function call that causes these to diverge. This could be a pain.

On Nov 18, 2013, at 10:48 AM, Joseph W. Brown wrote:

Yeah, I confirmed that uniformRV runs identically across OSs, so that is not it.

— Reply to this email directly or view it on GitHub.

josephwb commented 10 years ago

Agreed. I am starting a new issue (since the code itself is complete).

Dan -- can you get Pascal to try running some replicate analyses with a constant seed value? He is the only other I know that is using 10.9. Just want to make sure that this is not just my problem alone.

redcurry commented 10 years ago

I ran bamm for 2,000,000 generations and didn't observe any divergence. I'm using OS X 10.7.5.