Open breznak opened 10 years ago
What is the specific proposal? In FDRCSpatial2.py and elsewhere, if the seed is specified as -1, it is set randomly. Otherwise the specified seed value is used. This allows reproducibility where we need it.
Are you proposing we use a seed of -1 in the examples?
Yes, to use a random seed wherever possible.
@subutai Do you agree with this proposal?
I do not feel so strongly about this one anymore. Still, it might be the way to go to use a random seed everywhere, but provide e.g. a build flag (--debug) that would use a fixed seed for reproducibility and debugging.
I do. :-) (feel strongly)
Tests should be converted to use a MockRandom which returns an array of pre-configured random numbers whose total number equals the number of calls made during a single test. That way, that same mock array of "random" numbers can be used in ANY platform and we can still obtain the same results.
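The MockRandom idea could be sketched as follows (a hypothetical class, not existing nupic code; it enforces the "total number of values equals the number of calls" constraint by failing loudly when exhausted):

```python
class MockRandom:
    """Replays a pre-configured list of "random" numbers so that a test
    produces bit-identical results on any platform."""

    def __init__(self, values):
        self._values = list(values)
        self._i = 0

    def random(self):
        # Fail loudly if the test makes more RNG calls than values were
        # provided, so a mismatch is caught rather than silently recycled.
        if self._i >= len(self._values):
            raise IndexError("MockRandom exhausted: more calls than values")
        value = self._values[self._i]
        self._i += 1
        return value
```

A test would construct it from a fixed list (e.g. read from a text file) and pass it wherever the library code expects an RNG.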
At some point guys, we're going to need a Validation Framework that can do a litmus test on a given port to decide whether it adheres to computation standards. This was originally Matt's idea, and the only way this can be implemented (or at least a major immediate way) is to ELIMINATE the use of randoms in tests. That way we can compose a Litmus Validation Suite more easily when the time comes, without a bunch of new work.
Sorry for the caps - but like I said, I feel strongly. (or is it the way I smell? I can't decide...) :-)
David
@cogmission Right now, most of the algorithm implementations use a single cross-platform RNG that is inside nupic.core. The Python and C++ implementations use the exact same code; this way we can do a bit-by-bit check to ensure exactness. This can be used across languages and across platforms. Even the numpy arrays are initialized with this single RNG. A good example of this is in research/spatial_pooler.py
If someone else wants to leverage this, they can simply use the same RNG available through nupic.core. We could also easily add a helper function to generate the list of random numbers that you wanted. This can be used in ports that don't link with nupic.core.
Would this work, or are you suggesting we implement another mechanism?
Hi Subutai,
I really think we should stick to the random number list altogether and wrap the RNG in a Mock wrapper that returns numbers off of the list. The list should be a text file that can be copied or read in dynamically by the MockRNG, which is an interface that the library code calls and that proxies to the underlying mock in the case of tests...
I have done some work with the raw Mersenne Twister Java class that Fergal recommended, and it is not easy to figure out which of its methods correspond to which Python calls. I managed to figure it out, but there are still inaccuracies beyond the 6th decimal place that could accumulate. This is the reason I think explicit numbers should be used - eventually I think we all would like to see ports to all major platforms and languages, and it is my (over-opinionated) belief that this is the shortest and least encumbered route.
Cheers, David
...what I mean is:
Having a way to produce exact reproducible output will be crucial as the sophistication of the core code base evolves. There may come a time when peering into the code to discern subtle differences will not be as forthright an effort as it is now.
Cheers, David
PS not so easy with thumbs ! :-)
@cogmission Thanks, I think I understand your proposal now. Totally agree on the importance of reproducible output everywhere. First, the new TM code should be changed to use the RNG in nupic.core. I didn't realize it wasn't using it.
At that point all we would need is a mock API for other languages such as the Java port. We can have functionality in core that outputs the random numbers for a given seed to a file so that the Java mock API can read it in and replicate the core results. (Unless I'm missing something, I don't think our python or c++ code needs to call the mock API, since they can just use the real API and get exactly reproducible results.)
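The dump-and-replay mechanism described above could look roughly like this (a sketch; the function names are hypothetical and Python's stdlib RNG stands in for the nupic.core RNG):

```python
import random

def dump_rng_values(seed, n, path):
    # Write n values from a seeded RNG to a text file so a port in another
    # language (e.g. the Java mock API) can read and replay them.
    # "%.17g" prints enough digits to round-trip IEEE doubles exactly.
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(n):
            f.write("%.17g\n" % rng.random())

def load_rng_values(path):
    # Read the values back; a mock RNG in any language can serve these
    # in order, reproducing the core results bit for bit.
    with open(path) as f:
        return [float(line) for line in f]
```

The Python and C++ code can keep calling the real RNG directly; only ports that don't link with nupic.core would consume the file.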
Hi Subutai,
I believe that is correct; they (the Python and C++ implementations) can proceed as they are now. I hate to be a stickler, but the seed needs to be a "published" (i.e., in supporting documentation) static value, because at the current time the seed itself is the output of a function, which obscures the value. (If I remember correctly; I can't confirm or point to this now.)
Other than that, my "porting experience" has been pretty straightforward and very educational.
Thanks, David
Makes sense. This prompted me to look through some of our tests. We are not very consistent in our usage of RNGs. We use numpy's random in a lot of places. It would be good to do a sweep of this and make everything more consistent.
Agreed. Just because you guys are USHERING IN the future, doesn't mean you can PREDICT the future! :-) So, I wouldn't feel bad about ongoing changes that need to be made to accommodate expansion of scope and significance.
@subutai said:
It would be good to do a sweep of this and make everything more consistent.
Someone should create a ticket for that!
Matt Taylor OS Community Flag-Bearer Numenta
@subutai said:
First, the new TM code should be changed to use the RNG in nupic.core. I didn't realize it wasn't using it.
This issue needs to be reviewed by the original author or another contributor for applicability to the current codebase. The issue might be obsolete or need updating to match current standards and practices. If the issue is out of date, please close. Otherwise please leave a comment to justify its continuing existence. It may be closed in the future if no further activity is noted.
I marked this as under review because IMO the scope is too big. It needs to be broken up into smaller tasks after a thorough review of how random numbers are used in the current codebase.
@subutai said:
At that point all we would need is a mock API for other languages such as the Java port.
I agree. I believe this additional part should be added as another issue having to do with enabling the consistency of RNG output across languages.
In a lot of places in the code, we set the random seed to obtain reproducible results. While that is perfectly correct for bit-to-bit comparisons of output - e.g. between the C++ and Python implementations of the SP - its use in examples to obtain "good" values for datasets is wrong.
It just overfits the parameters to the dataset, and the quality of the results would drop rapidly with a different random seed.
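One way examples could guard against this is to report results over several seeds instead of one hand-picked seed (a toy sketch; `evaluate` is a hypothetical stand-in for an actual model run, not a NuPIC API):

```python
import random
import statistics

def evaluate(seed):
    # Stand-in for running an example with a given seed; in real use this
    # would train and score a model, not average toy random draws.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100)) / 100

# Averaging over several seeds makes it obvious when a "good" number is
# an artifact of overfitting to a single seed.
scores = [evaluate(seed) for seed in range(10)]
print("mean=%.3f stdev=%.3f" % (statistics.mean(scores), statistics.stdev(scores)))
```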
Another benefit of randomization is that, dodgy as it sounds, we'll hit more problems but find and fix bugs faster.