kylethayer / ifcsoft

Automatically exported from code.google.com/p/ifcsoft
GNU General Public License v3.0

Means and Standard Deviation should be from random sample in large data sets #14

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
If a data set is very large (say, more than about 10,000 points), the program
should not go through every point to find the mean and, especially, the
standard deviation. Instead, it should sample at most 10,000 random points from
the data set and compute both from that sample.
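
A minimal sketch of the idea, in Python for brevity rather than in the project's own code. The function name `sampled_mean_std`, the `seed` parameter, and the use of sampling without replacement are assumptions made for illustration; only the 10,000-point cap comes from the text above.

```python
import random
import statistics

MAX_SAMPLE = 10_000  # cap suggested in the issue text


def sampled_mean_std(values, max_sample=MAX_SAMPLE, seed=None):
    """Estimate mean and standard deviation from at most max_sample points.

    `values` is a list of numbers with at least two entries. Small data sets
    are used in full; larger ones are sampled without replacement.
    """
    rng = random.Random(seed)  # hypothetical seed parameter, for repeatable runs
    if len(values) > max_sample:
        values = rng.sample(values, max_sample)
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator), reusing the mean above.
    std = statistics.stdev(values, xbar=mean)
    return mean, std
```

With 10,000 sampled points the standard error of the estimated mean is about one hundredth of the standard deviation, which is likely close enough for a default normalization step.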

The mean could still be computed over all points, since it can be accumulated
in the same pass that finds the min and max, but the standard deviation greatly
slows down the start of an SOM calculation on large data sets (it is used for
the default variance normalization).
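
One way the split could look, sketched in Python under stated assumptions: the min, max, and exact mean come from a single full pass, and only the standard deviation is estimated from a random sample. The name `normalize_column` is hypothetical, and "variance normalization" is assumed here to mean subtracting the mean and dividing by the standard deviation; the program's actual definition may differ.

```python
import math
import random


def normalize_column(values, max_sample=10_000, seed=None):
    """Scale one data dimension using an exact mean and a sampled std."""
    # Single full pass: min, max, and running sum (which gives the exact mean).
    lo = hi = values[0]
    total = 0.0
    for v in values:
        if v < lo:
            lo = v
        if v > hi:
            hi = v
        total += v
    mean = total / len(values)

    # Standard deviation from at most max_sample random points.
    rng = random.Random(seed)
    if len(values) > max_sample:
        sample = rng.sample(values, max_sample)
    else:
        sample = values
    variance = sum((v - mean) ** 2 for v in sample) / len(sample)
    std = math.sqrt(variance)

    # Assumed normalization: zero mean, (approximately) unit variance.
    if std == 0.0:
        scaled = [0.0 for _ in values]
    else:
        scaled = [(v - mean) / std for v in values]
    return scaled, (lo, hi, mean, std)
```

Because the mean is exact, the sampled variance is taken around it directly with an `n` denominator; the sampling error then affects only the scale factor, not the centering.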

Original issue reported on code.google.com by kyle.tha...@gmail.com on 3 Jun 2011 at 3:19