Closed astrofrog closed 9 years ago
Just a quick example on the CDF sampling method I told you about. We can start by creating a PDF and CDF for the volumes of all cells:
In [1]: import numpy as np
In [2]: volumes = np.random.random(10)
In [7]: pdf = volumes / np.sum(volumes)
In [8]: cdf = np.cumsum(pdf)
In [9]: cdf
Out[9]:
array([ 0.1795304 , 0.33477385, 0.43718487, 0.45340509, 0.54523768,
0.65430424, 0.74303588, 0.82908545, 0.95476047, 1. ])
Let's say we want to sample 10000 values, we do this by sampling 10000 values between 0 and 1 and finding in which bin they fall:
In [10]: xi = np.random.random(10000)
In [11]: bin = np.searchsorted(cdf, xi)
and we can check what fraction fell in each bin and compare to the PDF:
In [13]: count = np.bincount(bin)
In [14]: count
Out[14]: array([1829, 1555, 1054, 147, 909, 1099, 834, 848, 1308, 417])
In [15]: count / np.sum(count)
Out[15]:
array([ 0.1829, 0.1555, 0.1054, 0.0147, 0.0909, 0.1099, 0.0834,
0.0848, 0.1308, 0.0417])
In [16]: pdf
Out[16]:
array([ 0.1795304 , 0.15524345, 0.10241102, 0.01622022, 0.0918326 ,
0.10906656, 0.08873164, 0.08604956, 0.12567503, 0.04523953])
which looks similar.
So then the idea is, if there is no minimum number of samples, you can just use something like this to decide how many samples to do in each cell. If min_cell_samples is set, then you first figure out what min_cell_samples * n_cells is. If it's larger than n_samples, then you use min_cell_samples in each cell and you don't need to do the above sampling. If min_cell_samples * n_cells < n_samples, you first sample min_cell_samples from each cell, then you distribute the remaining samples according to the above process.
Thanks @bluescarni for fixing this!
@bluescarni - I noticed an issue with the Voronoi function evaluation:
prints out:
but shouldn't these all be 10000?