hyperion-rt / hyperion

Hyperion Radiative Transfer Code
http://www.hyperion-rt.org
BSD 2-Clause "Simplified" License
52 stars 26 forks source link

Bug in Voronoi function evaluation #139

Closed astrofrog closed 9 years ago

astrofrog commented 9 years ago

@bluescarni - I noticed an issue with the Voronoi function evaluation:

import numpy as np

from hyperion.grid import VoronoiGrid

N = 10000
x = np.random.random(N)
y = np.random.random(N)
z = np.random.random(N)

def dens(x, y, z):
    print(x.shape, y.shape, z.shape)
    return np.zeros_like(x)

g = VoronoiGrid(x, y, z)
g.evaluate_function_average(dens, n_samples=10000)

prints out:

(4641,) (4641,) (4641,)

but shouldn't these all be 10000?

astrofrog commented 9 years ago

Just a quick example on the CDF sampling method I told you about. We can start by creating a PDF and CDF for the volumes of all cells:

In [1]: import numpy as np

In [2]: volumes = np.random.random(10)

In [7]: pdf = volumes / np.sum(volumes)

In [8]: cdf = np.cumsum(pdf)

In [9]: cdf
Out[9]: 
array([ 0.1795304 ,  0.33477385,  0.43718487,  0.45340509,  0.54523768,
        0.65430424,  0.74303588,  0.82908545,  0.95476047,  1.        ])

Let's say we want to sample 10000 values, we do this by sampling 10000 values between 0 and 1 and finding in which bin they fall:

In [10]: xi = np.random.random(10000)

In [11]: bin = np.searchsorted(cdf, xi)

and we can check what fraction fell in each bin and compare to the PDF:

In [13]: count = np.bincount(bin)

In [14]: count
Out[14]: array([1829, 1555, 1054,  147,  909, 1099,  834,  848, 1308,  417])

In [15]: count / np.sum(count)
Out[15]: 
array([ 0.1829,  0.1555,  0.1054,  0.0147,  0.0909,  0.1099,  0.0834,
        0.0848,  0.1308,  0.0417])

In [16]: pdf
Out[16]: 
array([ 0.1795304 ,  0.15524345,  0.10241102,  0.01622022,  0.0918326 ,
        0.10906656,  0.08873164,  0.08604956,  0.12567503,  0.04523953])

which looks similar.

So then the idea is, if there is no minimum number of samples, you can just use something like this to decide how many samples to do in each cell. If min_cell_samples is set, then you first figure out what min_cell_samples * n_cells is. If it's larger than n_samples, then you use min_cell_samples in each cell and you don't need to do the above sampling. If min_cell_samples * n_cells < n_samples, you first sample min_cell_samples from each cell, then you distribute the remaining samples according to the above process.

astrofrog commented 9 years ago

Thanks @bluescarni for fixing this!