rlabbe / filterpy

Python Kalman filtering and optimal estimation library. Implements Kalman filter, particle filter, Extended Kalman filter, Unscented Kalman filter, g-h (alpha-beta), least squares, H Infinity, smoothers, and more. Has companion book 'Kalman and Bayesian Filters in Python'.
MIT License
3.35k stars 624 forks source link

filterpy.stats.rand_student_t calculation question #222

Open risa2000 opened 3 years ago

risa2000 commented 3 years ago

I came across this function while reading https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/03-Gaussians.ipynb. It is copy pasted into the book:

def rand_student_t(df, mu=0, std=1):
    """return random number distributed by Student's t 
    distribution with `df` degrees of freedom with the 
    specified mean and standard deviation.
    """
    x = random.gauss(0, std)
    y = 2.0*random.gammavariate(0.5*df, 2.0)
    return x / (math.sqrt(y / df)) + mu

This function is used to generate 5000 sample variety and then plots it into a graph:

def sense_t():
    return 10 + rand_student_t(7)*2

zs = [sense_t() for i in range(5000)]
plt.plot(zs, lw=1);

I wanted to replace filterpy implementation with scipy.stats.t this way:

from scipy.stats import t
zs = t.rvs(df=7, loc=10, scale=2, size=5000)
plt.plot(zs, lw=1)

which should, if my understanding is correct, produce an equivalent result (sample from Student's t-distributed random variable with mean=10, std=2, and 7 degrees of freedom).

However when looking at the plots, it seems that filterpy version gives a bit "tighter" spread with shorter outliers. When looking into the filterpy implementation, I understood that it uses chi-square and gamma distributed random variables to express the Student's t-distributed one:

T = Z(mu, std) / sqrt(V(df)/df)
V = G(df/2, 2)

where Z(mu, std) is normally distributed random variable, and V = G(df/2, 2) is gamma distributed random variable (using scale notation).

This should be fine except the filterpy code uses V = 2*G(df/2, 2). Incidentally, when I remove multiplication by 2 from the formula, it seems to closer correspond to scipy.stats.t sample. So I wonder from where the multiplication by 2 comes?

rlabbe commented 3 years ago

I honestly have no idea, and believe you are correct!