rlabbe / Kalman-and-Bayesian-Filters-in-Python

Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters,extended Kalman filters, unscented Kalman filters, particle filters, and more. All exercises include solutions.
Other
16.52k stars 4.17k forks source link

Negative covariance matrix in Unscented Kalman filter #377

Open Hami1369 opened 3 years ago

Hami1369 commented 3 years ago

I want to implement the Unscented Kalman filter(UKF) method for the nonlinear problem; I set all initial values such as initial mean vector and initial covariance matrix. The dimension of the problem is 8. Below is the code of implementation; I do not know why the new covariance matrix at the end of the code includes some negative parameters. These negative parameters make problems for the next iteration of the approach.

#pip install filterpy
import numpy as np
from filterpy.kalman import unscented_transform, MerweScaledSigmaPoints

#initialization 
mean_zero = np.array([ 27.25,14.39, 4.459, 27.65 ,  6.27 , 23.653,  1.2  ,  0.21 ])
P_zero = np.diagflat((0.05*mean_zero)**2)
# initial parameters
T_2 =10
T_3 = 8
dt = 1
Q_1 = 15
Q_2 = 25
T_1 = 12

#sigma points parameters
alpha = 0.1
beta = 2
kappa = 0
n = 8

#create sigma points and weights
points = MerweScaledSigmaPoints(n=8, alpha=.1, beta=2., kappa=0)
sigmas = points.sigma_points(mean_zero, P_zero)

#sigma points weights
w_lambda = alpha**2 * (n + kappa) - n
Wc = np.full(2*n+1 , 0.5/(n + w_lambda))
Wm = np.full(2*n+1 , 0.5/(n + w_lambda))
Wc[0] = w_lambda / (n + w_lambda) + (1 - alpha**2 + beta)
Wm[0] = w_lambda / (n + w_lambda)

#process noise
Q = np.diagflat(((1*10**-5)*mean_zero)**2)
Q_n = np.random.multivariate_normal(np.zeros(8),Q,8)

#mesurement noise
R = (9*10**-2)*np.eye(2)
R_n = np.random.multivariate_normal(np.zeros(2),R,2)

#nonlinear_state_function
def f_nonlinear_state(T_2,T_3,R_21,R_32,C_2,C_3,w_1,w_2):
    T_2 = T_2 + dt*(-1*(T_2/C_2)*(1/R_21 + 1/R_32) + T_1/R_21*C_2 + T_3/R_32*C_2)
    T_3 = T_3+ dt*(-1*(T_3/C_3)*(1/R_32)+T_2/R_32*C_3 +w_1*Q_1/C_3 + w_2*Q_2/C_3)
    R_21 = R_21
    R_32 = R_32
    C_2 = C_2
    C_3 = C_3
    w_1 = w_1
    w_2 = w_2
    z = np.array([T_2 , T_3,R_21 , R_32,C_2 ,C_3 ,w_1,w_2])
    return z

#passing sigma points through nonlinear_state_function
sigmas_f = np.empty((17, 8))
for i in range(17):
    sigmas_f[i] = f_nonlinear_state(sigmas[i, 0], sigmas[i ,1],sigmas[i, 2],sigmas[i, 3],sigmas[i, 4],sigmas[i, 5],sigmas[i, 6],sigmas[i, 7])

ukf_f_mean, ukf_f_cov = unscented_transform(sigmas_f, points.Wm, points.Wc)

#nonlinear mesurement function
def h_mesurement(T_2,T_3):
    T_2 = T_2
    T_3 = T_3
    y = np.array([T_2,T_3])
    return y
#passing sigmas_f through mesurement function 
sigmas_h = np.empty((17, 2))
for i in range(17):
    sigmas_h[i] = h_mesurement(sigmas_f[i, 0], sigmas_f[i ,1])

ukf_h_mean, ukf_h_cov = unscented_transform(sigmas_h, points.Wm, points.Wc)

#cross covarinace
Pfh = np.zeros((8, 2))
for i in range(17):
    Pfh += Wc[i] * np.outer(sigmas_f[i] - ukf_f_mean,sigmas_h[i] - ukf_h_mean)
#Kalman gain
K = np.dot(Pfh, np.linalg.inv(ukf_h_cov)) 

#True value of the estimate 
h = np.array([47.39642954, 55.42371109]) 

#New mean and New covariance 

New_mean = ukf_f_mean + np.dot(K , h-ukf_h_mean ) 
New_covarince = ukf_f_cov - np.dot(K,ukf_h_cov).dot(K.T)
New_covarince
rlabbe commented 3 years ago

This code is very unreadable; if you have 4 spaces before each line github will transform it into valid python.

I don't understand your problem construction. You have a state vector with 8 components, but only compute the first two. So the unscented transform is creating sigma points around R21, R32, etc, but your function doesn't compute new values for them. It's not surprising that the code finds negative correlations between the various values.

I can't be sure, I had to guess at the meaning of the code giving the bad formatting and numerous unused variables, but that is where I would look. These are also subject to numerical instability in general, especially when your initial covariance is vastly different from the first computed covariance. It never hurts to compute P = 0.5*(P@P.T) to make it positive semi-definite again.

Hami1369 commented 3 years ago

Thank you very much rlabbe for your comment, I have edited the code for becoming into valid python.

Regarding the computing of 8 components, the nonlinear function just computes T_2, T_3 and other parameters do not change in the nonlinear function. the nonlinear functions are nonlinear continuous-time state equation where an additive process noise is assumed image image

After Transforming to discrete time I have:

image image which X = T_2 , T_3 p = R_21,R_32,C_2,C_3,w_1,w_2

the state function comes from image

as you can see just T_2 and T_3 have the nonlinear equations .

and the measurement function equlas to h = T_2 , T_3

Hami1369 commented 3 years ago

I have another approach for creating nonlinear functions, describing below :

pip install filterpy

import numpy as np
from filterpy.kalman import unscented_transform, MerweScaledSigmaPoints

#initialization 
mean_zero = np.array([ 4.459, 27.65 ,  6.27 , 23.653,  1.2  ,  0.21 ])
P_zero = np.diagflat((0.05*mean_zero)**2)
# initial parameters
T_2 =10
T_3 = 8
dt = 1
Q_1 = 15
Q_2 = 25
T_1 = 12

#sigma points parameters
alpha = 0.1
beta = 2
kappa = 0
n = 8

#create sigma points and weights
points = MerweScaledSigmaPoints(n=6, alpha=.1, beta=2., kappa=0)
sigmas = points.sigma_points(mean_zero, P_zero)

#sigma points weights
w_lambda = alpha**2 * (n + kappa) - n
Wc = np.full(2*n+1 , 0.5/(n + w_lambda))
Wm = np.full(2*n+1 , 0.5/(n + w_lambda))
Wc[0] = w_lambda / (n + w_lambda) + (1 - alpha**2 + beta)
Wm[0] = w_lambda / (n + w_lambda)

#process noise
Q = np.diagflat(((1*10**-5)*mean_zero)**2)
Q_n = np.random.multivariate_normal(np.zeros(6),Q,6)

#mesurement noise
R = (9*10**-2)*np.eye(2)
R_n = np.random.multivariate_normal(np.zeros(2),R,2)

def f_nonlinear_state(R_21,R_32,C_2,C_3,w_1,w_2):
    R_21 = R_21
    R_32 = R_32
    C_2 = C_2
    C_3 = C_3
    w_1 = w_1
    w_2 = w_2
    z = np.array([R_21 , R_32,C_2 ,C_3 ,w_1,w_2])
    return z

#passing sigma points through nonlinear_state_function
sigmas_f = np.empty((13, 6))
for i in range(13):
    sigmas_f[i] = f_nonlinear_state(sigmas[i, 0], sigmas[i ,1],sigmas[i, 2],sigmas[i, 3],sigmas[i, 4],sigmas[i, 5])

ukf_f_mean, ukf_f_cov = unscented_transform(sigmas_f, points.Wm, points.Wc)

#nonlinear mesurement function
def h_mesurement(T_2,T_3,R_21,R_32,C_2,C_3,w_1,w_2):
    T_2 = T_2 + dt*(-1*(T_2/C_2)*(1/R_21 + 1/R_32) + T_1/R_21*C_2 + T_3/R_32*C_2)
    T_3 = T_3+ dt*(-1*(T_3/C_3)*(1/R_32)+T_2/R_32*C_3 +w_1*Q_1/C_3 + w_2*Q_2/C_3)
    y = np.array([T_2,T_3])
    return y
#passing sigmas_f through mesurement function 
sigmas_h = np.empty((13, 2))
for i in range(13):
    sigmas_h[i] = h_mesurement(T_2,T_3,sigmas_f[i, 0], sigmas_f[i ,1],sigmas_f[i, 2],sigmas_f[i, 3],sigmas_f[i, 4],sigmas_f[i, 5])

ukf_h_mean, ukf_h_cov = unscented_transform(sigmas_h, points.Wm, points.Wc)

#cross covarinace
Pfh = np.zeros((6, 2))
for i in range(13):
    Pfh += Wc[i] * np.outer(sigmas_f[i] - ukf_f_mean,sigmas_h[i] - ukf_h_mean)
#Kalman gain
K = np.dot(Pfh, np.linalg.inv(ukf_h_cov)) 

#True value of the estimate 
h = np.array([47.39642954, 55.42371109]) 

#New mean and New covariance 

New_mean = ukf_f_mean + np.dot(K , h-ukf_h_mean ) 
New_covarince = ukf_f_cov - np.dot(K,ukf_h_cov).dot(K.T)
New_covarince

below are the assumption of nonlinear functions : image image U = T_1 , Q_1, Q_2 which are known input parameters

rlabbe commented 3 years ago

Hi, I don't understand the motivation for the second set of code. The KF is not estimating any parameters, hence K is meaningless, and you certainly can't apply it to parameters you are not estimating.

I've only used the UKF for small dimensional problems. This post suggests that the UKF can have non-positive semidefinite covariances when the dimension is > 3. https://math.stackexchange.com/questions/796331/scaling-factor-and-weights-in-unscented-transform-ukf

I'll reiterate my concern of you having 6 coefficients in the state vector which do not vary. Your covariance is negative when computing the covariance between these values. If they don't vary in the real world then they probably shouldn't be in the state vector; if they do try adding a model for how they vary. I don't think that will change the issue with high dimensionality; I'm just addressing the KF design.

Wan and van der Merwe's paper suggest evaluating the covariance over to ensure positive semi-definite results. I tried it but it didn't seem to work. It is possible I misunderstood what they meant and/or implemented it incorrectly.

I think if you make your state [T2, T3] and treat the rest as constants the UKF will work as you expect.