porting from pykalman to simdkalman

rspadim commented 4 years ago

Hello!

I want to understand how to port a pykalman code to simdkalman, could anyone help me?

it's a implementation of kalman to understand the kinetic (position+velocity+acceleration) movement (KCA) from lopez prado (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2422183)

# by MLdP on 02/22/2014 <lopezdeprado@lbl.gov>
# Kinetic Component Analysis
import numpy as np
from pykalman import KalmanFilter
#-------------------------------------------------------------------------------
def fitKCA(t,z,q,fwd=0):    
    '''
    Inputs:
        t: Iterable with time indices
        z: Iterable with measurements
        q: Scalar that multiplies the seed states covariance
        fwd: number of steps to forecast (optional, default=0)
    Output:
        x[0]: smoothed state means of position velocity and acceleration
        x[1]: smoothed state covar of position velocity and acceleration
    Dependencies: numpy, pykalman
    '''

    #1) Set up matrices A,H and a seed for Q
    h=(t[-1]-t[0])/t.shape[0]    
    # my doubt... should the H be = [[h,0]] ? considering https://github.com/oseiskar/simdkalman/issues/11

    A=np.array([[1,h,.5*h**2],     # space
                [0,1,h],           # velocity
                [0,0,1]])          # aceleration
    Q=q*np.eye(A.shape[0])

    #2) Apply the filter    
    kf=KalmanFilter(transition_matrices=A,transition_covariance=Q)

    #3) EM estimates
    kf=kf.em(z)
    # could this be done by smidkalman?

    #4) Smooth
    x_mean,x_covar=kf.smooth(z)
    # what about this one?

    #5) Forecast
    for fwd_ in range(fwd):
        x_mean_,x_covar_=kf.filter_update(filtered_state_mean=x_mean[-1], \
            filtered_state_covariance=x_covar[-1])
        x_mean=np.append(x_mean,x_mean_.reshape(1,-1),axis=0)
        x_covar_=np.expand_dims(x_covar_,axis=0)
        x_covar=np.append(x_covar,x_covar_,axis=0)

    #6) Std series
    x_std=(x_covar[:,0,0]**.5).reshape(-1,1)
    for i in range(1,x_covar.shape[1]):
        x_std_=x_covar[:,i,i]**.5
        x_std=np.append(x_std,x_std_.reshape(-1,1),axis=1)
    return x_mean,x_std,x_covar

oseiskar commented 4 years ago

Hi! Thank you for your interest in Simdkalman

At what point did you get stuck when you tried to migrate this code? The example code seems to contain all elements used in the above snippet

Also what is your primary motivation of porting this code to Simdkalman? Is it too slow or would you like to analyse multiple similar-sized time series in a vectorized manner?

As a clarification to my comment in the PyKalman issue tracker https://github.com/pykalman/pykalman/issues/86: I do not think it is necessary for the users of PyKalman to change their Kalman filter library if PyKalman is working fine. However, I do think that library developers should avoid wasting their time submitting PRs or issues to PyKalman as the maintainer is not reacting to those anymore.

However, I agree that a short migration guide would be a good addition to the READMEs

rspadim commented 4 years ago

Hi @oseiskar ! the code isn't so fast (pykalman), but "fast" without benchmark isn't a good metric (that's the motivation)

i get stuck at em() function, i didn't started the implementation yet

thanks about the example code, i will try to use it now

oseiskar commented 4 years ago

Sorry, I missed the questions in your original message (at a quick glance it seemed that the code snippet was just copied as-is from the paper, which I wasn't). The were not trivial either, let me try to answer:

my doubt... should the H be = [[h,0]] ? considering https://github.com/oseiskar/simdkalman/issues/11

The correct value for H is seems to be np.eye(3, 1), which is [[1,0,0]], this is a hidden and undocumented feature in PyKalman (and not an obvious default in my opinion). It's always better to define the observation matrix explicitly, if not for other purposes, then readability / documentation.

kf=kf.em(z) # could this be done by smidkalman?

Yes, Simdkalman actually has exactly the same API (not documented currently, unfortunately): https://github.com/oseiskar/simdkalman/blob/67e538a5981aa4a2dc323028167913bf5f676383/examples/example.py#L27

which does the same thing, that is, apply the EM algorithm on these variables: https://github.com/pykalman/pykalman/blob/8d3f8e498b64d902016a0216bf2bcc8b262d917b/pykalman/standard.py#L1495-L1498 (these defaults are also undocumented in PyKalman)

x_mean,x_covar=kf.smooth(z) # what about this one?

Yes, this is also done in the example https://github.com/oseiskar/simdkalman/blob/67e538a5981aa4a2dc323028167913bf5f676383/examples/example.py#L30 . You can access the mean as smoothed.mean and the covariance as smoothed.cov.

The forecasting feature (step 5 in your code) can be done using the "predict" method: https://github.com/oseiskar/simdkalman/blob/67e538a5981aa4a2dc323028167913bf5f676383/examples/example.py#L32

rspadim commented 4 years ago

looks promissing, from 33 seconds to 11 seconds (3x)

i'm not sure yet about covar output and multi variable output (from pykalman there's a space/velocity/aceleration output variables , from simdkalman i'm not sure if we have it)

first translation (not tested covar/std/predict yet)

# by MLdP on 02/22/2014 <lopezdeprado@lbl.gov>
# Kinetic Component Analysis
import numpy as np
from pykalman import KalmanFilter
import simdkalman

#-------------------------------------------------------------------------------
def fitKCA(t,z,q,fwd=0):    
    '''
    Inputs:
        t: Iterable with time indices
        z: Iterable with measurements
        q: Scalar that multiplies the seed states covariance
        fwd: number of steps to forecast (optional, default=0)
    Output:
        x[0]: smoothed state means of position velocity and acceleration
        x[1]: smoothed state covar of position velocity and acceleration
    Dependencies: numpy, pykalman
    '''
    #1) Set up matrices A,H and a seed for Q
    h=(t[-1]-t[0])/t.shape[0]
    A=np.array([[1,h,.5*h**2],
                [0,1,h],
                [0,0,1]])
    Q=q*np.eye(A.shape[0])
    #2) Apply the filter    
    kf=KalmanFilter(transition_matrices=A,transition_covariance=Q)
    #3) EM estimates
    kf=kf.em(z)
    #4) Smooth
    x_mean,x_covar=kf.smooth(z)
    #5) Forecast
    for fwd_ in range(fwd):
        x_mean_,x_covar_=kf.filter_update(filtered_state_mean=x_mean[-1], \
            filtered_state_covariance=x_covar[-1])
        x_mean=np.append(x_mean,x_mean_.reshape(1,-1),axis=0)
        x_covar_=np.expand_dims(x_covar_,axis=0)
        x_covar=np.append(x_covar,x_covar_,axis=0)
    #6) Std series
    x_std=(x_covar[:,0,0]**.5).reshape(-1,1)
    for i in range(1,x_covar.shape[1]):
        x_std_=x_covar[:,i,i]**.5
        x_std=np.append(x_std,x_std_.reshape(-1,1),axis=1)
    print(x_std)
    return x_mean,x_std,x_covar

# TODO: change to simdkalman
#       https://github.com/oseiskar/simdkalman/blob/master/examples/example.py
def fitKCA_simdkalman(t,z,q,fwd=0):    
    '''
    Inputs:
        t: Iterable with time indices
        z: Iterable with measurements
        q: Scalar that multiplies the seed states covariance
        fwd: number of steps to forecast (optional, default=0)
    Output:
        x[0]: smoothed state means of position velocity and acceleration
        x[1]: smoothed state covar of position velocity and acceleration
    Dependencies: numpy, pykalman
    '''
    #1) Set up matrices A,H and a seed for Q
    h=(t[-1]-t[0])/t.shape[0]
    A=np.array([[1,h,.5*h**2],
                [0,1,h],
                [0,0,1]])
    Q=q*np.eye(A.shape[0])
    #2) Apply the filter    
    #kf=KalmanFilter(transition_matrices=A,transition_covariance=Q)
    kf = simdkalman.KalmanFilter(
        state_transition = A,        # <--- this is the matrix A
        process_noise = Q,           # Q
        observation_model = np.array([[1,0,0]]),   # H
        observation_noise = q
    )
    #3) EM estimates
    kf=kf.em(z)
    #4) Smooth
    smoothed = kf.smooth(z)
    x_mean, x_covar = smoothed.observations.mean, smoothed.observations.cov
    x_std = None
    #5) Forecast
    if False:
        pred = kf.predict(data, fwd)
        print(pred)
        x_mean = np.append(x_mean, pred.mean)
        x_covar = np.append(x_covar, pred.cov)
        #for fwd_ in range(fwd):
        #    x_mean_,x_covar_=kf.filter_update(filtered_state_mean=x_mean[-1], \
        #        filtered_state_covariance=x_covar[-1])
        #    x_mean=np.append(x_mean,x_mean_.reshape(1,-1),axis=0)
        #    x_covar_=np.expand_dims(x_covar_,axis=0)
        #    x_covar=np.append(x_covar,x_covar_,axis=0)
    #6) Std series
    if False:
        x_std=(x_covar**.5).reshape(-1,1)
        #for i in range(1,x_covar.shape[1]):
        #    x_std_=x_covar[:,i,i]**.5
        #    x_std=np.append(x_std,x_std_.reshape(-1,1),axis=1)
        print(x_std)
    return x_mean,x_std,x_covar

rspadim commented 4 years ago

from pykalman i got 3 outputs , but with simd it only output one variable (space)

plt.plot(ret_pykalman[0][500:][:,0]) # pykalman - check [:,0] return space, [:,1] return velocity, [:,2] return aceleration plt.show() plt.plot(ret_simdkalman[0][500:]) # simdkalman, only output space plt.show()

oseiskar commented 4 years ago

The hidden state variables are available too: Just access the statesfield instead of observations, e.g., smoothed.observations.mean -> smoothed.states.mean (see also the docs)

rspadim commented 4 years ago

Uhm I’m a bit confused

Using pykalman the system is a cinetic equation (space = velocity time + .5 aceleration ** 2) that return 3 measures (space, velocity, aceleration) I can acess it with [:,0] [:,1] [:,2] with pykalman, but with simdkalman i’m trying to understand how to do something like it

Any help is wellcome, the space variable is working

If i understood it right, pykalman execute the system and create multi timeseries output, simdkalman have other output (maybe i should input velocity and aceleration as time srries?)

oseiskar commented 4 years ago

See my previous message: Simdkalman outputs both:

The 3-dimensional hidden states (x) as smoothed.states.mean
the one-dimensional observations: y = H x as smoothed.observations.mean

You now use smoothed.observations. Just change to smoothed.states if you want the 3-dimensional hidden states. The covariances for both things are also available (e.g. smoothed.observations.cov vs smoothed.states.cov)

rspadim commented 4 years ago

Ok, i think it’s just part of “porting pykalman behaviour”

I see there’s 1 observation, at pykalman there’s 3 observations Maybe a good point to explain is how to get back to states and observations and recreate these “others observations”

oseiskar commented 4 years ago

Technically, PyKalman always outputs the hidden states (x) in smoothing. The smoothed observations (produced from the states by multiplying with the observation matrix: y = Hx) are a convenience feature in Simdkalman only. If you just always use smoothed.states.mean and smoothed.states.cov (and forget about the smoothed observations), the behavior is identical to PyKalman.

Observations and (hidden) states are two very different objects in Kalman filters and it's always good to be explicit about which ones you deal with ... and always explicitly define the matrix H, which transforms hidden states to observations.

rspadim commented 4 years ago

Nice @oseiskar !

I think this's something that a programmer (not a kalman user) will question about, maybe an example is something interesting at documentation

Just to report what's the difference about pykalman and simdkalman outputs, and what should do to have the same behavior

oseiskar commented 4 years ago

The main issue seems to be resolved

oseiskar / simdkalman

porting from pykalman to simdkalman #12