oseiskar / simdkalman

Python Kalman filters vectorized as Single Instruction, Multiple Data
https://simdkalman.readthedocs.io/
MIT License

Multi Dimensional Data Reshape Error #16

Closed aba312 closed 3 years ago

aba312 commented 3 years ago

Hi There,

I'm using the Kalman filter to perform weight estimation for a Bayesian inference problem. I had been using PyKalman; however, due to its lack of maintenance, I'm looking for alternatives, and I landed on simdkalman.

I am not sure where my problem lies, but I'm getting a reshape error when I call smooth:

Here is a dummy setup that represents my data structures:

import simdkalman
import numpy as np
import numpy.random as random
import matplotlib.pyplot as plt

H = random.normal(size=(100,11)) # Observation model of size N_observations x M weights of interest
data = random.normal(size = (100,180)) # 100 independent time series, each 180 time samples long

kf = simdkalman.KalmanFilter(
    state_transition = np.eye(11),
    process_noise = 0.5*np.eye(11),
    observation_model = H,
    observation_noise = 0.5*np.eye(100)) 

smoothed = kf.smooth(data, 
      initial_value = np.zeros((11)), 
      initial_covariance=0.5*np.eye(11))

When I call kf.smooth I get: ValueError: cannot reshape array of size 100 into shape (100,100,1)

Here is the equivalent setup I used with PyKalman (again a representation with the correct array sizes, not actual data):

from pykalman import KalmanFilter

H = random.normal(size=(100,11)) # Observation model of size N_observations x M weights of interest
data = random.normal(size = (100,180)) # 100 independent time series, each 180 time samples long

kf = KalmanFilter(observation_matrices = H,
                        observation_offsets = np.zeros((100)),
                        observation_covariance = 0.5*np.eye(100),
                        initial_state_mean = np.zeros((11)),
                        initial_state_covariance = 0.5*np.eye(11),
                        transition_matrices = np.eye(11),
                        transition_offsets = np.zeros(11),
                        transition_covariance = 0.5*np.eye(11))

(smoothedMeans, smoothedCov) = kf.smooth(data.T)

Any idea why my formulation and setup would cause this reshape error?

Thank you

oseiskar commented 3 years ago

Hi! The original design principle of simdkalman is that you can run the same Kalman filter for several different time series at once, which is computationally very efficient, even in high-level Python. PyKalman and the other frameworks do not support this. However, in general use this is a bit of a niche case, and it makes some of the examples a bit confusing, since you have to deal with 3- or 4-dimensional NumPy arrays whose dimensions represent different things than they do in, e.g., PyKalman.

I would suggest first trying to get your model to work without the "N independent time series" part, operating on one time series at a time.

Another difference is that the normal "batch mode" of simdkalman does not support a different H matrix for each sample. If your H matrix is not constant, you need to use the primitives module.

It seems that you actually want to use an H matrix of size 1x11 (which may be different for each sample), but you are currently feeding simdkalman a constant matrix of size 100x11, which obviously does not fit your data.

aba312 commented 3 years ago

Thanks for getting back to me,

For the purposes of the above example I labeled my data as "independent" because I thought it could be presented that way for the sake of formatting.

In actuality, this array consists of 100 channels of sensor data, so I want to treat it as a single 100-channel data array that evolves in time. The specific dimensions aren't important; in general my data is NxT and my H matrix is NxW, with N data channels, T time points, and W weights (to solve for).

The H matrix does not vary with time index, or initial conditions.

When you say:

Another difference is that the normal "batch mode" of Simdkalman does not support different H matrix for each sample

Does H always need to be of size 1xW?

In the full scope of my problem, I solve this 100-channel array over and over again with different initial conditions and different H matrices. In the worst case I solve with PyKalman, the problem has 60 different H matrices of varying W and 200 different initial conditions. You can imagine why I'm looking for a faster library.

Does simdkalman support computing the Kalman Smoother for a multi-dimensional time series in which all samples were derived from the same underlying distribution?

If so with this multi-dimensional time series can I batch solve across different initial conditions?

Using the numbers from above, this would be like having: state covariance ~ MxWxW (M being the 200 different initial conditions, W the number of weights or states I'm computing).

Thank you again for helping to clarify what I can and can't do here.

oseiskar commented 3 years ago

Thanks for clarifying, I had misunderstood what you were trying to achieve. Multi-dimensional time series are supported, so H does not have to be 1xW; it can be NxW too. Here's example code with such data: https://github.com/oseiskar/simdkalman/blob/master/examples/multi_dimensional_observations.py. You basically have to add some dimensions to the data array.
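
For the setup in your first post, the fix would then be a reshape. A minimal, untested sketch (the key point being that simdkalman expects data of shape (n_series, n_timesteps, n_obs_dims)):

import simdkalman
import numpy as np
import numpy.random as random

H = random.normal(size=(100, 11))      # observation model: 100 channels x 11 weights
data = random.normal(size=(100, 180))  # 100 channels x 180 time steps

kf = simdkalman.KalmanFilter(
    state_transition = np.eye(11),
    process_noise = 0.5*np.eye(11),
    observation_model = H,
    observation_noise = 0.5*np.eye(100))

# reshape (100, 180) -> (1, 180, 100): one time series, 180 steps,
# each observation a 100-dimensional vector
smoothed = kf.smooth(data.T[np.newaxis, ...],
    initial_value = np.zeros(11),
    initial_covariance = 0.5*np.eye(11))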

It is possible to batch over different initial conditions and different H matrices. I had to check the H matrix part, as I hadn't tried that before, but it seems to work almost out of the box. Here's a full example (based on the code linked above) where W = N = 2 and T = 200:


"""
Multi-dimensional observations example, different H for each time series
"""

import simdkalman
import numpy as np
import numpy.random as random

n_independent = 100

# Form H and initial cov that differ for each time series
obs_models = []
initial_means = []
initial_covariances = []
for i in range(n_independent):
    # just something that depends on i to see the difference
    obs_model = np.array([[i+0.001, 0], [0, 1]])

    cov = np.eye(2) * (10**2)
    mean = np.array([[0], [0]])

    obs_models.append(obs_model[np.newaxis, ...])
    initial_covariances.append(cov[np.newaxis, ...])
    initial_means.append(mean[np.newaxis, ...])

H = np.vstack(obs_models)
cov0 = np.vstack(initial_covariances)
# mean0 below is optional but cov0 is not, due to implementation details
mean0 = np.vstack(initial_means)
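# resulting shapes: H is (n_independent, 2, 2), cov0 is (n_independent, 2, 2)
# and mean0 is (n_independent, 2, 1)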

# In this model, there is a hidden trend and two independent noisy observations
# are made at each step
kf = simdkalman.KalmanFilter(
    state_transition = np.array([[1,1],[0,1]]),
    process_noise = np.diag([0.2, 0.01])**2,
    observation_model = H,
    observation_noise = np.eye(2)*3**2)

# simulate n_independent time series of 200 two-dimensional observations
rand = lambda: random.normal(size=(n_independent, 200))
data = np.cumsum(np.cumsum(rand()*0.02, axis=1) + rand(), axis=1)
data = np.dstack((data + rand()*3, data + rand()*3))
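# data now has shape (n_independent, 200, 2): series x time step x observation dimension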

# smooth and explain existing data
smoothed = kf.smooth(data, initial_value=mean0, initial_covariance=cov0)
# predict new data
pred = kf.predict(data, 15, initial_value=mean0, initial_covariance=cov0)

import matplotlib.pyplot as plt

# show the first smoothed time series
i = 0

n_obs = data.shape[2]
_, axes = plt.subplots(n_obs+1, 1, sharex=True)

x = np.arange(0, data.shape[1])
x_pred = np.arange(data.shape[1], data.shape[1]+pred.observations.mean.shape[1])

for j in range(n_obs):
    ax = axes[j]

    ax.plot(x, data[i,:,j], 'b.', label="observation component %d" % j)
    smoothed_obs = smoothed.observations.mean[i,:,j]

    obs_stdev = np.sqrt(smoothed.observations.cov[i,:,j,j])
    ax.plot(x, smoothed_obs, 'r-', label="smoothed")
    ax.plot(x, smoothed_obs - obs_stdev, 'k--', label="67% confidence")
    ax.plot(x, smoothed_obs + obs_stdev, 'k--')

    y_pred = pred.observations.mean[i,:,j]
    pred_stdev = np.sqrt(pred.observations.cov[i,:,j,j])

    ax.plot(x_pred, y_pred, 'b-', label="predicted")
    ax.plot(x_pred, y_pred + pred_stdev, 'k--')
    ax.plot(x_pred, y_pred - pred_stdev, 'k--')
    ax.legend()

ax = axes[-1]

trend = smoothed.states.mean[i,:,1]
trend_stdev = np.sqrt(smoothed.states.cov[i,:,1,1])
ax.plot(x, trend, 'g-', label="hidden trend")
ax.plot(x, trend - trend_stdev, 'k--', label="67% confidence")
ax.plot(x, trend + trend_stdev, 'k--')

trend_pred = pred.states.mean[i,:,1]
trend_pred_stdev = np.sqrt(pred.states.cov[i,:,1,1])
ax.plot(x_pred, trend_pred, 'b-', label='predicted')
ax.plot(x_pred, trend_pred + trend_pred_stdev, 'k--')
ax.plot(x_pred, trend_pred - trend_pred_stdev, 'k--')
ax.legend()

plt.show()

Unfortunately, you can't directly batch over varying hidden state dimensions W. You could, in theory, use the maximum of all the W in the batch and then form the H (and A) matrices so that the extra dimensions are essentially unused. However, this is a bit sketchy both numerically and performance-wise, and you might be better off with the unbatched version.
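
If you wanted to try that padding approach anyway, it could look roughly like this (pad_model is just an illustrative helper, not part of simdkalman):

import numpy as np

def pad_model(H, A, Q, w_max):
    # embed a W-dimensional model into w_max state dimensions so that
    # the extra dimensions are decoupled and never observed
    w = H.shape[1]
    H_pad = np.zeros((H.shape[0], w_max))
    H_pad[:, :w] = H                  # zero columns: extra states are unobserved
    A_pad = np.eye(w_max)
    A_pad[:w, :w] = A                 # extra states just carry over unchanged
    Q_pad = np.zeros((w_max, w_max))
    Q_pad[:w, :w] = Q                 # no process noise on the extra states
    return H_pad, A_pad, Q_pad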

It sounds a bit strange to me that the hidden dimension would vary arbitrarily. It would be interesting to know what your A matrices look like. Is np.eye(11) just a placeholder in the example, or do you actually use it? (In the latter case, that's a sign that the Kalman filter formulation is not the best choice for the problem, or is not completely thought out.)

aba312 commented 3 years ago

Thank you for this! I will take a look and get back to you on my success or challenges here.

My problem is a little different from what one might normally use a Kalman smoother for. Without going into too much detail:

I'm using the Kalman smoother to solve a maximum a posteriori (MAP) estimation problem which is ill-conditioned, so it's not that the hidden dimension is arbitrary; rather, I'm solving the problem over and over again across a variety of hidden dimensions (and initial conditions).

I'm leveraging the formulation of the Kalman filter as an estimator that promotes smoothness in time (the solution at time T is based on the previous solutions).

To this end, A is the identity matrix of dimension (num hidden weights, e.g. 11), and H encodes my model (it is not actually random; that was a placeholder for the dimensions of H).
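
Spelled out, the state-space model is just a random-walk prior on the weights:

x_t = x_{t-1} + w_t,   w_t ~ N(0, Q)   (A = I, which is what promotes smoothness in time)
y_t = H x_t + v_t,     v_t ~ N(0, R)   (H is the fixed measurement model)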

If you're interested, I'd be happy to e-mail you a paper which describes part of what I'm doing; it might help with context. However, I wasn't varying the problem over the hidden states when it was written.

Thank you again

oseiskar commented 3 years ago

Sounds interesting! Please do send the paper. You can find my email in my profile (I think). I actually also have a background in ill-conditioned/inverse problems, but I haven't used KFs as a regularization method.